Privacy & Benchmarking Explainer

How GigAnalytics protects your income data, what the optional benchmark feature collects, and the technical safeguards (k-anonymity, differential privacy) that make it safe to participate.

Version 1.1 · June 2025 · 8 min read

1. Privacy-First Architecture

GigAnalytics is built on a data minimization principle: we collect only what's necessary to compute your ROI dashboard, and we store it in a user-partitioned database where your data is isolated by Row Level Security (RLS) from every other user's data.

🔒 Encrypted at rest

All data stored in Supabase (Postgres) with AES-256 encryption. TLS 1.3 in transit.

🧱 Row Level Security

Postgres RLS ensures every query is automatically scoped to your user_id. No query can return another user's data.

🚫 No data selling

We never sell, rent, or license your income data to third parties. The benchmark feature is opt-in and anonymized.

Free users: zero benchmark participation

Free tier users never contribute to or appear in any benchmark dataset. Benchmark contribution is an opt-in Pro feature only. Free users can still view benchmark statistics generated from Pro user contributions.

2. What Data GigAnalytics Stores

Here is a complete inventory of data stored in your GigAnalytics account:

Data type | What we store | What we don't store | Why
Transactions | date, net_amount, fee_amount, stream_id | Payer name, description, card details | ROI calculation needs amounts + dates only
Time entries | start/stop time, duration, stream_id | Location, device, IP at logging time | Heatmap needs time + stream only
Income streams | name, color, platform tag | Platform account IDs or credentials | Display + grouping only
Acquisition costs | amount, date, stream_id, channel | Ad account IDs, campaign details | ROI formula needs cost amount only
Auth | email (hashed for internal ID), password hash | Plain-text password, security questions | Standard auth best practices
Usage analytics | feature events (anonymized) | PII, screen recordings, keystrokes | Product improvement only

3. Benchmark Feature: Full Data Flow

The benchmark feature allows Pro users to opt in to contributing anonymized aggregate metrics that power the "how do I compare?" insights in the dashboard. Here is the complete data flow, step by step:

  1. You enable "Contribute to benchmarks" in Settings → Privacy

     This toggle is OFF by default. Enabling it starts the contribution pipeline for your account. You can disable it at any time.

  2. Our backend computes aggregate metrics from your raw data

     The pipeline runs nightly. It reads your transactions and time entries and computes: hourly rate percentile bucket, revenue range bucket, experience range bucket, platform category, and region (country-level). It does NOT read transaction descriptions, client names, or exact amounts.

  3. Buckets are assigned (not raw values)

     A raw hourly rate of $87/hr is bucketed to "$80–$90/hr". Revenue of $4,200/mo is bucketed to "$4,000–$5,000/mo". This prevents precise inference about any individual.

  4. K-anonymity check

     Before your bucket is included in any aggregate, the pipeline checks: does this (platform, rate_bucket, region) combination have ≥ 25 other contributors? If not, your data is suppressed for that segment until the pool grows.

  5. Differential privacy noise is added

     Even for qualifying buckets, we apply Laplace noise (ε = 0.5) to the aggregate counts before storing them. This means the published percentile is statistically almost indistinguishable from what it would be if any single user were removed from the pool.

  6. Contribution is decoupled from user ID

     The final aggregated values in the benchmark store are not linked to your user_id. The pipeline uses a one-way hash of (user_id + contribution_date) as a deduplication key that cannot be reversed.

  7. Benchmark values are served to Pro users

     The published p25/median/p75 rates are served to all Pro users querying their platform + region segment. Your personal data never appears; only the anonymized aggregate does.
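The bucketing in step 3 above can be sketched in a few lines of Python. Bucket widths follow section 4 ($10 increments up to $200/hr, then $50 increments); the $1,000-wide revenue buckets are inferred from the example, and the function names are illustrative, not GigAnalytics internals.

```python
# Hypothetical sketch of step-3 bucketing: raw values are mapped to range
# labels before anything leaves the aggregation worker.

def rate_bucket(hourly_rate: float) -> str:
    """Map a raw hourly rate to its published bucket label.
    $10-wide buckets up to $200/hr, then $50-wide buckets."""
    width = 10 if hourly_rate < 200 else 50
    lo = int(hourly_rate // width) * width
    return f"${lo}-${lo + width}/hr"

def revenue_bucket(monthly_revenue: float, width: int = 1000) -> str:
    """Map monthly revenue to a $1,000-wide bucket label (width inferred
    from the example in this document)."""
    lo = int(monthly_revenue // width) * width
    return f"${lo:,}-${lo + width:,}/mo"

print(rate_bucket(87))       # "$80-$90/hr"
print(revenue_bucket(4200))  # "$4,000-$5,000/mo"
```

Because only the bucket label is emitted, the pipeline never stores or publishes the exact $87/hr or $4,200/mo figures.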

4. K-Anonymity: The Math

K-anonymity is a formal privacy guarantee. A dataset satisfies k-anonymity if, for every record, at least k−1 other records share the same quasi-identifying attributes.

In GigAnalytics's benchmark dataset, the quasi-identifying attributes are:

  • platform_category — bucketed (design, development, writing, consulting, other)
  • hourly_rate_bucket — $10 increments up to $200, then $50 increments
  • region — country-level (US, UK, CA, AU, DE, other)
  • experience_range — <1yr, 1–3yr, 3–7yr, 7yr+
k-anonymity guarantee: k = 25

For a record r with attributes (platform, rate_bucket, region, experience):

  count = |{u : user_platform[u]    = platform
            AND user_rate_bucket[u] = rate_bucket
            AND user_region[u]      = region
            AND user_experience[u]  = experience}|

  if count < 25: suppress(r)  # do not include in published benchmark
  else:          include(r)   # safe to publish aggregate for this segment
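The suppression rule above can be sketched in Python. In production this runs as a Postgres function; the data shapes and names here are illustrative assumptions.

```python
# Minimal sketch of the k = 25 suppression check: a segment is publishable
# only when at least K contributors share the same quasi-identifiers.
from collections import Counter

K = 25  # minimum number of contributors per published segment

def publishable_segments(contributors):
    """contributors: iterable of (platform, rate_bucket, region, experience)
    tuples, one per opted-in user. Returns the set of segments with at least
    K contributors; every other segment is suppressed."""
    counts = Counter(contributors)
    return {segment for segment, n in counts.items() if n >= K}

pool = [("writing", "$90-$100/hr", "NZ", "3-7yr")]           # lone contributor
pool += [("development", "$80-$90/hr", "US", "1-3yr")] * 30  # healthy pool
safe = publishable_segments(pool)
# Only the US development segment qualifies; the lone NZ writer's segment
# shows "insufficient data" instead.
```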

What this means in practice: If you're the only freelance copywriter in New Zealand earning $90–$100/hr with 3–7 years of experience, your data won't appear in the benchmark for that segment. You'll see "insufficient data" for that specific combination.

Why k=25 and not k=5? The GDPR Article 29 Working Party recommends k≥5 for publication. We use k=25 to provide a significantly stronger guarantee, particularly important given the sensitive nature of income data. The tradeoff is reduced benchmark coverage for niche specializations.

5. Differential Privacy: The Math

K-anonymity alone is vulnerable to attacks where an adversary already knows something about a specific individual. Differential privacy (DP) provides a stronger guarantee: the probability of any given output of the analysis changes by at most a factor of e^ε depending on whether or not that individual's data is in the dataset.

GigAnalytics applies the Laplace mechanism to aggregate counts and statistics before publication:

ε-differential privacy via the Laplace mechanism. For a function f (e.g., median hourly rate of a segment):

  published_f = true_f + Laplace(0, Δf / ε)

Where:
  Δf = sensitivity of f (maximum change in f when one person is added to or removed from the dataset)
  ε  = privacy budget (0.5 in GigAnalytics)
  Laplace(μ, b) = noise drawn from the Laplace distribution with location μ and scale b

For the median of hourly rates:
  Δf ≈ max_rate_bucket_width / 2 = ~$5
  ε  = 0.5
  Noise scale b = Δf / ε = $5 / 0.5 = $10

So the published median typically differs from the true median by about ±$10.
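A runnable sketch of the mechanism with these parameters (Δf ≈ $5, ε = 0.5). It uses NumPy's Laplace sampler, as the pipeline description mentions; the function name and defaults are assumptions for illustration.

```python
# Laplace mechanism sketch: sensitivity ~$5, epsilon = 0.5, so scale b = $10.
import numpy as np

def dp_publish(true_value: float, sensitivity: float = 5.0,
               epsilon: float = 0.5) -> float:
    """Return true_value perturbed with Laplace(0, sensitivity/epsilon) noise."""
    scale = sensitivity / epsilon  # $5 / 0.5 = $10
    return true_value + np.random.laplace(loc=0.0, scale=scale)

# A true segment median of $95/hr is published as roughly $95 +/- ~$10:
noisy_median = dp_publish(95.0)
```

Lowering ε widens the noise (scale = Δf/ε), which is exactly the privacy/accuracy tradeoff discussed below.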

Why ε = 0.5? Lower ε = stronger privacy guarantee but noisier statistics. We chose ε = 0.5 as a balance: it satisfies "strong" DP by most academic standards (ε ≤ 1) while keeping the benchmark accuracy high enough to be useful (±$10 noise on a $60–$150 range is acceptable).

Sequential composition: each aggregate query against the benchmark store spends additional privacy budget, so the cumulative privacy cost grows with the number of queries. We limit aggregate queries to 50 per user per day to bound the total privacy loss.
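Under sequential composition, the total privacy loss is at most the sum of the per-query budgets. A sketch of how the daily cap bounds it; the per-query ε (0.5) and cap (50) come from this document, while the tracker class itself is an illustrative assumption.

```python
# Sequential composition sketch: total privacy loss per user per day is
# bounded by DAILY_QUERY_CAP * EPSILON_PER_QUERY.

EPSILON_PER_QUERY = 0.5
DAILY_QUERY_CAP = 50

class PrivacyBudget:
    """Tracks one user's aggregate queries for the current day."""

    def __init__(self) -> None:
        self.queries_today = 0

    def spend(self) -> float:
        """Record one query and return the cumulative epsilon spent today."""
        if self.queries_today >= DAILY_QUERY_CAP:
            raise RuntimeError("daily aggregate-query cap reached")
        self.queries_today += 1
        return self.queries_today * EPSILON_PER_QUERY

# Worst-case daily privacy loss under sequential composition:
worst_case_epsilon = DAILY_QUERY_CAP * EPSILON_PER_QUERY  # 25.0
```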

The combination of k=25 anonymity and ε=0.5 differential privacy provides defense-in-depth: k-anonymity protects against record linkage attacks; differential privacy protects against membership inference attacks.

6. Benchmark Contribution Pipeline Architecture

Contribution scheduler

Runs nightly at 02:00 UTC. Processes all opted-in Pro users added or modified since last run.

Aggregation worker

Supabase Edge Function that reads user metrics, applies bucketing, and computes per-segment aggregate stats.

K-anonymity filter

Postgres function that counts contributors per segment and nulls out segments below k=25.

DP noise injector

Python Lambda function using NumPy's Laplace distribution to add calibrated noise to qualifying aggregates.

Benchmark store

Separate Postgres schema with no user_id columns. Tables: benchmark_segments, benchmark_percentiles, benchmark_metadata.

Delayed publish

New contributions are held in staging for 72 hours before going live. This prevents near-real-time membership inference.

Audit log

Append-only log of pipeline runs (no user data). Rotated after 90 days. Used for debugging and compliance review.

7. What You See vs. What You Share

Feature | Free users (see?) | Pro contributors (see + share?)
Your own ROI dashboard | ✅ Full access | ✅ Full access
Benchmark percentiles for your segment | ✅ Read-only | ✅ Read-only + contributes to pool
Comparison widget (vs. peers) | ❌ Pro only | ✅ Full access
Rate recommendation based on benchmarks | ❌ Pro only | ✅ Full access
Your data in benchmark pool | ❌ Never | ⚙️ Only if opted in
AI insights using benchmark context | ❌ Pro only | ✅ Full access

8. Data Deletion and Opt-Out

Disable benchmark contribution

How

Settings → Privacy → Benchmark contribution → toggle off

Effect

Your data stops contributing within 24 hours. Future aggregations exclude you. Previously published aggregates remain (they're group statistics, not individual records).

Delete your GigAnalytics account

How

Settings → Account → Delete Account

Effect

All your raw data (transactions, time entries, streams) is permanently deleted from Supabase within 7 days. Your benchmark contributions are removed from future pipeline runs. The benchmark store retains no user-identifying records, so there's nothing to delete there.

GDPR / CCPA data export

How

Settings → Privacy → Export my data

Effect

Downloads a JSON file of all your raw data: transactions, time entries, streams, and account metadata. We don't export benchmark data because it's not linked to you in the benchmark store.

GDPR right to be forgotten

How

Email hello@hourlyroi.com

Effect

We'll confirm deletion of your account and raw data within 30 days per GDPR Article 17.

9. Third-Party Data Processors

Processor | Purpose | Data shared | DPA/SCCs
Supabase (Postgres) | Database | All user data (encrypted) | ✅ DPA signed, SOC 2
Vercel | Hosting / Edge Functions | Request metadata (no body) | ✅ DPA, SOC 2 Type II
Stripe | Payment processing | Email, payment info | ✅ PCI DSS Level 1
PostHog (self-hosted) | Product analytics | Anonymized feature events | ✅ Self-hosted, no PII
Plausible Analytics | Web analytics | Page views (no cookies) | ✅ GDPR-compliant, EU-hosted
AWS SES (via Supabase) | Transactional email | Email address only | ✅ DPA, SOC 2

10. Security Measures

🔐 Auth

Supabase Auth with bcrypt password hashing, JWT tokens with 1hr expiry, optional MFA.

🛡️ Row Level Security

Every table enforces user_id = auth.uid() via Postgres RLS. No shared table scans.

🔗 TLS everywhere

TLS 1.3 for all API calls. HSTS header with 2-year max-age. No plain-HTTP fallback.

📋 Dependency audit

npm audit runs on every PR. Critical vulnerabilities block deployment.

🔍 Input validation

All API inputs validated with Zod schemas. SQL injection mitigated by Supabase parameterized queries.

🚨 Breach notification

If a breach is detected, we notify the supervisory authority within 72 hours per GDPR Article 33 and affected users without undue delay per Article 34.

Responsible disclosure: Found a security issue? Email hello@hourlyroi.com. We follow a 90-day disclosure timeline and offer recognition for valid reports.