Privacy & Benchmarking Explainer

How GigAnalytics protects your income data, what the optional benchmark feature collects, and the technical safeguards (k-anonymity, differential privacy) that make it safe to participate.

Version 1.1 · June 2025 · 8 min read

1. Privacy-First Architecture

GigAnalytics is built on a data minimization principle: we collect only what's necessary to compute your ROI dashboard, and we store it in a user-partitioned database where your data is isolated by Row Level Security (RLS) from every other user's data.

🔒 Encrypted at rest

All data stored in Supabase (Postgres) with AES-256 encryption. TLS 1.3 in transit.

🧱 Row Level Security

Postgres RLS ensures every query is automatically scoped to your user_id. No query can return another user's data.

🚫 No data selling

We never sell, rent, or license your income data to third parties. The benchmark feature is opt-in and anonymized.

Free users: zero benchmark participation

Free tier users never contribute to or appear in any benchmark dataset. Benchmark contribution is an opt-in Pro feature only. Free users can still view benchmark statistics generated from Pro user contributions.

2. What Data GigAnalytics Stores

Here is a complete inventory of data stored in your GigAnalytics account:

Data type | What we store | What we don't store | Why
Transactions | date, net_amount, fee_amount, stream_id | Payer name, description, card details | ROI calculation needs amounts + dates only
Time entries | start/stop time, duration, stream_id | Location, device, IP at logging time | Heatmap needs time + stream only
Income streams | name, color, platform tag | Platform account IDs or credentials | Display + grouping only
Acquisition costs | amount, date, stream_id, channel | Ad account IDs, campaign details | ROI formula needs cost amount only
Auth | email (hashed for internal ID), password hash | Plain-text password, security questions | Standard auth best practices
Usage analytics | feature events (anonymized) | PII, screen recordings, keystrokes | Product improvement only

3. Benchmark Feature: Full Data Flow

The benchmark feature allows Pro users to opt in to contributing anonymized aggregate metrics that power the "how do I compare?" insights in the dashboard. Here is the complete data flow, step by step:

  1. You enable "Contribute to benchmarks" in Settings → Privacy

     This toggle is OFF by default. Enabling it starts the contribution pipeline for your account. You can disable it at any time.

  2. Our backend computes aggregate metrics from your raw data

     The pipeline runs nightly. It reads your transactions and time entries and computes: hourly rate percentile bucket, revenue range bucket, experience range bucket, platform category, and region (country-level). It does NOT read transaction descriptions, client names, or exact amounts.

  3. Buckets are assigned (not raw values)

     A raw hourly rate of $87/hr is bucketed to "$80–$90/hr". Revenue of $4,200/mo is bucketed to "$4,000–$5,000/mo". This prevents precise inference about any individual.

  4. K-anonymity check

     Before your bucket is included in any aggregate, the pipeline checks: does this (platform, rate_bucket, region) combination have ≥ 25 other contributors? If not, your data is suppressed for that segment until the pool grows.

  5. Differential privacy noise is added

     Even for qualifying buckets, we apply Laplace noise (ε = 0.5) to the aggregate counts before storing them. This means the published percentile is statistically almost indistinguishable from what it would be if any single user were removed from the pool.

  6. Contribution is decoupled from user ID

     The final aggregated values in the benchmark store are not linked to your user_id. The pipeline uses a one-way hash of (user_id + contribution_date) as a deduplication key that cannot be reversed.

  7. Benchmark values are served to Pro users

     The published p25/median/p75 rates are served to all Pro users querying their platform + region segment. Your personal data never appears; only the anonymized aggregate does.
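The bucketing in step 3 above can be sketched in a few lines of Python. Bucket widths follow section 4 ($10 increments up to $200/hr, then $50 increments); the $1,000-wide revenue buckets are inferred from the example, and the function names are illustrative, not GigAnalytics internals.

```python
# Hypothetical sketch of step-3 bucketing: raw values are mapped to range
# labels before anything leaves the aggregation worker.

def rate_bucket(hourly_rate: float) -> str:
    """Map a raw hourly rate to its published bucket label.
    $10-wide buckets up to $200/hr, then $50-wide buckets."""
    width = 10 if hourly_rate < 200 else 50
    lo = int(hourly_rate // width) * width
    return f"${lo}-${lo + width}/hr"

def revenue_bucket(monthly_revenue: float, width: int = 1000) -> str:
    """Map monthly revenue to a $1,000-wide bucket label (width inferred
    from the example in this document)."""
    lo = int(monthly_revenue // width) * width
    return f"${lo:,}-${lo + width:,}/mo"

print(rate_bucket(87))       # "$80-$90/hr"
print(revenue_bucket(4200))  # "$4,000-$5,000/mo"
```

Because only the bucket label is emitted, the pipeline never stores or publishes the exact $87/hr or $4,200/mo figures.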

4. K-Anonymity: The Math

K-anonymity is a formal privacy guarantee. A dataset satisfies k-anonymity if, for every record, at least k−1 other records share the same quasi-identifying attributes.

In GigAnalytics's benchmark dataset, the quasi-identifying attributes are:

  • platform_category — bucketed (design, development, writing, consulting, other)
  • hourly_rate_bucket — $10 increments up to $200, then $50 increments
  • region — country-level (US, UK, CA, AU, DE, other)
  • experience_range — <1yr, 1–3yr, 3–7yr, 7yr+
k-anonymity guarantee: k = 25

For a record r with attributes (platform, rate_bucket, region, experience):

  count = |{u : user_platform[u]    = platform
            AND user_rate_bucket[u] = rate_bucket
            AND user_region[u]      = region
            AND user_experience[u]  = experience}|

  if count < 25: suppress(r)  # do not include in published benchmark
  else:          include(r)   # safe to publish aggregate for this segment
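The suppression rule above can be sketched in Python. In production this runs as a Postgres function; the data shapes and names here are illustrative assumptions.

```python
# Minimal sketch of the k = 25 suppression check: a segment is publishable
# only when at least K contributors share the same quasi-identifiers.
from collections import Counter

K = 25  # minimum number of contributors per published segment

def publishable_segments(contributors):
    """contributors: iterable of (platform, rate_bucket, region, experience)
    tuples, one per opted-in user. Returns the set of segments with at least
    K contributors; every other segment is suppressed."""
    counts = Counter(contributors)
    return {segment for segment, n in counts.items() if n >= K}

pool = [("writing", "$90-$100/hr", "NZ", "3-7yr")]           # lone contributor
pool += [("development", "$80-$90/hr", "US", "1-3yr")] * 30  # healthy pool
safe = publishable_segments(pool)
# Only the US development segment qualifies; the lone NZ writer's segment
# shows "insufficient data" instead.
```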

What this means in practice: If you're the only freelance copywriter in New Zealand earning $90–$100/hr with 3–7 years of experience, your data won't appear in the benchmark for that segment. You'll see "insufficient data" for that specific combination.

Why k=25 and not k=5? The GDPR Article 29 Working Party recommends k≥5 for publication. We use k=25 to provide a significantly stronger guarantee, particularly important given the sensitive nature of income data. The tradeoff is reduced benchmark coverage for niche specializations.

5. Differential Privacy: The Math

K-anonymity alone is vulnerable to attacks where an adversary already knows something about a specific individual. Differential privacy (DP) provides a stronger guarantee: the probability of any given output of the analysis changes by at most a factor of e^ε depending on whether or not that individual's data is in the dataset.

GigAnalytics applies the Laplace mechanism to aggregate counts and statistics before publication:

ε-differential privacy via the Laplace mechanism. For a function f (e.g., median hourly rate of a segment):

  published_f = true_f + Laplace(0, Δf / ε)

Where:
  Δf = sensitivity of f (maximum change in f when one person is added to or removed from the dataset)
  ε  = privacy budget (0.5 in GigAnalytics)
  Laplace(μ, b) = noise drawn from the Laplace distribution with location μ and scale b

For the median of hourly rates:
  Δf ≈ max_rate_bucket_width / 2 = ~$5
  ε  = 0.5
  Noise scale b = Δf / ε = $5 / 0.5 = $10

So the published median typically differs from the true median by about ±$10.
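A runnable sketch of the mechanism with these parameters (Δf ≈ $5, ε = 0.5). It uses NumPy's Laplace sampler, as the pipeline description mentions; the function name and defaults are assumptions for illustration.

```python
# Laplace mechanism sketch: sensitivity ~$5, epsilon = 0.5, so scale b = $10.
import numpy as np

def dp_publish(true_value: float, sensitivity: float = 5.0,
               epsilon: float = 0.5) -> float:
    """Return true_value perturbed with Laplace(0, sensitivity/epsilon) noise."""
    scale = sensitivity / epsilon  # $5 / 0.5 = $10
    return true_value + np.random.laplace(loc=0.0, scale=scale)

# A true segment median of $95/hr is published as roughly $95 +/- ~$10:
noisy_median = dp_publish(95.0)
```

Lowering ε widens the noise (scale = Δf/ε), which is exactly the privacy/accuracy tradeoff discussed below.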

Why ε = 0.5? Lower ε = stronger privacy guarantee but noisier statistics. We chose ε = 0.5 as a balance: it satisfies "strong" DP by most academic standards (ε ≤ 1) while keeping the benchmark accuracy high enough to be useful (±$10 noise on a $60–$150 range is acceptable).

Sequential composition: each aggregate query against the benchmark store spends additional privacy budget, so the cumulative privacy cost grows with the number of queries. We limit aggregate queries to 50 per user per day to bound the total privacy loss.
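Under sequential composition, the total privacy loss is at most the sum of the per-query budgets. A sketch of how the daily cap bounds it; the per-query ε (0.5) and cap (50) come from this document, while the tracker class itself is an illustrative assumption.

```python
# Sequential composition sketch: total privacy loss per user per day is
# bounded by DAILY_QUERY_CAP * EPSILON_PER_QUERY.

EPSILON_PER_QUERY = 0.5
DAILY_QUERY_CAP = 50

class PrivacyBudget:
    """Tracks one user's aggregate queries for the current day."""

    def __init__(self) -> None:
        self.queries_today = 0

    def spend(self) -> float:
        """Record one query and return the cumulative epsilon spent today."""
        if self.queries_today >= DAILY_QUERY_CAP:
            raise RuntimeError("daily aggregate-query cap reached")
        self.queries_today += 1
        return self.queries_today * EPSILON_PER_QUERY

# Worst-case daily privacy loss under sequential composition:
worst_case_epsilon = DAILY_QUERY_CAP * EPSILON_PER_QUERY  # 25.0
```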

The combination of k=25 anonymity and ε=0.5 differential privacy provides defense-in-depth: k-anonymity protects against record linkage attacks; differential privacy protects against membership inference attacks.

6. Benchmark Contribution Pipeline Architecture

Contribution scheduler

Runs nightly at 02:00 UTC. Processes all opted-in Pro users added or modified since last run.

Aggregation worker

Supabase Edge Function that reads user metrics, applies bucketing, and computes per-segment aggregate stats.

K-anonymity filter

Postgres function that counts contributors per segment and nulls out segments below k=25.

DP noise injector

Python Lambda function using NumPy's Laplace distribution to add calibrated noise to qualifying aggregates.

Benchmark store

Separate Postgres schema with no user_id columns. Tables: benchmark_segments, benchmark_percentiles, benchmark_metadata.

Delayed publish

New contributions are held in staging for 72 hours before going live. This prevents near-real-time membership inference.

Audit log

Append-only log of pipeline runs (no user data). Rotated after 90 days. Used for debugging and compliance review.

7. What You See vs. What You Share

Feature | Free users (see?) | Pro contributors (see + share?)
Your own ROI dashboard | ✅ Full access | ✅ Full access
Benchmark percentiles for your segment | ✅ Read-only | ✅ Read-only + contributes to pool
Comparison widget (vs. peers) | ❌ Pro only | ✅ Full access
Rate recommendation based on benchmarks | ❌ Pro only | ✅ Full access
Your data in benchmark pool | ❌ Never | ⚙️ Only if opted in
AI insights using benchmark context | ❌ Pro only | ✅ Full access

8. Data Deletion and Opt-Out

Disable benchmark contribution

How

Settings → Privacy → Benchmark contribution → toggle off

Effect

Your data stops contributing within 24 hours. Future aggregations exclude you. Previously published aggregates remain (they're group statistics, not individual records).

Delete your GigAnalytics account

How

Settings → Account → Delete Account

Effect

All your raw data (transactions, time entries, streams) is permanently deleted from Supabase within 7 days. Your benchmark contributions are removed from future pipeline runs. The benchmark store retains no user-identifying records, so there's nothing to delete there.

GDPR / CCPA data export

How

Settings → Privacy → Export my data

Effect

Downloads a JSON file of all your raw data: transactions, time entries, streams, and account metadata. We don't export benchmark data because it's not linked to you in the benchmark store.

GDPR right to be forgotten

How

Email hello@hourlyroi.com

Effect

We'll confirm deletion of your account and raw data within 30 days per GDPR Article 17.

9. Third-Party Data Processors

Processor | Purpose | Data shared | DPA/SCCs
Supabase (Postgres) | Database | All user data (encrypted) | ✅ DPA signed, SOC 2
Vercel | Hosting / Edge Functions | Request metadata (no body) | ✅ DPA, SOC 2 Type II
Stripe | Payment processing | Email, payment info | ✅ PCI DSS Level 1
PostHog (self-hosted) | Product analytics | Anonymized feature events | ✅ Self-hosted, no PII
Plausible Analytics | Web analytics | Page views (no cookies) | ✅ GDPR-compliant, EU-hosted
AWS SES (via Supabase) | Transactional email | Email address only | ✅ DPA, SOC 2

10. Security Measures

🔐 Auth

Supabase Auth with bcrypt password hashing, JWT tokens with 1hr expiry, optional MFA.

🛡️ Row Level Security

Every table enforces user_id = auth.uid() via Postgres RLS. No shared table scans.

🔗 TLS everywhere

TLS 1.3 for all API calls. HSTS header with 2-year max-age. No plain-HTTP fallback.

📋 Dependency audit

npm audit runs on every PR. Critical vulnerabilities block deployment.

🔍 Input validation

All API inputs validated with Zod schemas. SQL injection mitigated by Supabase parameterized queries.

🚨 Breach notification

If a breach is detected, we notify the supervisory authority within 72 hours per GDPR Article 33 and affected users without undue delay per Article 34.

Responsible disclosure: Found a security issue? Email hello@hourlyroi.com. We follow a 90-day disclosure timeline and offer recognition for valid reports.