Query years of decoded GA4 history from your laptop. No warehouse.
DuckDB reads Parquet directly. Decode GA4 writes Parquet directly. Point one at the other and you get sub-second analytical queries against your full GA4 history without standing up a warehouse, without paying BigQuery scan tax, and without writing a single line of UNNEST. The DuckDB UI runs in your browser at localhost:4213.
Most ad-hoc GA4 analysis does not need a warehouse. It needs a fast SQL engine pointed at clean files. DuckDB is that engine. Decode GA4 produces those files.
What ad-hoc analysis on raw GA4 actually looks like
You have a question. Something like: was the conversion drop in March a real signal or a tracking artefact? To answer it on the raw GA4 export, you open BigQuery, write a query that scans a year of partitions, watch the bytes-billed counter climb, and pay for the privilege of UNNEST-ing the same event_params record you have unnested a hundred times before. Every iteration of the question costs money. Every analyst exploring the same data pays for the same scan.
Why DuckDB changes the economics
DuckDB runs locally. It reads Parquet directly from GCS using HMAC credentials. There is no warehouse to provision, no egress to pay for if you are inside GCP, and no per-query scan cost. Once Decode GA4 has written your events out as Parquet, every subsequent query — and every iteration of every query — is free. The only cost is your laptop's CPU.
Why this is a Decode GA4 story, not a DuckDB story
DuckDB on its own cannot make sense of the raw GA4 export — the schema is nested, repeated, and hostile. You would still be writing the same UNNEST scaffolding, just locally. Decode GA4 produces a flat Parquet dataset where event_params is already a single struct of named columns. DuckDB then becomes what it should be: a fast query engine on top of a clean source. Open the DuckDB UI, attach the bucket, ask the question.
The traditional approach
Iterate inside BigQuery
Write the question as a BigQuery SQL job. UNNEST event_params, scan the partitions, pay the scan cost. Refine the query. Pay the scan cost again. A morning of exploration on a year of GA4 data routinely runs into double-digit dollars before anyone has produced an actual answer.
Export to CSV and open it in Pandas
Run a one-off BigQuery query, dump the result to CSV or Parquet, load it into a notebook. Works for one slice, but you have just frozen the dataset at a point in time. Any follow-up question that needs a column you did not include forces another export. You are now maintaining a stale local copy of GA4.
Stand up a small warehouse for ad-hoc work
Spin up a Snowflake or Redshift instance for analyst exploration. You now have monthly minimums, an additional vendor, IAM to manage, and a second copy of the GA4 data to keep in sync. All for a workload that was supposed to be lightweight and disposable.
Decode GA4 + DuckDB vs the alternatives
| Feature | Decode GA4 + DuckDB | Raw GA4 in BigQuery |
|---|---|---|
| Cost per query iteration | Zero — local CPU only | Bytes-scanned billing each time |
| UNNEST required for event params | No — direct columns | Yes — every parameter |
| Years-of-history scan time | Sub-second on partitioned Parquet | Seconds, billed by bytes |
| Setup time | Five minutes — install, HMAC, query | Already there, but every query is paid |
| Workspace | DuckDB UI in browser at localhost:4213 | BigQuery console with cost meter |
| Schema drift | Handled by Decode upstream | Breaks queries silently at the SQL layer |
Decode writes Parquet to GCS. DuckDB reads Parquet from GCS. Four steps and you are querying.
1. Subscribe via Google Cloud Marketplace
Decode GA4 is a Marketplace listing. Usage-based pricing, no monthly minimum. The subscription completes in under a minute and billing appears on your existing GCP invoice.
2. Deploy with the events_external template
The events_external template writes a flat Parquet dataset to a GCS bucket of your choice, partitioned by date. No BigQuery destination is required for this workflow — DuckDB will read the Parquet files directly from GCS.
3. Generate a GCS HMAC key
In the Cloud Console, go to Cloud Storage → Settings → Interoperability and create an HMAC key for the service account that has read access to the Decode bucket. DuckDB uses the S3-compatible HMAC interface to authenticate against GCS.
4. Open DuckDB and query the bucket
Run duckdb -ui to open the browser interface at localhost:4213. Register the HMAC key as a persistent secret with CREATE SECRET, then query the Parquet files using a standard FROM 'gcs://...' clause. The decoded events are immediately available with event parameters as direct columns.
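A minimal sketch of that first session. The bucket path and HMAC values are placeholders, and event_name is assumed to survive decoding as a top-level column; adjust to your own bucket and schema:

```sql
-- Load the GCS/S3 reader (autoloaded on recent DuckDB builds).
INSTALL httpfs;
LOAD httpfs;

-- Register the HMAC key once; it persists across sessions.
CREATE PERSISTENT SECRET decode_gcs (
    TYPE gcs,
    KEY_ID 'YOUR_HMAC_ACCESS_KEY',   -- placeholder
    SECRET 'YOUR_HMAC_SECRET'        -- placeholder
);

-- First sanity check: event volume by event name across the whole history.
SELECT event_name, count(*) AS events
FROM read_parquet('gcs://YOUR_BUCKET/**/*.parquet', hive_partitioning = true)
GROUP BY event_name
ORDER BY events DESC;
```

CREATE PERSISTENT SECRET stores the key in DuckDB's local secret store, so later sessions can query the bucket without re-entering credentials.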
HOW THE SETUP WORKS
Wire decoded GA4 into DuckDB in four small steps. Once the secret is registered, every future query is just SQL against a Parquet path — no warehouse, no orchestration.
GCP: Run the Decode GA4 installer with the events_external template pointing at your GCS bucket.
GCP: Create an HMAC key under Cloud Storage → Settings → Interoperability for a service account with read access to the bucket.
Local: Install DuckDB v1.2+ via brew, pip, or winget. Launch the UI with duckdb -ui.
DuckDB: Register the HMAC key with CREATE PERSISTENT SECRET. Query 'gcs://YOUR_BUCKET/.../**/*.parquet' directly.
DuckDB queries the Parquet files in place. No data is copied locally, no intermediate warehouse table is created, and the credentials never leave your machine. The Decode GCS bucket lives in your project and stays under your IAM controls.
What you get with DuckDB
Sub-second queries on years of GA4
DuckDB's columnar engine scans partitioned Parquet at near-disk speed. A page_view aggregation across two years of history typically returns in under a second on commodity hardware.
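A sketch of that kind of query; the bucket path and column names are illustrative, so run DESCRIBE against your own files to confirm the decoded schema:

```sql
-- Daily page_view counts across the full decoded history.
SELECT event_date, count(*) AS page_views
FROM read_parquet('gcs://YOUR_BUCKET/**/*.parquet', hive_partitioning = true)
WHERE event_name = 'page_view'
GROUP BY event_date
ORDER BY event_date;
```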
Zero query cost after setup
BigQuery scan billing disappears the moment you move ad-hoc work to DuckDB. You pay Decode for the daily decode and GCS for storage — and nothing per query, however many times you iterate.
The DuckDB UI in your browser
Run duckdb -ui and a notebook-style workspace opens at localhost:4213. SQL cells, schema browser, query history. No Cloud Console, no editor switching, no cost meter.
Direct event parameter columns
page_location, page_referrer, ga_session_id, geo.country — every standard parameter is a direct column on the decoded events table. No correlated subqueries, no UNNEST, no scaffolding.
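For example, a landing-page report that needs correlated UNNEST subqueries on the raw export collapses to ordinary column references. The names below are illustrative, and some layouts keep parameters under an event_params struct (event_params.page_location) rather than at the top level:

```sql
-- Top landing pages by distinct sessions, no UNNEST anywhere.
SELECT page_location, count(DISTINCT ga_session_id) AS sessions
FROM read_parquet('gcs://YOUR_BUCKET/**/*.parquet')
WHERE event_name = 'session_start'
GROUP BY page_location
ORDER BY sessions DESC
LIMIT 20;
```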
Works inside Python and notebooks
The same Parquet path opens cleanly from duckdb.connect() in a Jupyter notebook. Build attribution models, run cohort analysis, train ML features — all from the same data the UI reads.
Schema evolution stays upstream
When GA4 adds a new event parameter, the next decode run picks it up and the new column appears in the next Parquet file. DuckDB reads the updated schema on the next query — no migrations to write.
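When a single query has to span files written before and after a parameter was added, DuckDB's union_by_name option aligns the differing schemas. A sketch against the same placeholder path:

```sql
-- Align columns by name across files; files written before the new
-- parameter existed simply return NULL for it.
SELECT *
FROM read_parquet('gcs://YOUR_BUCKET/**/*.parquet', union_by_name = true)
LIMIT 10;
```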
One-off ad-hoc queries on multi-year history
The classic case — a product manager asks why conversions dipped in a particular week last year. Pull up DuckDB, point at the bucket, write the query, get the answer. No BigQuery job, no scan bill, no waiting for a data team to schedule the work.
Cheap year-over-year analysis
YoY comparisons on raw GA4 require scanning two full years of partitions every time. Against decoded Parquet in DuckDB, the same query is sub-second and free. Run it daily, run it hourly — the cost does not change.
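A sketch of the shape such a comparison takes, assuming the decoded files expose event_date as a DATE (if it is still GA4's YYYYMMDD string, parse it with strptime(event_date, '%Y%m%d') first); the years and column names are placeholders:

```sql
-- Purchases by calendar month, side by side for two years.
SELECT
    month(event_date) AS month,
    count(*) FILTER (WHERE year(event_date) = 2023) AS purchases_2023,
    count(*) FILTER (WHERE year(event_date) = 2024) AS purchases_2024
FROM read_parquet('gcs://YOUR_BUCKET/**/*.parquet')
WHERE event_name = 'purchase'
GROUP BY month
ORDER BY month;
```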
Notebook-based attribution modelling
Open a Jupyter notebook, connect DuckDB to the Decode bucket, and pull session-level data straight into a dataframe. Build first-touch, last-touch, or Markov attribution without orchestrating a separate ETL into the modelling environment.
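A first-touch sketch in plain SQL; the column names (user_pseudo_id, page_referrer, event_timestamp) are assumptions about the decoded schema, and the same statement runs unchanged from duckdb.connect() in a notebook:

```sql
-- Attribute each user to the referrer of their earliest session_start.
WITH first_touch AS (
    SELECT DISTINCT
        user_pseudo_id,
        first_value(page_referrer) OVER (
            PARTITION BY user_pseudo_id
            ORDER BY event_timestamp
        ) AS first_referrer
    FROM read_parquet('gcs://YOUR_BUCKET/**/*.parquet')
    WHERE event_name = 'session_start'
)
SELECT first_referrer, count(*) AS users
FROM first_touch
GROUP BY first_referrer
ORDER BY users DESC;
```

From a notebook, wrapping the same string in duckdb.connect().sql(...).df() hands the result to pandas for the modelling step.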
Do I need a BigQuery destination at all?
No. The events_external template writes Parquet directly to GCS. If your only consumer is DuckDB, you can skip a BigQuery destination entirely and pay only for storage and the decode itself. See setup →
Why HMAC and not a service account JSON?
DuckDB authenticates to GCS through the S3-compatible interoperability layer, which uses HMAC keys. It is the supported path for object storage access in DuckDB and avoids shipping JSON service account credentials to local machines.
How big a dataset will DuckDB handle locally?
It depends on your laptop. DuckDB streams Parquet and is comfortable with multi-billion-row aggregations on 16GB of RAM, especially when the query touches a few columns. For datasets that exceed local memory or need shared access, MotherDuck runs the same DuckDB engine in the cloud. See MotherDuck →
Can I still use BigQuery for production models?
Yes — the events_external template can write to both GCS and a BigQuery external table at the same time. Production dbt or scheduled jobs continue to run in BigQuery while ad-hoc analyst work shifts to DuckDB. One source of decoded data, two consumers.
Deploy in under 5 minutes
A query engine,
not a warehouse bill.
Subscribe via Google Cloud Marketplace, run the events_external template against a GCS bucket, and have decoded GA4 ready to query in DuckDB before your next coffee.
Google Cloud Marketplace · Usage-based · No monthly minimum