Query years of decoded GA4 history from your laptop. No warehouse.
DuckDB reads Parquet directly. Decode GA4 writes Parquet directly. Point one at the other and you get sub-second analytical queries against your full GA4 history without standing up a warehouse, without paying BigQuery scan tax, and without writing a single line of UNNEST. The DuckDB UI runs in your browser at localhost:4213.
Most ad-hoc GA4 analysis does not need a warehouse. It needs a fast SQL engine pointed at clean files. DuckDB is that engine. Decode GA4 produces those files.
What ad-hoc analysis on raw GA4 actually looks like
You have a question. Something like: was the conversion drop in March a real signal or a tracking artefact? To answer it on the raw GA4 export, you open BigQuery, write a query that scans a year of partitions, watch the bytes-billed counter climb, and pay for the privilege of UNNEST-ing the same event_params record you have unnested a hundred times before. Every iteration of the question costs money. Every analyst exploring the same data pays for the same scan.
Why DuckDB changes the economics
DuckDB runs locally. It reads Parquet directly from GCS using HMAC credentials. There is no warehouse to provision, no egress to pay for if you are inside GCP, and no per-query scan cost. Once Decode GA4 has written your events out as Parquet, every subsequent query — and every iteration of every query — is free. The only cost is your laptop's CPU.
Why this is a Decode GA4 story, not a DuckDB story
DuckDB on its own cannot make sense of the raw GA4 export — the schema is nested, repeated, and hostile. You would still be writing the same UNNEST scaffolding, just locally. Decode GA4 produces a flat Parquet dataset where event_params is already a single struct of named columns. DuckDB then becomes what it should be: a fast query engine on top of a clean source. Open the DuckDB UI, attach the bucket, ask the question.
The traditional approach
Iterate inside BigQuery
Write the question as a BigQuery SQL job. UNNEST event_params, scan the partitions, pay the scan cost. Refine the query. Pay the scan cost again. A morning of exploration on a year of GA4 data routinely runs into double-digit dollars before anyone has produced an actual answer.
Export to CSV and open it in Pandas
Run a one-off BigQuery query, dump the result to CSV or Parquet, load it into a notebook. Works for one slice, but you have just frozen the dataset at a point in time. Any follow-up question that needs a column you did not include forces another export. You are now maintaining a stale local copy of GA4.
Stand up a small warehouse for ad-hoc work
Spin up a Snowflake or Redshift instance for analyst exploration. You now have monthly minimums, an additional vendor, IAM to manage, and a second copy of the GA4 data to keep in sync. All for a workload that was supposed to be lightweight and disposable.
Decode GA4 + DuckDB vs the alternatives
| Feature | Decode GA4 + DuckDB | Raw GA4 in BigQuery |
|---|---|---|
| Cost per query iteration | Zero — local CPU only | Bytes-scanned billing each time |
| UNNEST required for event params | No — direct columns | Yes — every parameter |
| Years-of-history scan time | Sub-second on partitioned Parquet | Seconds, billed by bytes |
| Setup time | Five minutes — install, HMAC, query | Already there, but every query is paid |
| Workspace | DuckDB UI in browser at localhost:4213 | BigQuery console with cost meter |
| Schema drift | Handled by Decode upstream | Breaks queries silently at the SQL layer |
Decode writes Parquet to GCS. DuckDB reads Parquet from GCS. Four steps and you are querying.
1. Subscribe via Google Cloud Marketplace
Decode GA4 is a Marketplace listing. Usage-based pricing, no monthly minimum. The subscription completes in under a minute and billing appears on your existing GCP invoice.
2. Deploy with the events_external template
The events_external template writes a flat Parquet dataset to a GCS bucket of your choice, partitioned by date. No BigQuery destination is required for this workflow — DuckDB will read the Parquet files directly from GCS.
3. Generate a GCS HMAC key
In the Cloud Console, go to Cloud Storage → Settings → Interoperability and create an HMAC key for the service account that has read access to the Decode bucket. DuckDB uses the S3-compatible HMAC interface to authenticate against GCS.
4. Open DuckDB and query the bucket
Run duckdb -ui to open the browser interface at localhost:4213. Register the HMAC key as a persistent secret with CREATE SECRET, then query the Parquet files using a standard FROM 'gcs://...' clause. The decoded events are immediately available with event parameters as direct columns.
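A minimal sketch of that first session. The bucket path and HMAC values are placeholders, and event_name is assumed to survive decoding as a top-level column; adjust to your own bucket and schema:

```sql
-- Load the GCS/S3 reader (autoloaded on recent DuckDB builds).
INSTALL httpfs;
LOAD httpfs;

-- Register the HMAC key once; it persists across sessions.
CREATE PERSISTENT SECRET decode_gcs (
    TYPE gcs,
    KEY_ID 'YOUR_HMAC_ACCESS_KEY',   -- placeholder
    SECRET 'YOUR_HMAC_SECRET'        -- placeholder
);

-- First sanity check: event volume by event name across the whole history.
SELECT event_name, count(*) AS events
FROM read_parquet('gcs://YOUR_BUCKET/**/*.parquet', hive_partitioning = true)
GROUP BY event_name
ORDER BY events DESC;
```

CREATE PERSISTENT SECRET stores the key in DuckDB's local secret store, so later sessions can query the bucket without re-entering credentials.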
HOW THE SETUP WORKS
Wire decoded GA4 into DuckDB in four small steps. Once the secret is registered, every future query is just SQL against a Parquet path — no warehouse, no orchestration.
GCP: Run the Decode GA4 installer with the events_external template pointing at your GCS bucket.
GCP: Create an HMAC key under Cloud Storage → Settings → Interoperability for a service account with read access to the bucket.
Local: Install DuckDB v1.2+ via brew, pip, or winget. Launch the UI with duckdb -ui.
DuckDB: Register the HMAC key with CREATE PERSISTENT SECRET. Query 'gcs://YOUR_BUCKET/.../**/*.parquet' directly.
DuckDB queries the Parquet files in place. No data is copied locally, no intermediate warehouse table is created, and the credentials never leave your machine. The Decode GCS bucket lives in your project and stays under your IAM controls.
What you get with DuckDB
Sub-second queries on years of GA4
DuckDB's columnar engine scans partitioned Parquet at near-disk speed. A page_view aggregation across two years of history typically returns in under a second on commodity hardware.
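A sketch of that kind of query; the bucket path and column names are illustrative, so run DESCRIBE against your own files to confirm the decoded schema:

```sql
-- Daily page_view counts across the full decoded history.
SELECT event_date, count(*) AS page_views
FROM read_parquet('gcs://YOUR_BUCKET/**/*.parquet', hive_partitioning = true)
WHERE event_name = 'page_view'
GROUP BY event_date
ORDER BY event_date;
```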
Zero query cost after setup
BigQuery scan billing disappears the moment you move ad-hoc work to DuckDB. You pay Decode for the daily decode and GCS for storage — and nothing per query, however many times you iterate.
The DuckDB UI in your browser
Run duckdb -ui and a notebook-style workspace opens at localhost:4213. SQL cells, schema browser, query history. No Cloud Console, no editor switching, no cost meter.
Direct event parameter columns
page_location, page_referrer, ga_session_id, geo.country — every standard parameter is a direct column on the decoded events table. No correlated subqueries, no UNNEST, no scaffolding.
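For example, a landing-page report that needs correlated UNNEST subqueries on the raw export collapses to ordinary column references. The names below are illustrative, and some layouts keep parameters under an event_params struct (event_params.page_location) rather than at the top level:

```sql
-- Top landing pages by distinct sessions, no UNNEST anywhere.
SELECT page_location, count(DISTINCT ga_session_id) AS sessions
FROM read_parquet('gcs://YOUR_BUCKET/**/*.parquet')
WHERE event_name = 'session_start'
GROUP BY page_location
ORDER BY sessions DESC
LIMIT 20;
```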
Works inside Python and notebooks
The same Parquet path opens cleanly from duckdb.connect() in a Jupyter notebook. Build attribution models, run cohort analysis, train ML features — all from the same data the UI reads.
Schema evolution stays upstream
When GA4 adds a new event parameter, the next decode run picks it up and the new column appears in the next Parquet file. DuckDB reads the updated schema on the next query — no migrations to write.
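When a single query has to span files written before and after a parameter was added, DuckDB's union_by_name option aligns the differing schemas. A sketch against the same placeholder path:

```sql
-- Align columns by name across files; files written before the new
-- parameter existed simply return NULL for it.
SELECT *
FROM read_parquet('gcs://YOUR_BUCKET/**/*.parquet', union_by_name = true)
LIMIT 10;
```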
One-off ad-hoc queries on multi-year history
The classic case — a product manager asks why conversions dipped in a particular week last year. Pull up DuckDB, point at the bucket, write the query, get the answer. No BigQuery job, no scan bill, no waiting for a data team to schedule the work.
Cheap year-over-year analysis
YoY comparisons on raw GA4 require scanning two full years of partitions every time. Against decoded Parquet in DuckDB, the same query is sub-second and free. Run it daily, run it hourly — the cost does not change.
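A sketch of the shape such a comparison takes, assuming the decoded files expose event_date as a DATE (if it is still GA4's YYYYMMDD string, parse it with strptime(event_date, '%Y%m%d') first); the years and column names are placeholders:

```sql
-- Purchases by calendar month, side by side for two years.
SELECT
    month(event_date) AS month,
    count(*) FILTER (WHERE year(event_date) = 2023) AS purchases_2023,
    count(*) FILTER (WHERE year(event_date) = 2024) AS purchases_2024
FROM read_parquet('gcs://YOUR_BUCKET/**/*.parquet')
WHERE event_name = 'purchase'
GROUP BY month
ORDER BY month;
```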
Notebook-based attribution modelling
Open a Jupyter notebook, connect DuckDB to the Decode bucket, and pull session-level data straight into a dataframe. Build first-touch, last-touch, or Markov attribution without orchestrating a separate ETL into the modelling environment.
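A first-touch sketch in plain SQL; the column names (user_pseudo_id, page_referrer, event_timestamp) are assumptions about the decoded schema, and the same statement runs unchanged from duckdb.connect() in a notebook:

```sql
-- Attribute each user to the referrer of their earliest session_start.
WITH first_touch AS (
    SELECT DISTINCT
        user_pseudo_id,
        first_value(page_referrer) OVER (
            PARTITION BY user_pseudo_id
            ORDER BY event_timestamp
        ) AS first_referrer
    FROM read_parquet('gcs://YOUR_BUCKET/**/*.parquet')
    WHERE event_name = 'session_start'
)
SELECT first_referrer, count(*) AS users
FROM first_touch
GROUP BY first_referrer
ORDER BY users DESC;
```

From a notebook, wrapping the same string in duckdb.connect().sql(...).df() hands the result to pandas for the modelling step.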
Do I need a BigQuery destination at all?
No. The events_external template writes Parquet directly to GCS. If your only consumer is DuckDB, you can skip a BigQuery destination entirely and pay only for storage and the decode itself. See setup →
Why HMAC and not a service account JSON?
DuckDB authenticates to GCS through the S3-compatible interoperability layer, which uses HMAC keys. It is the supported path for object storage access in DuckDB and avoids shipping JSON service account credentials to local machines.
How big a dataset will DuckDB handle locally?
It depends on your laptop. DuckDB streams Parquet and is comfortable with multi-billion-row aggregations on 16GB of RAM, especially when the query touches a few columns. For datasets that exceed local memory or need shared access, MotherDuck runs the same DuckDB engine in the cloud. See MotherDuck →
Can I still use BigQuery for production models?
Yes — the events_external template can write to both GCS and a BigQuery external table at the same time. Production dbt or scheduled jobs continue to run in BigQuery while ad-hoc analyst work shifts to DuckDB. One source of decoded data, two consumers.
Deploy in under 5 minutes
A query engine,
not a warehouse bill.
Subscribe via Google Cloud Marketplace, run the events_external template against a GCS bucket, and have decoded GA4 ready to query in DuckDB before your next coffee.
Google Cloud Marketplace · Usage-based · No monthly minimum