Decoded GA4 as a clean
dbt source — no UNNEST
in your staging layer.
The GA4 BigQuery export was not designed to be modelled with dbt. Most teams end up modelling it twice: one layer to flatten the nested events, another to reshape them into what they actually want. Decode GA4 collapses that into one source declaration. Every event parameter is already a direct column.
A dbt project that sources GA4 directly from the BigQuery export is mostly UNNEST scaffolding. Decode GA4 removes the scaffolding.
The shape of the problem
The raw GA4 export stores every event parameter inside a repeated record. To get a single page_location into a dbt model, you write a correlated subquery against event_params. To get the page_referrer, another. To get the ga_session_id, another. Most teams end up with a stg_ga4__events.sql file that is several hundred lines of unnesting logic before any business modelling has happened.
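For a sense of scale, here is a sketch of that conventional pattern. The project and source names are placeholders, and sharded-table handling is omitted:

```sql
-- stg_ga4__events.sql, the conventional way (illustrative sketch).
-- One correlated subquery per parameter.
select
    event_date,
    event_timestamp,
    event_name,
    user_pseudo_id,
    (select value.string_value
       from unnest(event_params)
      where key = 'page_location') as page_location,
    (select value.string_value
       from unnest(event_params)
      where key = 'page_referrer') as page_referrer,
    (select value.int_value
       from unnest(event_params)
      where key = 'ga_session_id') as ga_session_id
    -- ...repeat for every parameter the business asks about
from {{ source('ga4', 'events') }}  -- the raw export, assuming it is declared as a source
```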
Why this hurts dbt projects specifically
Two reasons. First, every downstream model now depends on that staging file, and any change to it means rebuilding everything that references it. Second, when GA4 silently adds a new event parameter, the staging model does not break. It just quietly ignores the new parameter until someone notices and updates the SQL. The lineage looks healthy. The data is incomplete.
What changes with Decode GA4
Decode writes a flat events table into a BigQuery dataset of your choice. You declare it as a dbt source. Your staging models reference event parameters as direct columns: event_param.page_location, event_param.page_referrer, geo.country. There is no UNNEST in your project, anywhere. When GA4 adds a new parameter, it appears in the next decode run and your source schema picks it up automatically.
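The same staging select against the decoded source, assuming the source declaration shown in the setup steps below:

```sql
-- stg_ga4__events.sql against the decoded source: parameters are plain struct fields.
select
    event_date,
    event_name,
    user_pseudo_id,
    event_param.page_location,
    event_param.page_referrer,
    geo.country
from {{ source('decode_ga4', 'events') }}
```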
The traditional approach
UNNEST inside the dbt staging layer
Write stg_ga4__events.sql with one CROSS JOIN UNNEST per parameter you care about. Refresh cost grows with every parameter you add, on top of row volume. Every new parameter requires a code change. Every analyst who touches the project has to learn the unnesting pattern before they can do anything useful.
Use a community package
Adopt one of the public dbt packages for GA4. They flatten more events than you need, materialise a lot of intermediate tables, and tend to lag behind GA4 schema changes. You inherit somebody else's modelling decisions and somebody else's release cadence.
Run a separate ETL before dbt
Stand up Airbyte or a Cloud Function to flatten GA4 and write back to BigQuery, then point dbt at that. Now you maintain two pipelines, two failure modes, and two places where the schema can drift away from GA4. The dbt graph looks clean. The stack underneath is not.
Decode GA4 vs DIY UNNEST staging
| Feature | Decode GA4 source | Hand-built dbt staging |
|---|---|---|
| Lines of SQL to flatten a page_view | 3 | ~18 per parameter |
| New GA4 parameter handling | Auto-detected, appears in source | Manual SQL update required |
| Refresh cost | External table, no scan of the raw export | Full UNNEST scan on every refresh |
| Onboarding time for analysts | Treat events like any other source | Must learn UNNEST patterns first |
| Schema drift risk | Handled upstream | Silent — staging drops new params |
| Maintenance over a year | Zero | Recurring, every GA4 release |
One install. A clean source. dbt handles the rest.
1. Subscribe via Google Cloud Marketplace
Decode GA4 is a Marketplace listing. Usage-based pricing, no monthly minimum. The subscription takes under a minute and billing appears on your existing GCP invoice.
2. Deploy with the events_external template
Pick the BigQuery dataset that your dbt project already uses as a source — for example dbt_sources. Set destination_dataset_id to that dataset. Decode writes a Parquet-backed external table called events into it.
3. Declare the source in dbt
Add a single sources.yml entry pointing at decode_ga4.events. Reference it in models with the standard source() macro. There is no staging UNNEST file to maintain — event parameters are already direct columns.
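A minimal sources.yml sketch, assuming the dataset from step 2 is called dbt_sources; the freshness block is optional and illustrative:

```yaml
# models/sources.yml (sketch)
version: 2

sources:
  - name: decode_ga4
    schema: dbt_sources   # the dataset you passed as destination_dataset_id
    tables:
      - name: events
        description: Decoded GA4 events with one direct column per parameter.
        # Assumes event_timestamp keeps the export's integer-microseconds type.
        loaded_at_field: timestamp_micros(event_timestamp)
        freshness:
          warn_after: {count: 36, period: hour}
```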
4. Run dbt as normal
dbt build picks the source up like any other table. Lineage works. Tests work. Documentation works. The decoded events table updates daily through Decode GA4, and your dbt models pick up the new partitions on their next run.
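Standard dbt selector syntax works against the source, so a targeted build looks like this (nothing product-specific):

```sh
# Check the decoded source is fresh, then build everything downstream of it.
dbt source freshness --select source:decode_ga4.events
dbt build --select source:decode_ga4.events+
```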
HOW THE SETUP WORKS
Wire decoded GA4 into a dbt project in four small steps. Nothing here is dbt-specific magic — it is the same source pattern you already use for every other warehouse table.
GCP
Run the Decode GA4 installer with destination_dataset_id pointing at your dbt sources dataset.
GCP
Grant the dbt service account Storage Object Viewer on the Decode GCS bucket (a one-line command; see the sketch after these steps).
dbt
Add a sources.yml entry for decode_ga4.events.
dbt
Reference {{ source('decode_ga4', 'events') }} in your models. Run dbt build.
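The bucket grant from the second step is a single command; the bucket and service-account names here are placeholders:

```sh
# Substitute your Decode GA4 bucket and your dbt runner's service account.
gcloud storage buckets add-iam-policy-binding gs://your-decode-ga4-bucket \
  --member="serviceAccount:dbt-runner@your-project.iam.gserviceaccount.com" \
  --role="roles/storage.objectViewer"
```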
The events table is a BigQuery external table backed by Parquet files in GCS. dbt reads it natively through the BigQuery adapter, but the underlying storage stays in your project. No data leaves your perimeter.
What you get in dbt
A first-class dbt source
The events table is referenced through {{ source() }} like any other warehouse table. Lineage, freshness checks, and source tests all work without modification.
Direct event parameter columns
page_location, page_referrer, page_title, ga_session_id, ga_session_number — every standard parameter is a direct column. No correlated subqueries inside your models.
External table economics
The source reads from Parquet files in GCS. There is no duplicate storage in BigQuery, and dbt builds against decoded data without paying to scan the raw GA4 export.
Schema evolution that just works
When GA4 adds a new event parameter, the next decode run picks it up. dbt source tests will flag any change you care about, but the build does not break.
Cleaner DAGs, fewer staging models
The stg_ga4__events file shrinks from hundreds of lines of UNNEST to a thin renaming layer — or disappears entirely. Downstream models become readable.
Works with dbt Core or dbt Cloud
The setup is identical: install dbt-bigquery, authenticate to your GCP project, declare the source. No Cloud-only features required.
Marketing attribution models
Build a session_facts model on top of the decoded events table without writing a single UNNEST. Source-medium, campaign, and landing page are direct columns. The path from raw event to attribution mart shrinks from five staging models to one.
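A sketch of that single model; the attribution column names follow the examples above, and the aggregation choices are illustrative:

```sql
-- session_facts.sql (sketch): first-touch attribution without a single UNNEST.
-- Assumes event_timestamp keeps the export's integer-microseconds type.
select
    user_pseudo_id,
    event_param.ga_session_id as session_id,
    min(timestamp_micros(event_timestamp)) as session_start,
    any_value(event_param.source)   as source,
    any_value(event_param.medium)   as medium,
    any_value(event_param.campaign) as campaign,
    array_agg(event_param.page_location ignore nulls
              order by event_timestamp limit 1)[safe_offset(0)] as landing_page,
    countif(event_name = 'purchase') as purchases
from {{ source('decode_ga4', 'events') }}
group by 1, 2
```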
Funnel and conversion analysis
Standard ecommerce funnels — view_item, add_to_cart, begin_checkout, purchase — become readable case statements over event_name. The downstream funnel model is a window function over a clean source, not a chain of subqueries.
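As a sketch, using the four standard events above, here with a plain aggregate standing in for the window function:

```sql
-- ecommerce_funnel.sql (sketch): session depth through the standard funnel.
with steps as (
    select
        event_param.ga_session_id as session_id,
        case event_name
            when 'view_item'      then 1
            when 'add_to_cart'    then 2
            when 'begin_checkout' then 3
            when 'purchase'       then 4
        end as step
    from {{ source('decode_ga4', 'events') }}
    where event_name in ('view_item', 'add_to_cart', 'begin_checkout', 'purchase')
),

session_depth as (
    select session_id, max(step) as furthest_step
    from steps
    group by 1
)

select
    countif(furthest_step >= 1) as viewed_item,
    countif(furthest_step >= 2) as added_to_cart,
    countif(furthest_step >= 3) as began_checkout,
    countif(furthest_step >= 4) as purchased
from session_depth
```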
Product analytics marts
Custom event parameters that your product team adds — feature flags, plan tier, A/B variant — show up as direct columns the moment they start firing. dbt models that consume them do not need to be re-shaped to accommodate the new fields.
Does this work with dbt Core or dbt Cloud?
Both. The integration is at the warehouse layer — the dbt-bigquery adapter sees the decoded events table the same way regardless of how dbt itself runs. No Cloud-only features are required. See setup →
Do I need to delete my existing GA4 staging models?
Not immediately. You can run the decoded events source alongside an existing staging model and migrate downstream models one at a time. Most teams discover that the staging file shrinks to a thin renaming layer once Decode is in place.
What permissions does my dbt service account need?
The standard BigQuery Data Viewer and BigQuery Job User roles, plus Storage Object Viewer on the Decode GA4 GCS bucket — required because the events table is an external table backed by Parquet files. Full prerequisites →
Can I still write my own dbt tests against the decoded events?
Yes. The events table behaves like any other BigQuery source — schema tests, freshness checks, custom data tests, and dbt-expectations all work exactly as they would against a normal table.
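For example, column tests attach to the source entry exactly as they would on a native table. An illustrative sketch, extending the declaration from the setup steps:

```yaml
# models/sources.yml (sketch): tests on the decoded source.
version: 2

sources:
  - name: decode_ga4
    schema: dbt_sources
    tables:
      - name: events
        columns:
          - name: event_name
            tests:
              - not_null
          - name: user_pseudo_id
            tests:
              - not_null
```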
Deploy in under 5 minutes
Your dbt project,
without the UNNEST tax.
Subscribe via Google Cloud Marketplace, point destination_dataset_id at your dbt sources dataset, and have a clean events table available to your models before the end of the day.
Google Cloud Marketplace · Usage-based · No monthly minimum