Decoded GA4 as a clean Dataform source — without any UNNEST plumbing.

Dataform sits inside GCP, which is the right place for it. The GA4 export sits inside BigQuery, which is also the right place for it. The problem is that joining the two requires hundreds of lines of UNNEST inside your Dataform repository before you can model anything. Decode GA4 makes the events table behave like any other Dataform source declaration.

Connection: BigQuery external table · Runs: native GCP · Template: events_external · Maintenance: zero schema SQL

A Dataform repository that depends on the raw GA4 export is mostly unnesting code. Decode GA4 removes it, and removes a class of GCP IAM headaches at the same time.

The shape of the problem

Dataform is the cleanest way to manage SQL workflows inside BigQuery. It runs natively, it picks up the project service account, and it does not need a separate connection layer. None of that helps with the GA4 export. To pull a single page_location into a Dataform model you still write a correlated subquery against event_params. Multiply by every parameter you care about and you get a sources file that is mostly ceremony.
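Concretely, pulling that one parameter from the raw export looks something like this; the project and dataset IDs are placeholders.

```sql
-- Pulling a single parameter from the raw GA4 export: a correlated
-- subquery against event_params. Project and dataset IDs are placeholders.
SELECT
  event_date,
  event_name,
  (SELECT value.string_value
   FROM UNNEST(event_params)
   WHERE key = 'page_location') AS page_location
FROM `my-project.analytics_123456789.events_*`
WHERE event_name = 'page_view'
```

Every additional parameter adds another subquery like this, in every model that needs it.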

Why this hurts Dataform projects specifically

Two reasons. First, every model that selects from your unnested staging file gets re-executed when that file changes, and Dataform compilation graphs are unforgiving about what counts as a change. Second, GA4 schema drift hits silently. A new event parameter appears, the staging SQLX does not know about it, and your downstream models keep reporting against an incomplete view of the data. The DAG looks fine. The numbers do not match the GA4 UI.

What changes with Decode GA4

Decode writes a flat events table into a BigQuery dataset of your choice. You declare it in Dataform with a single declaration block and reference it through the standard ${ref()} macro. Your assertions, dependencies, and incremental models all work as normal. There is no UNNEST anywhere in the repository. When GA4 adds a new event parameter, the next decode run picks it up.
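Downstream models then read the source like any other table. A minimal sketch, assuming the flat column names Decode produces (page_location, page_referrer, ga_session_id):

```sqlx
-- definitions/models/page_views.sqlx: a minimal consumer of the source.
-- Column names assume the flat layout described above.
config { type: "table" }

SELECT
  event_date,
  page_location,
  page_referrer,
  ga_session_id
FROM ${ref("events")}
WHERE event_name = 'page_view'
```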

Option A

UNNEST inside a Dataform staging file

Write a staging.sqlx that flattens event_params with one UNNEST join per parameter (sketched after this card). Refresh costs scale with the number of parameters. Every new parameter requires a code change. Every analyst who joins the project has to learn the unnesting pattern before they can edit anything.

Hundreds of lines of glue SQL
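For scale, here is the Option A pattern for just two parameters; events_raw is a hypothetical declaration of the raw export, and the joins repeat for every additional parameter. LEFT JOIN is used so rows missing a parameter are not dropped.

```sqlx
-- definitions/staging/stg_ga4_events.sqlx: the Option A pattern for two
-- parameters. events_raw is a hypothetical declaration of the raw export.
config { type: "table" }

SELECT
  e.event_date,
  e.event_name,
  loc.value.string_value AS page_location,
  sid.value.int_value    AS ga_session_id
FROM ${ref("events_raw")} AS e
LEFT JOIN UNNEST(e.event_params) AS loc ON loc.key = 'page_location'
LEFT JOIN UNNEST(e.event_params) AS sid ON sid.key = 'ga_session_id'
```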
Option B

Borrow somebody else's GA4 package

Lift staging models from a public dbt-for-GA4 package and translate them into SQLX. They flatten more events than you need, materialise intermediate tables you do not want, and lag behind whatever Google shipped to GA4 last week. You inherit modelling decisions you did not make.

Opinionated, slow to update
Option C

Run a separate flattener before Dataform

Stand up a Cloud Function or Dataflow job to flatten GA4, write to a holding dataset, then point Dataform at that. Now you maintain two pipelines, two failure modes, and two places where the schema can drift. The Dataform graph looks tidy. The stack underneath is a maintenance project.

Two pipelines to maintain
| Feature | Decode GA4 source | Hand-built Dataform staging |
| --- | --- | --- |
| Lines of SQL to flatten a page_view | 3 | ~18 per parameter |
| New GA4 parameter handling | Auto-detected, appears in source | Manual SQLX update required |
| Refresh cost | External table, no scan on Dataform run | Full UNNEST scan on every run |
| Onboarding time for analysts | Treat events like any other declaration | Must learn UNNEST patterns first |
| Schema drift risk | Handled upstream | Silent — staging drops new params |
| Maintenance over a year | Zero | Recurring, every GA4 release |

One install. A clean source declaration. Dataform handles the rest.

  1. Subscribe via Google Cloud Marketplace

     Decode GA4 is a Marketplace listing. Usage-based pricing, no monthly minimum. The subscription takes under a minute and billing appears on your existing GCP invoice — the same one Dataform already shows up on.

  2. Deploy with the events_external template

     Pick the BigQuery dataset that your Dataform repository already uses for source declarations — for example dataform_sources. Set destination_dataset_id to that dataset. Decode writes a Parquet-backed external table called events into it.

  3. Declare the source in Dataform

     Add a definitions/sources/events.sqlx file with a config block of type "declaration" pointing at the dataset and table. Reference it in any model with ${ref("events")}. There is no staging UNNEST file to maintain.

  4. Run Dataform as normal

     Dataform compilation picks the declaration up like any other table. Dependencies work. Assertions work. Incremental tables work. The decoded events table updates daily, and your Dataform models pick up the new partitions on the next scheduled run. Steps 3 and 4 are sketched in code after this list.
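Steps 3 and 4 as code. A hedged sketch: the database and schema values are placeholders for your project and destination dataset, and the incremental filter assumes event_date lands as a DATE column.

```sqlx
-- definitions/sources/events.sqlx (step 3): declare the decoded table.
-- database and schema are placeholders for your project and dataset.
config {
  type: "declaration",
  database: "my-gcp-project",
  schema: "dataform_sources",
  name: "events"
}
```

```sqlx
-- definitions/models/daily_events.sqlx (step 4): an incremental model
-- over the declared source. Assumes event_date is a DATE column in the
-- decoded table; adjust the filter if yours is a string.
config {
  type: "incremental",
  bigquery: { partitionBy: "event_date" }
}

SELECT
  event_date,
  event_name,
  page_location
FROM ${ref("events")}
${when(incremental(),
  `WHERE event_date > (SELECT MAX(event_date) FROM ${self()})`)}
```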

Wire decoded GA4 into a Dataform repository in four small steps. Nothing here is Dataform-specific magic — it is the same declaration pattern you already use for every other warehouse table.

01 · GCP: Run the Decode GA4 installer with destination_dataset_id pointing at your Dataform sources dataset.

02 · GCP: Grant the Dataform service account Storage Object Viewer on the Decode GCS bucket (one way to script this is sketched after these steps).

03 · Dataform: Add definitions/sources/events.sqlx with a declaration config block.

04 · Dataform: Reference ${ref("events")} in your models. Trigger a Dataform run.
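One way to script the grant in step 02, assuming the default Dataform service agent; the bucket name and project number are placeholders, so check the Decode GA4 prerequisites for the real values.

```sh
# Grant the Dataform service agent read access to the Decode bucket.
# gs://decode-ga4-export and the project number are placeholders.
gcloud storage buckets add-iam-policy-binding gs://decode-ga4-export \
  --member="serviceAccount:service-123456789012@gcp-sa-dataform.iam.gserviceaccount.com" \
  --role="roles/storage.objectViewer"
```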

The events table is a BigQuery external table backed by Parquet files in GCS. Dataform reads it natively through BigQuery, but the underlying storage stays in your project. No data leaves your perimeter, and no extra connection has to be configured — Dataform already runs as a service inside the same GCP project.

01 · A first-class Dataform source

The events table is a declaration block referenced through ${ref("events")} like any other warehouse table. Dependencies, assertions, and incremental models all work without modification.

02 · Direct event parameter columns

page_location, page_referrer, page_title, ga_session_id, ga_session_number — every standard parameter is a direct column. No correlated subqueries inside your SQLX files.

03 · External table economics

The source reads from Parquet files in GCS. There is no duplicate storage in BigQuery, and Dataform runs against decoded data without paying to re-scan the raw GA4 export.

04 · Schema evolution that just works

When GA4 adds a new event parameter, the next decode run picks it up. Dataform assertions will flag any change you care about, but the run does not fail because of new columns.

05 · Cleaner DAGs, fewer staging files

The intermediate flattening SQLX shrinks from hundreds of lines to a thin renaming layer — or disappears entirely. Downstream models become readable.

06 · Native GCP, no extra connection

Dataform already runs as a Google Cloud service against BigQuery. The decoded events table is just another dataset in the same project. No extra warehouse credentials, no separate execution environment.

01 · Marketing attribution models

Build a session_facts SQLX on top of the decoded events table without writing a single UNNEST. Source-medium, campaign, and landing page are direct columns. The path from raw event to attribution mart shrinks from five intermediate models to one.

02 · Funnel and conversion analysis

Standard ecommerce funnels — view_item, add_to_cart, begin_checkout, purchase — become readable case statements over event_name (sketched after these use cases). The downstream funnel model is a window function over a clean source, not a chain of subqueries.

03 · Product analytics marts

Custom event parameters that your product team adds — feature flags, plan tier, A/B variant — show up as direct columns the moment they start firing. Dataform models that consume them do not need to be re-shaped to accommodate the new fields.
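A sketch of the funnel use case (02), written as conditional aggregation per session rather than the window-function variant; the flat column names assume the decoded layout described above.

```sqlx
-- definitions/models/checkout_funnel.sqlx: use case 02 as a sketch.
-- ga_session_id is a direct column in the decoded table.
config { type: "table" }

SELECT
  ga_session_id,
  MAX(IF(event_name = 'view_item', 1, 0))      AS viewed_item,
  MAX(IF(event_name = 'add_to_cart', 1, 0))    AS added_to_cart,
  MAX(IF(event_name = 'begin_checkout', 1, 0)) AS began_checkout,
  MAX(IF(event_name = 'purchase', 1, 0))       AS purchased
FROM ${ref("events")}
GROUP BY ga_session_id
```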

Does this work with the standalone Dataform CLI as well as Dataform in GCP?

Yes. The integration is at the BigQuery layer — both the open-source CLI and the GCP-managed service see the decoded events table the same way. The declaration block is identical in either environment. See setup →

Do I need to delete my existing GA4 staging SQLX files?

Not immediately. You can run the decoded events declaration alongside an existing staging file and migrate downstream models one at a time. Most teams find that the staging file collapses to a thin renaming layer once Decode is in place.

What permissions does my Dataform service account need?

The standard BigQuery Data Viewer and BigQuery Job User roles, plus Storage Object Viewer on the Decode GA4 GCS bucket — required because the events table is an external table backed by Parquet files. Full prerequisites →

Can I still write Dataform assertions against the decoded events?

Yes. The events table behaves like any other BigQuery source — uniqueness assertions, non-null assertions, row-count checks, and custom SQL assertions all work exactly as they would against a regular table.
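For instance, Dataform's built-in assertions on a model that consumes the source. A sketch, assuming event_name and event_date exist as flat columns:

```sqlx
-- Assertions on a model that reads the decoded source (illustrative).
-- The rowConditions check assumes event_date is a DATE column.
config {
  type: "table",
  assertions: {
    nonNull: ["event_name", "event_date"],
    rowConditions: ["event_date <= CURRENT_DATE()"]
  }
}

SELECT * FROM ${ref("events")}
```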

Deploy in under 5 minutes

Your Dataform repo,
without the UNNEST tax.

Subscribe via Google Cloud Marketplace, point destination_dataset_id at your Dataform sources dataset, and have a clean events declaration available to your models before the end of the day.

Get Started on Marketplace →
Read the documentation

Google Cloud Marketplace · Usage-based · No monthly minimum