The GA4 BigQuery export problem

Every GA4 query in BigQuery needs a correlated subquery per field. We got tired of writing the same painful SQL and built something to make it stop.

I have been a data engineer for close to two decades. I have worked with messy data. I have worked with beautiful data. And in 2023, along with every other data professional on the planet, I started working with the Google Analytics 4 BigQuery export.

It was a step backwards.

Not in what the data contains. The data is fine. The step backwards is structural. GA4 exports data in a deeply nested format, with event parameters and user properties buried inside repeated record fields. To get anything useful out of it, you write SQL that looks like this:

SELECT
  event_timestamp,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_location,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_title')    AS page_title
FROM `project.analytics_xxxxxxxxx.events_*`
WHERE _TABLE_SUFFIX BETWEEN 'YYYYMMDD' AND 'YYYYMMDD'
  AND event_name = 'page_view'

Three fields. Seven lines of SQL, two of them correlated subqueries, plus a _TABLE_SUFFIX filter on table-level metadata just to select a date range.

You write that pattern for every parameter. You write it for every event type. You write it for every new analyst who joins the team. GA4 changes upstream and the pattern changes with it. You do it, and then you do it again, and then you do it a third time.

For context

The structure is not accidental. GA4 inherits its nested schema from the Firebase Analytics model, which was designed for mobile event tracking at scale. When Google merged Firebase and web analytics into GA4 in 2020, and made the BigQuery export freely available to everyone in 2023, the same nested structure came with it.

The trade-off made sense from a warehousing perspective. Event parameters are open-ended, so storing them as repeated records means GA4 can add fields without schema migrations. The cost is pushed downstream to anyone trying to read the data.

For a small team with one website and a handful of reports, that cost is tolerable. For an agency managing fifty clients, or a company with multiple GA4 properties and a small analytics team, it compounds fast.

Why this matters

I used to think the UNNEST pattern was just an SQL annoyance. It is worse than that.

Every correlated subquery forces BigQuery to scan every event row, even when the query only cares about a subset. Compute costs are higher than they should be. Storage costs are higher than they should be, because the wildcard date-sharded tables are not partitioned the way BigQuery prefers.

And when GA4 modifies historical data, which it does, quietly, several days after the fact, nothing in your own pipeline tells you about it. The numbers just shift and the dashboards slowly drift away from the GA4 interface.

The typical response is to build a transformation layer. dbt models, Dataform workflows, scheduled queries, Dataflow jobs. Pick your tool. All of them solve the immediate problem by writing the same painful SQL once, storing the result, and querying the result from then on.
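A minimal version of that layer is a scheduled or incremental model that flattens yesterday's shard once and stores the result. This sketch assumes a daily run; the project, dataset, and parameter names are illustrative:

SELECT
  event_timestamp,
  event_name,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_location
FROM `project.analytics_xxxxxxxxx.events_*`
-- process only yesterday's shard on each scheduled run
WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))

The UNNEST pattern has not gone away. It has just moved into a model file, where it must still be written, reviewed, and kept in sync with whatever GA4 sends next.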

That works. But now you have a repository of SQL to maintain, a pipeline to monitor, a schedule to configure, a service account to manage, and a re-processing strategy to design. You also need a schema evolution strategy, or, more often, no strategy at all, which breaks the next time GA4 adds a parameter.

For one GA4 property, this is acceptable. For many, it is a lot of infrastructure to maintain just to get clean analytics data out of a tool you already pay nothing extra for.

What we built

Decode GA4 is a BigQuery-native utility that takes the GA4 export and produces a clean, flat table. You call a single function. Every parameter becomes a direct column. The same query as above, against decoded data, looks like this:

SELECT
  event_timestamp,
  event_param.page_location,
  event_param.page_title
FROM `project.decode_analytics_xxxxxxxxx.events`
WHERE event_name = 'page_view'

Three fields. Six lines. No UNNEST, no _TABLE_SUFFIX.

The transformation uses dynamic SQL generated by query-builder functions, deployed into your own BigQuery project via Analytics Hub. Your data never leaves your Google Cloud project. The only thing that crosses the boundary is a JSON configuration payload.

Output goes to either a BigQuery partitioned table or compressed Parquet files on Google Cloud Storage, which you can expose as an external table. The Parquet option is what we recommend. It handles schema changes gracefully, it compresses well, and the files can be copied to AWS S3 or Azure Blob Storage and read there without any additional work.
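Exposing the Parquet output as a queryable BigQuery table is a single DDL statement. A sketch, assuming an existing dataset and bucket (both names here are illustrative):

CREATE EXTERNAL TABLE `project.decode_analytics_xxxxxxxxx.events_ext`
OPTIONS (
  format = 'PARQUET',
  -- point at the decoded Parquet files on Cloud Storage
  uris = ['gs://my-decode-bucket/events/*.parquet']
);

Because the external table reads the files directly, regenerating the Parquet output updates query results with no further DDL.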

What changes when you install it

For analysts: event_param.page_location instead of a correlated subquery. Tables work like tables.

For analytics engineers: clean source tables that drop straight into dbt or Dataform workflows without wrapping every incremental model in UNNEST boilerplate.

For data engineers: no transformation pipeline to maintain. Schema evolution is automatic. Historical modifications are detected and reprocessed on the next run. Deploy once, then ignore.

For whoever pays the BigQuery bill: around 40 to 70 percent lower combined compute and storage costs compared with querying the raw export, in our own testing across small and large properties. Your numbers will vary.

Pricing is usage-based, billed through Google Cloud Marketplace on your existing Cloud invoice. For sites under ten thousand monthly sessions the typical cost is three to eight dollars a month. No credit card required to install. No minimum commitment.

How to try it

Decode GA4 is on Google Cloud Marketplace. Subscription is free. Search for “GA4” in the Marketplace, click on the Decode GA4 listing, and subscribe. Install takes about five minutes.

After installation, a RUN function appears in a new dataset alongside your GA4 export. Calling it with a NULL argument runs with default options, which is usually all you need.

CALL my_project.decode_analytics_xxxxxxxxx.RUN(NULL)

The documentation covers configuration options, automation patterns, and the full pricing model.

A closing thought

In five years nobody will write SQL like the GA4 export requires. Models will generate the queries. Semantic layers will handle the intent. The UNNEST pattern will be a footnote in a technical post like this one.

Until then we still have to get the data out. Decode GA4 is how we do it.


Written by

Jim Barlow

Product & Engineering

Data engineer with 20+ years across analytics, BigQuery, and GA implementations. Founded Decode Data after years of solving the same GA4 BigQuery problem for enterprise clients.

