The GA4 BigQuery export is the right place to start. It is rarely the right place to stay.

If you want to own your GA4 data beyond Google’s retention window, the export is the recommended path, and I have written at length about why the raw export is so painful to query. But once you have a clean, transformed copy of that data, a fair question follows: does it need to live in BigQuery at all?

For a lot of teams, the honest answer is no. Their warehouse is Snowflake. Their lakehouse is Databricks. Their BI engine reads Parquet straight off object storage. Their finance team would quite like the BigQuery storage bill to stop growing forever. And increasingly, the thing consuming the data is not a person with a SQL editor at all — it is an agent, hitting the data through whatever access pattern is cheapest and closest.

So this is the case for getting your GA4 data out of BigQuery and onto cloud storage — GCS, S3, or Azure Blob — as compressed, query-ready Parquet. And how to do it without standing up yet another pipeline to babysit.

Why export to cloud storage at all

Three reasons, in roughly the order people care about them.

Cost. Data in a native BigQuery table accrues both storage and compute charges. Data in Parquet files on object storage, queried through a BigQuery external table, accrues compute when you query it but no BigQuery storage cost at all — you pay the (much lower) object storage rate instead. For multi-TiB GA4 datasets that grow every single day and get queried occasionally, that is a meaningful structural saving, not a rounding error.
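
To make the cost point concrete, here is a minimal sketch of a BigQuery external table sitting over hive-partitioned Parquet in a bucket. The dataset `my_dataset`, the table name, and the `events/` prefix are assumptions for illustration, and the bucket name is borrowed from the install example in the GCS section; the layout Decode GA4 actually writes may differ, so treat this as a shape rather than something to paste.

```sql
-- Sketch only: dataset, table name and the events/ prefix are hypothetical.
-- An external table holds no data in BigQuery, so there is no BigQuery
-- storage charge; queries are still billed for the bytes they read.
CREATE OR REPLACE EXTERNAL TABLE `project_id.my_dataset.ga4_events_external`
WITH PARTITION COLUMNS  -- infer hive partition keys from the object paths
OPTIONS (
  format = 'PARQUET',
  uris = ['gs://my-ga4-export-bucket/events/*'],
  hive_partition_uri_prefix = 'gs://my-ga4-export-bucket/events'
);
```

Filtering on the inferred partition column lets BigQuery skip whole folders, which is where most of the compute saving on occasional queries comes from.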

Access from systems that don’t speak BigQuery. Athena cannot query a BigQuery table. Neither can Synapse, Redshift Spectrum, or a DuckDB process running on someone’s laptop. They can all read Parquet. The moment your data is sitting in a bucket as open-format files, the list of things that can consume it stops being “BigQuery and its connectors” and starts being “almost everything.”

A portable, format-agnostic copy. Parquet is not a Google format, an AWS format, or a Microsoft format. It is the format. A copy of your GA4 data as partitioned Parquet is a copy you can hand to any tool in your stack, this year or in five years, without a migration project.

That last point matters more every quarter. As more of the data-consumer profile shifts towards agents — reading through multiple systems, via multiple access patterns, often without a human in the loop — a neutral, open copy on object storage stops being a nice-to-have and starts being the sensible default.

Where Decode GA4 fits

Decode GA4 sits between the raw BigQuery export and your downstream systems. It does the structural work the export refuses to do for you — flattening the nested schema, deriving session identifiers, unnesting event parameters into proper columns — and then it manages the export itself: writing hive-partitioned, compressed Parquet files to the cloud storage destination of your choice.
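
For a sense of what that structural work looks like by hand, here is a rough sketch against the raw export. It is not Decode GA4's own SQL: the two parameters pulled out and the `session_id` convention (user_pseudo_id joined to ga_session_id) are just common choices I have assumed for illustration.

```sql
-- Hand-rolled flattening of one day of the raw GA4 export (illustrative only).
SELECT
  event_date,
  event_timestamp,
  event_name,
  user_pseudo_id,
  -- Each event parameter has to be pulled out of the nested array by key.
  (SELECT value.int_value
     FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS ga_session_id,
  (SELECT value.string_value
     FROM UNNEST(event_params) WHERE key = 'page_location') AS page_location,
  -- Assumed convention for a session identifier; NULL if either part is missing.
  CONCAT(
    user_pseudo_id, '-',
    CAST((SELECT value.int_value
            FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING)
  ) AS session_id
FROM `project_id.analytics_xxxxxxxxx.events_*`
WHERE _TABLE_SUFFIX = '20240101';
```

Multiply that by every parameter you care about, and by every schema change upstream, and the case for not maintaining it yourself makes itself.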

Under the hood it leans on BigQuery’s own EXPORT DATA statement. There is no secret here. What Decode GA4 adds is everything around that statement that turns a one-off EXPORT DATA into a maintainable pipeline: schema handling, incremental loading so each date partition is processed once and not again unless it changes, and automatic partition detection so that when GA4 quietly amends historical data several days after the fact, the affected partitions are reprocessed and re-exported rather than silently drifting.
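
For reference, a bare one-off EXPORT DATA looks roughly like the sketch below. The `events/event_date=20240101/` path, the column list, and the choice of SNAPPY are my assumptions, not the statement Decode GA4 generates; the point is only to show how little of a pipeline this is on its own.

```sql
-- One-off export of a single day to hive-style Parquet (illustrative only).
-- The path and partition key name are hypothetical.
EXPORT DATA OPTIONS (
  uri = 'gs://my-ga4-export-bucket/events/event_date=20240101/*.parquet',
  format = 'PARQUET',
  compression = 'SNAPPY',
  overwrite = true  -- replace any files already at this path
) AS
SELECT
  event_timestamp,
  event_name,
  user_pseudo_id
FROM `project_id.analytics_xxxxxxxxx.events_20240101`;
```

Nothing here knows which dates have already been exported, which have changed since, or what to do when a new column appears. That bookkeeping is the part worth delegating.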

The pipeline is the same shape regardless of which cloud you are writing to:

1. GA4 → BigQuery. Google’s standard daily export lands raw event data into a dataset in your Google Cloud project.
2. Decode GA4 → Cloud Storage. Transformed data is written as hive-partitioned Parquet files to your chosen bucket or container.
3. Cloud Storage → Downstream. Your warehouses, query engines, BI tools, and agents read directly from the Parquet files.

Once it is set up and scheduled, steps 2 and 3 run unattended. The only things that trigger work are your scheduler and upstream GA4 schema changes — and the schema changes Decode GA4 handles on its own.

Google Cloud Storage

GCS is the simplest destination, because everything stays inside Google Cloud and there is no cross-cloud authentication to configure. If you are not sure where to send your data, send it here first.

A minimal external-storage install looks like this — point Decode GA4 at your GA4 dataset and a bucket, deploy, and run:

DECLARE options JSON;

SET options = JSON '''
    {
        "ga4_dataset_id": "project_id.analytics_xxxxxxxxx",
        "transform_config_template": "events_external",
        "gcs_bucket_name": "my-ga4-export-bucket"
    }
    ''';

EXECUTE IMMEDIATE (
    SELECT `project_id.decode_ga4_europe_west2.deploy_installer`(options)
);

CALL `project_id.analytics_xxxxxxxxx.install_decode_ga4`();
CALL `project_id.decode_analytics_xxxxxxxxx.RUN`(NULL);

That RUN(NULL) call runs with default options, which is almost always what you want. From the second run onwards it processes incrementally, and with automatic partition detection switched on it will also catch any historical partitions GA4 has modified.

Requirements:

A GCS bucket in a region consistent with your BigQuery dataset
storage.objects.create permission for the service account or user running Decode GA4

Read happily by: DuckDB, MotherDuck, Snowflake, Databricks, ClickHouse, StarRocks, and of course BigQuery itself via an external table.
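
As a quick illustration of reading the files from outside BigQuery, this is roughly what a DuckDB query over the bucket could look like. The `events/` prefix and the `event_date` partition key are assumptions about the layout, and you would need GCS credentials configured in DuckDB before it will run.

```sql
-- DuckDB sketch: count events per day straight off the Parquet files.
-- Path layout and the event_date partition key are hypothetical.
INSTALL httpfs;
LOAD httpfs;

SELECT
  event_date,          -- derived from the event_date=... folder names
  COUNT(*) AS events
FROM read_parquet(
  'gs://my-ga4-export-bucket/events/*/*.parquet',
  hive_partitioning = true
)
GROUP BY event_date
ORDER BY event_date;
```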

Full walkthrough in the GCS integration guide and the GCS docs.

Amazon S3

S3 is the right destination when your data platform is AWS-centric — Athena, Redshift Spectrum, or a lakehouse already parked in an AWS region. The data still originates in BigQuery, so the one extra step compared with GCS is configuring cross-cloud IAM so that BigQuery is permitted to write to your S3 bucket. This is a one-time setup, documented step by step, and once it is in place the export behaves exactly as it does for GCS.

Note that writing across clouds means data leaves Google’s network, so there are BigQuery → S3 egress costs to be aware of. For most GA4 volumes these are modest, but they are not zero, and they are worth a quick sanity-check against your expected daily export size before you wire it up.
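
One way to do that sanity-check is to look at how big the raw daily tables have been recently. The query below is a rough proxy only: it measures the raw export's logical size, not the transformed and compressed Parquet that actually crosses the wire, so read the result as an order of magnitude and multiply it by whatever egress rate applies to you.

```sql
-- Rough daily volume of the raw GA4 export over the last 30 days.
-- This is a proxy: exported Parquet size will differ from these figures.
SELECT
  COUNT(*) AS daily_tables,
  ROUND(AVG(size_bytes) / POW(1024, 3), 2) AS avg_daily_gib,
  ROUND(SUM(size_bytes) / POW(1024, 3), 2) AS total_gib_last_30_days
FROM `project_id.analytics_xxxxxxxxx.__TABLES__`
WHERE REGEXP_CONTAINS(table_id, r'^events_\d{8}$')  -- skips intraday tables
  AND SAFE.PARSE_DATE('%Y%m%d', SUBSTR(table_id, 8))
      >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY);
```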

Requirements:

An S3 bucket in your target AWS region
Cross-cloud IAM configuration granting BigQuery write access to the bucket
Awareness of BigQuery → S3 cross-cloud egress costs

Read happily by: Amazon Athena, Amazon Redshift Spectrum, Microsoft Fabric, plus DuckDB, MotherDuck, Snowflake, Databricks, ClickHouse, and StarRocks.
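
On the Athena side, pointing a table at the exported files could look something like this. The bucket name, the `events/` prefix, the `event_date` partition key, and the three columns are all assumptions for illustration; the real column list has to match whatever schema your export produces.

```sql
-- Athena sketch: external table over hive-partitioned Parquet in S3.
-- Bucket, prefix, partition key and columns are hypothetical.
CREATE EXTERNAL TABLE IF NOT EXISTS ga4_events (
  event_timestamp bigint,
  event_name      string,
  user_pseudo_id  string
)
PARTITIONED BY (event_date string)
STORED AS PARQUET
LOCATION 's3://my-ga4-export-bucket/events/';

-- Register the event_date=... folders that already exist as partitions.
MSCK REPAIR TABLE ga4_events;
```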

Full walkthrough in the Amazon S3 integration guide and the S3 docs.

Azure Blob Storage

Azure Blob Storage follows the same pattern as S3. The additional step is configuring cross-cloud credentials so that BigQuery can write to your storage container, after which the export is identical to the other two destinations. If your reporting lives in Synapse or Microsoft Fabric, this is the path that lands GA4 data right next to it.

The same egress caveat applies: data crossing from Google Cloud to Azure incurs egress charges, so factor that in.

Requirements:

An Azure Blob Storage container in your target region
Cross-cloud credentials granting BigQuery write access to the container
Awareness of BigQuery → Azure cross-cloud egress costs

Read happily by: Azure Synapse Analytics, Microsoft Fabric, plus DuckDB, MotherDuck, Snowflake, Databricks, ClickHouse, and StarRocks.
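
For Synapse, a serverless SQL pool can read the files in place. This is a sketch under assumptions: the storage account `mystorageaccount`, the container `ga4-export`, and the `events/` folder layout are all made up for illustration, and access to the container has to be set up separately.

```sql
-- Synapse serverless sketch: peek at the exported Parquet in Blob Storage.
-- Storage account, container and folder layout are hypothetical.
SELECT TOP 10
  ga4.filepath(1) AS partition_folder,  -- text matched by the first wildcard
  ga4.*
FROM OPENROWSET(
  BULK 'https://mystorageaccount.blob.core.windows.net/ga4-export/events/*/*.parquet',
  FORMAT = 'PARQUET'
) AS ga4;
```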

Full walkthrough in the Azure Blob integration guide and the Azure docs.

So which one?

It depends — and I mean that as a genuine answer, not a dodge.

If everything you do is in Google Cloud, use GCS. There is no cross-cloud setup, no egress, no extra moving parts. If your warehouse and your bills already live in AWS or Azure, export straight to S3 or Blob so the data lands next to the tools that read it, and accept the modest egress cost as the price of not maintaining a second copy-and-sync step yourself.

The point that holds across all three is the one worth taking away: a single BigQuery utility, deployed once into your own project, can put clean GA4 data in front of more or less any system you run — with no extra platform to buy, no pipeline to monitor, and no SQL boilerplate to rewrite every time GA4 changes upstream.

That is what setting your data free actually looks like. The export gets it into BigQuery. Decode GA4 gets it everywhere else.

Decode GA4 is available on Google Cloud Marketplace. Subscription is free, installation takes a few minutes, and you only pay for the data you process. The documentation covers every destination, configuration option, and the full usage-based pricing model.

Setting your GA4 data free: export to GCS, S3, or Azure


Jim Barlow
