Export GA4 BigQuery data
to Google Cloud Storage — automatically.
GCS is the simplest Decode GA4 destination. Same cloud, same region, no cross-cloud credentials, no egress charges. Decoded GA4 lands as Parquet in your bucket — queryable via BigQuery external tables, eligible for archive tiers, shareable across projects without duplicating storage.
GA4 exports to BigQuery. Most analytics work on that data belongs somewhere cheaper, more portable, and easier to share.
Why keep decoded GA4 in GCS
BigQuery is a query engine. It is not the most cost-efficient place to hold years of raw event history, and it is not the easiest surface for sharing the same dataset across multiple teams, projects, or tools. Parquet in GCS solves both. You keep the data cheap, you keep it in one place, and you can still query it from BigQuery via external tables as if it were native. You can also read it with Dataflow, DuckDB, Spark, or any Parquet-aware tool — no export step, no duplication.
The DIY trap
Teams that decide to build something themselves typically end up with one of three outcomes. They live with UNNEST subqueries forever, which is tolerable for ad-hoc work but adds friction to every dashboard, every downstream model, and every new analyst's onboarding. They maintain a hand-rolled flattener as BigQuery scheduled queries, which breaks when GA4 introduces a new event parameter and pays BigQuery storage prices for data that could live on GCS for a fraction of the cost. Or they wire up a BigQuery EXPORT DATA job to write raw events to GCS, which produces nested Parquet that downstream consumers still have to unpack.
A simpler path: resolve the complexity once
There is a fourth option. Resolve the structural complexity once, inside BigQuery, and write the clean, flat result directly to GCS as compressed Parquet. That is what Decode GA4 does — and for GCS specifically, the setup involves almost nothing beyond granting write access on a bucket.
Cheaper storage, broader access
There is a cost argument most teams discover only after their BigQuery bill grows. Materialised flat tables in BigQuery are convenient but expensive for cold data. The same data as compressed Parquet in GCS costs a fraction of that — and lifecycle rules can move old partitions to Nearline, Coldline, or Archive, dropping the price to roughly $0.0012 per GB per month. A terabyte of GA4 event history on Archive tier runs around a dollar a month. Meanwhile BigQuery external tables let you query it with the same SQL, no loading step, no storage charge on the BigQuery side.
The traditional approach
Live with UNNEST forever
Accept that every dashboard query, every dbt model, every ad-hoc investigation starts with a block of UNNEST subqueries to get at page_location or session_id. It works. It is also the reason new analysts take weeks to become productive on GA4 data and the reason every downstream artefact carries the same boilerplate.
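As a sketch of that boilerplate (project and dataset names are placeholders), pulling just two parameters out of the raw export looks like:

```sql
-- Illustrative query against the raw GA4 export; names are placeholders.
-- Every parameter needs its own correlated UNNEST subquery.
SELECT
  event_date,
  event_name,
  (SELECT value.string_value
     FROM UNNEST(event_params)
    WHERE key = 'page_location') AS page_location,
  (SELECT value.int_value
     FROM UNNEST(event_params)
    WHERE key = 'ga_session_id') AS session_id
FROM `my-project.analytics_123456.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240107';
```

Multiply that pattern by every parameter, every query, every dashboard.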
Maintain your own flattener
Write transformation SQL that unnests GA4 events into clean flat tables, wrap it in a BigQuery scheduled query, and store the result as materialised tables. It works until GA4 adds a new event parameter, at which point the pipeline silently drops data or breaks outright. You also pay BigQuery storage rates for data that could live on GCS at a fraction of the cost.
Raw EXPORT DATA to GCS
Set up a BigQuery EXPORT DATA statement to write the raw events table to GCS as Parquet or JSON. You get files in a bucket. You do not get flat data. Every downstream consumer querying the export — Dataflow, external tables, DuckDB — still has to work through the nested structure. You have moved the problem, not solved it.
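A sketch of that raw export (bucket, project, and table names are placeholders); note that nothing here flattens event_params:

```sql
-- Illustrative raw export; names are placeholders.
-- The output is Parquet, but event_params remains a nested repeated struct.
EXPORT DATA OPTIONS (
  uri = 'gs://my-bucket/ga4-raw/*.parquet',
  format = 'PARQUET',
  compression = 'SNAPPY'
) AS
SELECT * FROM `my-project.analytics_123456.events_20240101`;
```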
Decode GA4 vs the common way
| Feature | Decode GA4 | DIY / manual setup |
|---|---|---|
| Setup time | Under 5 minutes | Hours of SQL plus scheduled-query maintenance |
| Compression | Automatic ZSTD — 75–90% size reduction | Manual, or uncompressed raw export |
| Schema drift | Auto-detected and handled | Manual schema updates when GA4 changes |
| Storage cost | Compressed Parquet on GCS — Nearline / Coldline / Archive tiers apply | Materialised in BigQuery — pays BQ storage rates for cold data |
| Query access | BigQuery external tables + every Parquet reader | BigQuery native only |
| Maintenance | Zero — deploy once, run forever | Ongoing SQL updates, pipeline monitoring, schema fixes |
One deployment. Clean data in GCS. Zero maintenance.
- [ 1 ]
Subscribe via Google Cloud Marketplace
Decode GA4 is available on Google Cloud Marketplace. Usage-based pricing — no monthly minimum, no credit card required. The subscription takes under a minute and billing appears on your existing GCP invoice.
- [ 2 ]
Deploy with GCS bucket configured
The installer takes your GA4 properties and the name of the GCS bucket you want to write to. No cross-cloud credentials, no BigQuery connection to set up — it is standard GCP IAM. Decode GA4 installs entirely within your project and writes directly to your bucket.
- [ 3 ]
Clean Parquet files appear in GCS, daily
Decoded GA4 data is written to GCS in compressed Parquet format, hive-partitioned by date. Query it via BigQuery external tables, read it with Dataflow or DuckDB, share it across projects with IAM, or tier it to Coldline or Archive for cheap long-term storage. Each partition is processed once, unless GA4 modifies it upstream — which is detected and handled automatically.
HOW THE SETUP WORKS
Native GCP export, done in two steps. No cross-cloud configuration.
GCP
Create (or identify) a GCS bucket in your project, ideally in the same region as your BigQuery dataset.
GCP
Grant the installer's identity Storage Object User on that bucket, or Storage Object Viewer plus Storage Object Creator.
Authentication uses standard Google Cloud IAM. No service-account keys to manage, no cross-cloud federation, no tokens to rotate. Decode GA4 runs under your project's service account and writes directly to your bucket.
What you get in GCS
Clean, flat Parquet files
Every event parameter exposed as a direct column. No UNNEST. No correlated subqueries. Query with BigQuery external tables using simple dot-notation, or read with any tool that speaks Parquet.
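By contrast, against the decoded output the same lookup is a plain column reference. Table and column names below are illustrative of the flattened layout, not a documented schema:

```sql
-- Illustrative query over the decoded, flat data; names are placeholders.
SELECT event_date, event_name, page_location, session_id
FROM `my-project.analytics.ga4_decoded`
WHERE event_name = 'page_view';
```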
Hive-partitioned by date
Data lands in GCS in a standard hive-partitioned folder structure. BigQuery external tables pick up the partition scheme automatically. So do Dataflow, DuckDB, Spark, and every mainstream Parquet reader.
Automatic schema evolution
When GA4 adds a new event parameter, it appears in the next export without any configuration change on your end. The pipeline does not break. You do not get paged at 2am.
Incremental, not full refreshes
Each partition is processed once. If GA4 modifies a historical partition — which it does, unpredictably — Decode GA4 detects the change and reprocesses only that date. Daily runs are cheap.
Zero egress
GCS sits in the same cloud as BigQuery. Writing the compressed Parquet from your BigQuery project to your GCS bucket incurs no egress charges when the two are in the same region. Storage is the only line item.
Queryable from everywhere
Parquet in GCS unlocks BigQuery external tables for SQL, BigLake for row-level security, Dataflow and DuckDB for heavy processing, and straightforward IAM-based sharing across projects and teams.
BigQuery external tables on decoded GA4
Create a BigQuery external table pointing at your GCS bucket, and the decoded Parquet queries back exactly like a native table — no loading, no duplication, no storage bill on the BigQuery side. This is the most common Decode GA4 pattern: clean data at GCS prices, full SQL ergonomics from BigQuery. Adding BigLake on top gives you row-level security, column masking, and fine-grained access control if you need it.
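A minimal sketch of that setup, assuming the decoded Parquet lives under a hive-partitioned prefix (bucket, project, and dataset names are placeholders):

```sql
-- Illustrative DDL; bucket, project, and dataset names are placeholders.
-- WITH PARTITION COLUMNS lets BigQuery infer the hive partition keys
-- from the folder structure.
CREATE EXTERNAL TABLE `my-project.analytics.ga4_decoded`
WITH PARTITION COLUMNS
OPTIONS (
  format = 'PARQUET',
  uris = ['gs://my-bucket/ga4/*'],
  hive_partition_uri_prefix = 'gs://my-bucket/ga4'
);
```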
Cheap historical archive with storage classes
GCS lifecycle rules move older partitions automatically from Standard to Nearline to Coldline to Archive. Archive tier is around $0.0012 per GB per month, which makes multi-terabyte GA4 event histories cost roughly a coffee per month. Decoded Parquet keeps the queryable structure even when the data is cold, so restoring a year-old partition for ad-hoc analysis does not require pipeline work.
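A hedged sketch of such a lifecycle policy (the age thresholds are examples, not recommendations), which can be applied to a bucket with `gcloud storage buckets update --lifecycle-file`:

```json
{
  "rule": [
    {"action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
     "condition": {"age": 90}},
    {"action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
     "condition": {"age": 365}},
    {"action": {"type": "SetStorageClass", "storageClass": "ARCHIVE"},
     "condition": {"age": 730}}
  ]
}
```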
Cross-team and cross-project sharing
Grant IAM access on the bucket and any team or project in your organisation can read the same decoded GA4 dataset without duplicating storage or re-running the transformation. Analytics, data science, and finance can each point their own external tables, Dataflow jobs, or DuckDB processes at the same files. One source of truth, many consumers, no pipeline fan-out.
What IAM permissions does the installer need on the bucket?
Storage Object User on the target bucket, or the equivalent pair of Storage Object Viewer plus Storage Object Creator. Standard Google Cloud IAM — no custom roles, no service-account key management. See prerequisites →
Does my GCS bucket need to be in a specific region?
Keep the bucket in the same region as your BigQuery dataset to avoid cross-region transfer charges. Cross-region is supported but pays per-GB transfer between regions. Most teams use a single-region bucket matching the region of their GA4 dataset. Setup guide →
Can I query the Parquet files directly from BigQuery without loading them?
Yes. BigQuery external tables read Parquet in GCS in place, no load step, no duplicated storage. This is the most common pattern — you keep the data cheap on GCS and query it from BigQuery as if it were native. BigLake wraps the same files with governance if you need row-level security or column masking.
Deploy in under 5 minutes
Your GA4 data in GCS,
today.
No credit card required. Install via Google Cloud Marketplace, point the installer at your GCS bucket, and have clean Parquet files appearing in GCS before the end of the day.
Google Cloud Marketplace · Usage-based · No monthly minimum