Export GA4 BigQuery data to Amazon S3, automatically.
GA4's BigQuery export wasn't designed to leave GCP. Moving raw nested data to S3 costs more and solves less than you expect. Decode GA4 transforms and compresses your data inside GCP first — clean Parquet files, 75–90% smaller, every event parameter as a direct column.
GA4 exports to BigQuery. Most AWS-native data teams want their analytics data in S3.
Why GA4 data is hard to get into AWS
This gap is more common than it sounds. Your analytics team uses GA4. Your data platform runs on AWS — Athena for ad-hoc queries, Redshift for the warehouse, SageMaker for modelling. GA4's BigQuery export does not provide a path to get there without building something yourself.
The DIY trap
The teams that do build something themselves typically end up with one of two outcomes. Either they write a Cloud Function that extracts raw, nested GA4 event data and pushes it to S3 — at which point every downstream consumer still has to deal with UNNEST subqueries. Or they spend weeks writing transformation SQL first, then set up a sync job, and end up maintaining two moving parts indefinitely.
A simpler path: resolve the complexity once
There is a third option. Resolve the structural complexity once, inside BigQuery, and export the clean result directly to S3. That is what Decode GA4 does.
The hidden cost of raw exports: egress
There is also a cost argument most teams discover after the fact. Standard GCP egress rates run $0.08–0.12 per GB. Moving raw, uncompressed GA4 event data to S3 means paying for every byte of UNNEST overhead — the repeated arrays, the repeated key names, the repeated metadata. Transforming and compressing to Parquet inside GCP before export reduces payload size by 75–90%. For most GA4 deployments that is a material reduction in the monthly transfer bill, not a rounding error.
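As a back-of-envelope illustration, with hypothetical volumes (plug in your own):

```python
# Rough egress arithmetic. RAW_GB_PER_MONTH is a hypothetical volume;
# the rate is the upper end of standard GCP internet egress pricing.
RAW_GB_PER_MONTH = 500
EGRESS_USD_PER_GB = 0.12
COMPRESSION_RATIO = 0.15   # ZSTD Parquet at ~85% size reduction

raw_cost = RAW_GB_PER_MONTH * EGRESS_USD_PER_GB
compressed_cost = RAW_GB_PER_MONTH * COMPRESSION_RATIO * EGRESS_USD_PER_GB

print(f"raw transfer:        ${raw_cost:.2f}/month")         # $60.00/month
print(f"compressed transfer: ${compressed_cost:.2f}/month")  # $9.00/month
```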
The traditional approach
Export raw GA4 data to S3
Write a Cloud Function or Cloud Run job that reads from the GA4 BigQuery export and writes to S3. Fast to build. The problem: you are moving nested, structurally complex data to S3. Every analyst querying it via Athena still needs to write correlated subqueries to extract any event parameter. You have moved the problem, not solved it.
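To make that concrete, this is the shape of query the raw GA4 schema forces on every consumer just to pull out individual parameters; the project, dataset, and parameter names here are illustrative:

```python
from google.cloud import bigquery

client = bigquery.Client()

# One correlated UNNEST subquery per parameter, for every query, forever.
# Table and parameter names are placeholders.
sql = """
SELECT
  event_name,
  (SELECT value.string_value
   FROM UNNEST(event_params)
   WHERE key = 'page_location') AS page_location,
  (SELECT value.int_value
   FROM UNNEST(event_params)
   WHERE key = 'ga_session_id') AS ga_session_id
FROM `my-project.analytics_123456789.events_20250101`
"""

for row in client.query(sql).result():
    print(row.event_name, row.page_location, row.ga_session_id)
```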
Flatten in BigQuery, then export
Write transformation SQL to unnest the GA4 data in BigQuery, materialise the result as a clean table, then set up a BigQuery export job to GCS, then sync GCS to S3. You now maintain: the unnesting SQL (which breaks when GA4 adds a new parameter), the scheduled query, the GCS export job, and the S3 sync. Four moving parts, indefinitely.
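To make those moving parts concrete, here is a sketch of just the export leg, assuming the flattened table already exists; the bucket and dataset names are placeholders, and the unnesting SQL, its schedule, and the GCS-to-S3 sync all remain separate pieces you own:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Moving part three of four: export the already-flattened table to GCS.
# SNAPPY used here as a widely supported Parquet codec.
export_sql = """
EXPORT DATA OPTIONS (
  uri = 'gs://my-staging-bucket/ga4/dt=20250101/*.parquet',
  format = 'PARQUET',
  compression = 'SNAPPY',
  overwrite = true
) AS
SELECT * FROM `my-project.analytics_flat.events_20250101`
"""
client.query(export_sql).result()
```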
Use an ETL platform
Fivetran, Airbyte, or similar extract your GA4 data and load it somewhere. They handle the movement but not the transformation — you still get the same nested structure at the destination, and now your GA4 data has left your Google Cloud project and passed through a third-party system. Most platforms also charge a monthly minimum regardless of usage.
Decode GA4 vs the common way
| Feature | Decode GA4 | DIY / ETL platform |
|---|---|---|
| Setup time | Under 5 minutes | Hours or days of IAM / VPC config |
| Compression | Automatic ZSTD — 75–90% size reduction | Manual scripting, often uncompressed |
| Schema drift | Auto-detected and handled | Manual schema updates when GA4 changes |
| Egress costs | Pre-compressed inside GCP before transfer | Full raw payload transferred — no savings |
| Data residency | Data never leaves your GCP project | Most ETL platforms process your data on their infrastructure |
| Maintenance | Zero — deploy once, run forever | Ongoing SQL updates, pipeline monitoring, schema fixes |
One deployment. Clean data in S3. Zero maintenance.
Step 1: Subscribe via Google Cloud Marketplace
Decode GA4 is available on Google Cloud Marketplace. Usage-based pricing — no monthly minimum, no credit card required. The subscription takes under a minute and billing appears on your existing GCP invoice.
Step 2: Deploy with S3 export configured
The installer takes your GA4 properties, S3 bucket, AWS region, and the BigQuery connection ID you set up for AWS. The connection setup is a one-time ~3-minute step, and the docs have the exact commands. Decode GA4 itself installs entirely within your GCP project — no data touches any external system except your own S3 bucket.
Step 3: Clean Parquet files appear in S3, daily
Decoded GA4 data is written to S3 in compressed Parquet format, hive-partitioned by date. Query it with Athena, load it into Redshift, read it with SageMaker or DuckDB. Each partition is processed once, unless GA4 modifies it upstream — which is detected and handled automatically.
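Reading those files from anywhere that runs DuckDB is, for example, a few lines; the bucket, prefix, and region are illustrative, and credentials come from your environment:

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs; LOAD httpfs;")      # S3 support
con.execute("SET s3_region = 'us-east-1';")      # illustrative region

# Hive partitioning surfaces the date folder as a queryable column.
rows = con.execute("""
    SELECT event_date, event_name, COUNT(*) AS events
    FROM read_parquet('s3://my-bucket/ga4/*/*.parquet', hive_partitioning = true)
    GROUP BY 1, 2
    ORDER BY 3 DESC
    LIMIT 10
""").fetchall()
print(rows)
```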
How the setup works
Cross-cloud integration, done in four steps. Full commands in the docs.
1. AWS: Create an IAM role with S3 write access and Web Identity as the trusted entity.
2. GCP: Create a BigQuery connection pointing at that IAM role's ARN and AWS region (a sketch follows after this list).
3. AWS: Add the Google identity BigQuery returns to your role's trust policy (see the trust-policy sketch below).
4. GCP: Run the Decode GA4 installer with your S3 bucket in the config.
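Step 2, for reference, looks roughly like this with the google-cloud-bigquery-connection client (the equivalent bq CLI command is in the docs; the project, location, connection ID, and role ARN are placeholders):

```python
from google.cloud import bigquery_connection_v1 as bq_conn

client = bq_conn.ConnectionServiceClient()
parent = client.common_location_path("my-project", "aws-us-east-1")

# Point the connection at the IAM role created in step 1.
connection = bq_conn.Connection(
    aws=bq_conn.AwsProperties(
        access_role=bq_conn.AwsAccessRole(
            iam_role_id="arn:aws:iam::123456789012:role/bq-s3-writer"
        )
    )
)
resp = client.create_connection(
    parent=parent, connection_id="decode-ga4-s3", connection=connection
)

# BigQuery returns the Google identity to add to the role's trust
# policy in step 3.
print(resp.aws.access_role.identity)
```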
Authentication uses Web Identity federation, not long-lived AWS keys. BigQuery assumes your IAM role via OIDC — your AWS credentials never leave AWS.
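Step 3's trust policy follows that federation pattern; applied with boto3 it looks like this, where the role name and the 'sub' identity value (the one BigQuery returned in step 2) are placeholders:

```python
import json
import boto3

iam = boto3.client("iam")

# Trust only the specific Google identity BigQuery returned; no AWS
# access keys are created or stored anywhere.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Federated": "accounts.google.com"},
        "Action": "sts:AssumeRoleWithWebIdentity",
        "Condition": {
            "StringEquals": {"accounts.google.com:sub": "100123456789012345678"}
        },
    }],
}

iam.update_assume_role_policy(
    RoleName="bq-s3-writer",
    PolicyDocument=json.dumps(trust_policy),
)
```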
What you get in S3
Clean, flat Parquet files
Every event parameter exposed as a direct column. No UNNEST. No correlated subqueries. Query with Athena using simple dot-notation, or load into any tool that reads Parquet.
Hive-partitioned by date
Data lands in S3 in a standard hive-partitioned folder structure. Athena partition projection works out of the box, and so do Spark, Glue, and most other AWS-native query tools.
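A minimal Athena table over the exported files, with partition projection, might look like this; the database, bucket, columns, and the dt partition key are illustrative:

```python
import boto3

athena = boto3.client("athena")

# Partition projection means Athena never needs MSCK REPAIR or Glue
# crawlers to discover new daily partitions.
ddl = """
CREATE EXTERNAL TABLE IF NOT EXISTS ga4.events (
  event_name string,
  page_location string,
  user_pseudo_id string
)
PARTITIONED BY (dt string)
STORED AS PARQUET
LOCATION 's3://my-bucket/ga4/'
TBLPROPERTIES (
  'projection.enabled' = 'true',
  'projection.dt.type' = 'date',
  'projection.dt.format' = 'yyyy-MM-dd',
  'projection.dt.range' = '2024-01-01,NOW',
  'storage.location.template' = 's3://my-bucket/ga4/dt=${dt}/'
)
"""
athena.start_query_execution(
    QueryString=ddl,
    ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},
)
```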
Automatic schema evolution
When GA4 adds a new event parameter, it appears in the next export without any configuration change on your end. The pipeline does not break. You do not get paged at 2am.
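Consumers that scan many partitions in one pass tolerate the same drift. In DuckDB, for instance, files can be aligned by column name, so partitions written before the new parameter existed simply read as NULL (paths and the parameter name are illustrative):

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs; LOAD httpfs;")

# union_by_name aligns files on column names instead of position, so a
# column that only exists in newer partitions is NULL-filled elsewhere.
con.execute("""
    SELECT dt, COUNT(new_parameter) AS rows_with_value
    FROM read_parquet('s3://my-bucket/ga4/*/*.parquet',
                      hive_partitioning = true,
                      union_by_name = true)
    GROUP BY dt
    ORDER BY dt
""")
```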
Incremental, not full refreshes
Each partition is processed once. If GA4 modifies a historical partition — which it does, unpredictably — Decode GA4 detects the change and reprocesses only that date. Daily runs are cheap.
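One way to spot those upstream rewrites (a sketch of the general technique, not necessarily Decode GA4's internal mechanism) is to compare each daily table's last-modified time against your own export log:

```python
from datetime import datetime, timezone
from google.cloud import bigquery

client = bigquery.Client()

# Your export state, keyed by daily table; values are hypothetical.
last_exported = {"events_20250101": datetime(2025, 1, 2, 3, 0, tzinfo=timezone.utc)}

# __TABLES__ exposes last_modified_time for every table in the dataset.
sql = """
SELECT table_id, TIMESTAMP_MILLIS(last_modified_time) AS modified_at
FROM `my-project.analytics_123456789.__TABLES__`
WHERE table_id LIKE 'events_%'
"""
for row in client.query(sql).result():
    exported_at = last_exported.get(row.table_id)
    if exported_at and row.modified_at > exported_at:
        print(f"{row.table_id} changed upstream: reprocess this date only")
```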
75–90% lower egress costs
ZSTD-compressed Parquet leaves GCP at a fraction of the size of raw BigQuery exports. Less data transferred means a proportionally smaller egress bill. The compression happens before transfer, inside your project.
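If you want to sanity-check that range against your own data, pyarrow makes it a short experiment (the input file name is illustrative):

```python
import os
import pyarrow.parquet as pq

# Re-encode a sample partition with ZSTD and compare file sizes.
table = pq.read_table("events_sample.parquet")
pq.write_table(table, "events_sample_zstd.parquet", compression="zstd")

before = os.path.getsize("events_sample.parquet")
after = os.path.getsize("events_sample_zstd.parquet")
print(f"{1 - after / before:.0%} smaller")
```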
The full AWS analytics stack
Flat Parquet in S3 opens Athena for ad-hoc SQL, Redshift for the warehouse, SageMaker for ML, and Databricks Unity Catalog for cross-team governance. All from the same clean source files.
Historical data preservation
GA4 retains raw event data for 14 months by default. Move it to S3 and you own it indefinitely. S3 Intelligent-Tiering automatically shifts older partitions to lower-cost storage tiers — Glacier can store years of GA4 event history for under $1 per GB per year. Standard year-over-year analysis stops being a problem you have to engineer around.
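Intelligent-Tiering handles the shifting automatically once enabled on the bucket; if you prefer explicit rules, a lifecycle configuration is a few lines of boto3 (the bucket, prefix, and cutoff are illustrative):

```python
import boto3

s3 = boto3.client("s3")

# Move partitions older than a year to Glacier Deep Archive.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-old-ga4-partitions",
            "Status": "Enabled",
            "Filter": {"Prefix": "ga4/"},
            "Transitions": [{"Days": 365, "StorageClass": "DEEP_ARCHIVE"}],
        }]
    },
)
```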
Attribution modelling
Platform-reported attribution is typically last-click or close to it. Clean event-level data in S3 gives your data science team the raw material for proper Markov chain or data-driven models in SageMaker or DuckDB, without touching BigQuery quotas. The event parameters are already flat. The path sequences are already queryable without subqueries.
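As a sketch, assembling per-user touch paths for a Markov-style model can be a single aggregation over the flat files; the traffic-source column name and S3 path here are illustrative:

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs; LOAD httpfs;")

# One ordered path string per user, ready for a transition-matrix model.
paths = con.execute("""
    SELECT
      user_pseudo_id,
      string_agg(traffic_source, ' > ' ORDER BY event_timestamp) AS touch_path
    FROM read_parquet('s3://my-bucket/ga4/*/*.parquet', hive_partitioning = true)
    WHERE traffic_source IS NOT NULL
    GROUP BY user_pseudo_id
""").fetchall()
print(paths[:5])
```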
CRM and audience activation
S3 is the standard hub for Reverse ETL. Tools like Fivetran and Windsor.ai can pick up Parquet files from your bucket and sync web behaviour data directly into Salesforce, Braze, or Klaviyo — bridging GA4 event data and CRM records without building a custom pipeline. The data is already clean and flat before it gets there.
Do I need to store AWS access keys anywhere?
No. BigQuery authenticates to AWS using Web Identity federation — a short-lived OIDC token it requests at runtime. Your AWS access keys never exist on the GCP side. How auth works →
What GCP and AWS permissions do I need to install this?
On GCP, the BigQuery Connection Admin role. On AWS, permission to create an IAM role and edit its trust policy. Both are standard for a data platform owner. See prerequisites →
Which AWS regions are supported?
Any standard AWS region. You set the region when you create the BigQuery connection, and it's fixed per connection — one connection per region if you need multiple destinations. Setup guide →
Deploy in under 5 minutes
Your GA4 data in S3, today.
No credit card required. Install via Google Cloud Marketplace, configure your S3 bucket, and have clean Parquet files appearing in S3 before the end of the day.
Google Cloud Marketplace · Usage-based · No monthly minimum