Export GA4 BigQuery data
to Amazon S3 — automatically.

GA4's BigQuery export wasn't designed to leave GCP. Moving raw nested data to S3 costs more and solves less than you expect. Decode GA4 transforms and compresses your data inside GCP first — clean Parquet files, 75–90% smaller, every event parameter as a direct column.

Category: Cross-cloud export
Deploy time: ~5 minutes
Format: Parquet (compressed)
Runs inside your GCP project

GA4 exports to BigQuery. Most AWS-native data teams want their analytics data in S3.

Why GA4 data is hard to get into AWS

This gap is more common than it sounds. Your analytics team uses GA4. Your data platform runs on AWS — Athena for ad-hoc queries, Redshift for the warehouse, SageMaker for modelling. GA4's BigQuery export does not provide a path to get there without building something yourself.

The DIY trap

The teams that do build something themselves typically end up with one of two outcomes. Either they write a Cloud Function that extracts raw, nested GA4 event data and pushes it to S3 — at which point every downstream consumer still has to deal with UNNEST subqueries. Or they spend weeks writing transformation SQL first, then set up a sync job, and end up maintaining two moving parts indefinitely.

A simpler path: resolve the complexity once

There is a third option. Resolve the structural complexity once, inside BigQuery, and export the clean result directly to S3. That is what Decode GA4 does.

The hidden cost of raw exports: egress

There is also a cost argument most teams discover after the fact. Standard GCP egress rates run $0.08–0.12 per GB. Moving raw, uncompressed GA4 event data to S3 means paying for every byte of UNNEST overhead — the repeated arrays, the repeated key names, the repeated metadata. Transforming and compressing to Parquet inside GCP before export reduces payload size by 75–90%. For most GA4 deployments that is a material reduction in the monthly transfer bill, not a rounding error.
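As a back-of-envelope illustration, the arithmetic works out as follows. The monthly volume, the rate, and the 85% reduction used here are assumptions for the sketch, not quoted prices:

```python
# Back-of-envelope egress savings from compressing before transfer.
# Volume, rate, and reduction are illustrative assumptions.

def monthly_egress_cost(gb_per_month: float, rate_per_gb: float) -> float:
    """Cost of moving gb_per_month out of GCP at a flat per-GB rate."""
    return gb_per_month * rate_per_gb

raw_gb = 500.0                        # raw, uncompressed GA4 export per month
rate = 0.12                           # $/GB, upper end of the standard range
compressed_gb = raw_gb * (1 - 0.85)   # 85% reduction, midpoint of 75-90%

raw_cost = monthly_egress_cost(raw_gb, rate)
compressed_cost = monthly_egress_cost(compressed_gb, rate)

print(f"raw: ${raw_cost:.2f}/mo, compressed: ${compressed_cost:.2f}/mo")
```

At these assumed numbers the transfer bill drops from $60 to $9 a month; the saving scales linearly with volume.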

Option A

Export raw GA4 data to S3

Write a Cloud Function or Cloud Run job that reads from the GA4 BigQuery export and writes to S3. Fast to build. The problem: you are moving nested, structurally complex data to S3. Every analyst querying it via Athena still needs to write correlated subqueries to extract any event parameter. You have moved the problem, not solved it.
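For a sense of that pain, here is what extracting a single parameter from the raw schema looks like in BigQuery; Athena over the same nested data needs similar gymnastics. The project, dataset, and parameter key are illustrative:

```sql
-- Raw GA4 export: one correlated subquery per parameter you want.
-- Names below are placeholders for your own project and property.
SELECT
  event_date,
  event_name,
  (SELECT value.string_value
   FROM UNNEST(event_params)
   WHERE key = 'page_location') AS page_location
FROM `my-project.analytics_123456.events_*`
WHERE event_name = 'page_view';
```

Multiply that subquery by every parameter in every downstream query, and the maintenance cost becomes clear.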

Still nested in S3
Option B

Flatten in BigQuery, then export

Write transformation SQL to unnest the GA4 data in BigQuery, materialise the result as a clean table, then set up a BigQuery export job to GCS, then sync GCS to S3. You now maintain: the unnesting SQL (which breaks when GA4 adds a new parameter), the scheduled query, the GCS export job, and the S3 sync. Four moving parts, indefinitely.
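A sketch of two of those four moving parts, with placeholder names throughout, assuming ZSTD is available for your Parquet export:

```sql
-- Moving part 1: the scheduled unnesting query, which must be edited
-- every time a new event parameter appears. Names are placeholders.
CREATE OR REPLACE TABLE analytics_flat.events AS
SELECT
  event_date,
  event_name,
  (SELECT value.string_value FROM UNNEST(event_params)
   WHERE key = 'page_location') AS page_location
  -- ...one subquery per parameter you care about...
FROM `my-project.analytics_123456.events_*`;

-- Moving part 3: the GCS export job. A separate GCS-to-S3 sync
-- still has to run after this.
EXPORT DATA OPTIONS (
  uri = 'gs://my-export-bucket/events/*.parquet',
  format = 'PARQUET',
  compression = 'ZSTD',
  overwrite = true
) AS
SELECT * FROM analytics_flat.events;
```

Parts 2 (the schedule) and 4 (the sync) live in other systems entirely, which is what makes the whole thing tedious to monitor.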

Four things to maintain forever
Option C

Use an ETL platform

Fivetran, Airbyte, or similar extract your GA4 data and load it somewhere. They handle the movement but not the transformation — you still get the same nested structure at the destination, and now your GA4 data has left your Google Cloud project and passed through a third-party system. Most platforms also charge a monthly minimum regardless of usage.

Data leaves your project
Feature        | Decode GA4                                | DIY / ETL platform
Setup time     | Under 5 minutes                           | Hours or days of IAM / VPC config
Compression    | Automatic ZSTD — 75–90% size reduction    | Manual scripting, often uncompressed
Schema drift   | Auto-detected and handled                 | Manual schema updates when GA4 changes
Egress costs   | Pre-compressed inside GCP before transfer | Full raw payload transferred — no savings
Data residency | Data never leaves your GCP project        | Most ETL platforms process your data on their infrastructure
Maintenance    | Zero — deploy once, run forever           | Ongoing SQL updates, pipeline monitoring, schema fixes

One deployment. Clean data in S3. Zero maintenance.

  1. Subscribe via Google Cloud Marketplace

    Decode GA4 is available on Google Cloud Marketplace. Usage-based pricing — no monthly minimum, no credit card required. The subscription takes under a minute and billing appears on your existing GCP invoice.

  2. Deploy with S3 export configured

    The installer takes your GA4 properties, S3 bucket, AWS region, and the BigQuery connection ID you set up for AWS. The connection setup is a one-time ~3-minute step, and the docs have the exact commands. Decode GA4 itself installs entirely within your GCP project — no data touches any external system except your own S3 bucket.

  3. Clean Parquet files appear in S3, daily

    Decoded GA4 data is written to S3 in compressed Parquet format, hive-partitioned by date. Query it with Athena, load it into Redshift, read it with SageMaker or DuckDB. Each partition is processed once, unless GA4 modifies it upstream — which is detected and handled automatically.

Cross-cloud integration, done in four steps. Full commands in the docs.

Step 1 (AWS): Create an IAM role with S3 write access and Web Identity as the trusted entity.

Step 2 (GCP): Create a BigQuery connection pointing at that IAM role's ARN and AWS region.
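With the bq CLI, that connection step might look like this. The role ARN, location, and connection name are placeholders; the exact flags are in the BigQuery Omni setup documentation:

```shell
# One-time connection setup (illustrative IDs).
bq mk --connection \
  --connection_type=AWS \
  --iam_role_id=arn:aws:iam::123456789012:role/bq-s3-export \
  --location=aws-us-east-1 \
  s3_export_conn

# Prints the Google identity to add to the role's trust policy:
bq show --connection my-project.aws-us-east-1.s3_export_conn
```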

Step 3 (AWS): Add the Google identity BigQuery returns to your role's trust policy.
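The resulting trust policy follows the standard Web Identity pattern. The subject value is the identity printed when the connection was created, shown here as a placeholder:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Federated": "accounts.google.com" },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "accounts.google.com:sub": "BIGQUERY_IDENTITY_FROM_STEP_2"
        }
      }
    }
  ]
}
```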

Step 4 (GCP): Run the Decode GA4 installer with your S3 bucket in the config.

Authentication uses Web Identity federation, not long-lived AWS keys. BigQuery assumes your IAM role via OIDC — your AWS credentials never leave AWS.

01

Clean, flat Parquet files

Every event parameter exposed as a direct column. No UNNEST. No correlated subqueries. Query with Athena using simple dot-notation, or load into any tool that reads Parquet.
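A hypothetical Athena query over the flat files; the table and column names are illustrative, not the actual exported schema:

```sql
-- Flat Parquet: each parameter is a plain column, no UNNEST needed.
SELECT
  event_date,
  page_location,
  COUNT(*) AS page_views
FROM ga4_events
WHERE event_name = 'page_view'
GROUP BY event_date, page_location
ORDER BY page_views DESC
LIMIT 20;
```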

02

Hive-partitioned by date

Data lands in S3 in a standard hive-partitioned folder structure. Athena partition projection works out of the box. So does Spark, Glue, and most AWS-native query tools.
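One way to wire this up in Athena, sketched with illustrative names and assuming a partition key of event_date; adjust the columns, bucket path, and projection range to your actual export:

```sql
-- Illustrative Athena DDL over a hive-partitioned layout like
-- s3://my-analytics-bucket/events/event_date=2024-01-01/...
CREATE EXTERNAL TABLE ga4_events (
  event_name     string,
  page_location  string,
  user_pseudo_id string
)
PARTITIONED BY (event_date string)
STORED AS PARQUET
LOCATION 's3://my-analytics-bucket/events/'
TBLPROPERTIES (
  'projection.enabled' = 'true',
  'projection.event_date.type' = 'date',
  'projection.event_date.range' = '2023-01-01,NOW',
  'projection.event_date.format' = 'yyyy-MM-dd'
);
```

With partition projection, new daily partitions are queryable immediately, with no MSCK REPAIR or Glue crawler runs.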

03

Automatic schema evolution

When GA4 adds a new event parameter, it appears in the next export without any configuration change on your end. The pipeline does not break. You do not get paged at 2am.

04

Incremental, not full refreshes

Each partition is processed once. If GA4 modifies a historical partition — which it does, unpredictably — Decode GA4 detects the change and reprocesses only that date. Daily runs are cheap.

05

75–90% lower egress costs

ZSTD-compressed Parquet leaves GCP at a fraction of the size of raw BigQuery exports. Less data transferred means a proportionally smaller egress bill. The compression happens before transfer, inside your project.

06

The full AWS analytics stack

Flat Parquet in S3 opens Athena for ad-hoc SQL, Redshift for the warehouse, SageMaker for ML, and Databricks Unity Catalog for cross-team governance. All from the same clean source files.

01

Historical data preservation

GA4 retains raw event data for 14 months by default. Move it to S3 and you own it indefinitely. S3 Intelligent-Tiering automatically shifts older partitions to lower-cost storage tiers — Glacier can store years of GA4 event history for under $1 per GB per year. Standard year-over-year analysis stops being a problem you have to engineer around.
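As an example, a minimal S3 lifecycle rule that shifts year-old partitions to Glacier Deep Archive might look like this; the prefix and threshold are assumptions:

```json
{
  "Rules": [
    {
      "ID": "archive-old-ga4-partitions",
      "Status": "Enabled",
      "Filter": { "Prefix": "events/" },
      "Transitions": [
        { "Days": 365, "StorageClass": "DEEP_ARCHIVE" }
      ]
    }
  ]
}
```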

02

Attribution modelling

Platform-reported attribution is either last-click or an opaque model you cannot inspect. Clean event-level data in S3 gives your data science team the raw material for proper Markov chain or data-driven models in SageMaker or DuckDB — without touching BigQuery quotas. The event parameters are already flat. The path sequences are already queryable without subqueries.
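As a toy illustration of the removal-effect idea behind Markov attribution — using made-up paths and a path-level approximation rather than a full absorbing-chain solve:

```python
# Removal-effect sketch over observed paths. Toy data; a real model
# would fit transition probabilities and solve the absorbing chain.

paths = [
    (["search", "display", "email"], True),
    (["search", "email"], True),
    (["display"], False),
    (["email"], True),
    (["search"], False),
]

def conversion_rate(paths, removed=None):
    """Share of paths that convert; a path touching `removed` is
    treated as broken (cannot convert), mimicking channel removal."""
    converted = sum(
        1 for channels, conv in paths
        if conv and (removed is None or removed not in channels)
    )
    return converted / len(paths)

base = conversion_rate(paths)
channels = {c for chans, _ in paths for c in chans}

# Removal effect: relative drop in conversions when a channel vanishes.
effects = {ch: 1 - conversion_rate(paths, removed=ch) / base
           for ch in channels}

# Normalise effects into fractional credit per channel.
total = sum(effects.values())
credit = {ch: eff / total for ch, eff in effects.items()}
```

On this toy data, email earns the most credit because no converting path avoids it; the same normalisation applies unchanged to millions of real paths.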

03

CRM and audience activation

S3 is the standard hub for Reverse ETL. Tools like Fivetran and Windsor.ai can pick up Parquet files from your bucket and sync web behaviour data directly into Salesforce, Braze, or Klaviyo — bridging GA4 event data and CRM records without building a custom pipeline. The data is already clean and flat before it gets there.

Do I need to store AWS access keys anywhere?

No. BigQuery authenticates to AWS using Web Identity federation — a short-lived OIDC token it requests at runtime. Your AWS access keys never exist on the GCP side. How auth works →

What GCP and AWS permissions do I need to install this?

On GCP, the BigQuery Connection Admin role. On AWS, permission to create an IAM role and edit its trust policy. Both are standard for a data platform owner. See prerequisites →

Which AWS regions are supported?

Any standard AWS region. You set the region when you create the BigQuery connection, and it's fixed per connection — one connection per region if you need multiple destinations. Setup guide →

Deploy in under 5 minutes

Your GA4 data in S3,
today.

No credit card required. Install via Google Cloud Marketplace, configure your S3 bucket, and have clean Parquet files appearing in S3 before the end of the day.

Get Started on Marketplace → Read the documentation

Google Cloud Marketplace · Usage-based · No monthly minimum