Export GA4 BigQuery data
to Azure Blob — automatically.
GA4's BigQuery export wasn't designed to leave GCP. Moving raw nested data to Azure Blob costs more and solves less than you expect. Decode GA4 transforms and compresses your data inside GCP first — clean Parquet files, 75–90% smaller, every event parameter as a direct column.
GA4 exports to BigQuery. Most Azure-native data teams want their analytics data in Blob Storage.
Why GA4 data is hard to get into Azure
This gap is more common than it sounds. Your analytics team uses GA4. Your data platform runs on Azure — Synapse for the warehouse, Microsoft Fabric or OneLake for unified analytics, Power BI for dashboards, Databricks on Azure for modelling. GA4's BigQuery export does not provide a native path to get there.
The DIY trap
The teams that do build something themselves typically end up with one of two outcomes. Either they write a Cloud Function that extracts raw, nested GA4 event data and pushes it to Blob Storage — at which point every downstream consumer still has to deal with UNNEST logic in Synapse or complex Spark transformations in Databricks. Or they spend weeks writing transformation SQL first, then stand up a BigQuery export to GCS, then sync GCS to Blob with AzCopy or Azure Data Factory. Four moving parts, indefinitely.
A simpler path: resolve the complexity once
There is a third option. Resolve the structural complexity once, inside BigQuery, and export the clean result directly to Blob Storage. That is what Decode GA4 does.
The hidden cost of raw exports: egress
There is also a cost argument most teams discover after the fact. Standard GCP egress rates run $0.08–0.12 per GB. Moving raw, uncompressed GA4 event data to Azure means paying for every byte of UNNEST overhead — the repeated arrays, the repeated key names, the repeated metadata. Transforming and compressing to Parquet inside GCP before export reduces payload size by 75–90%. For most GA4 deployments that is a material reduction in the monthly transfer bill, not a rounding error.
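As a back-of-the-envelope illustration (the volume, rate, and compression ratio here are hypothetical, not measurements from any particular property):

```shell
# Hypothetical: 500 GB/month of raw GA4 export data, $0.10/GB egress,
# and an 85% size reduction from flattening plus ZSTD Parquet compression.
awk 'BEGIN {
  raw_gb = 500; rate = 0.10; reduction = 0.85
  printf "raw transfer:        $%.2f/month\n", raw_gb * rate
  printf "compressed transfer: $%.2f/month\n", raw_gb * (1 - reduction) * rate
}'
# raw transfer:        $50.00/month
# compressed transfer: $7.50/month
```

Plug in your own export volume and region's egress rate; the saving scales linearly with both.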
The traditional approach
Export raw GA4 data to Blob
Write a Cloud Function or Cloud Run job that reads from the GA4 BigQuery export and writes to Blob Storage. Fast to build. The problem: you are moving nested, structurally complex data to Azure. Every analyst querying it via Synapse serverless or Databricks still needs correlated subqueries to extract any event parameter. You have moved the problem, not solved it.
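For concreteness, extracting even two parameters from the raw export looks something like this in BigQuery (project, dataset, and parameter names are illustrative):

```sql
-- Raw GA4 export: every parameter read is a correlated subquery over
-- the event_params array, repeated once per parameter you want.
SELECT
  event_date,
  event_name,
  (SELECT value.string_value
   FROM UNNEST(event_params)
   WHERE key = 'page_location') AS page_location,
  (SELECT value.int_value
   FROM UNNEST(event_params)
   WHERE key = 'ga_session_id') AS ga_session_id
FROM `my-project.analytics_123456.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20250101' AND '20250131';
```

Move this structure to Blob as-is and every Synapse or Databricks consumer has to rewrite the equivalent logic in their own dialect.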
Flatten in BigQuery, sync via ADF
Write transformation SQL to unnest the GA4 data in BigQuery, materialise a clean table, set up a BigQuery export to GCS, then sync GCS to Blob via Azure Data Factory or AzCopy. You now maintain: the unnesting SQL (which breaks when GA4 adds a new parameter), the scheduled query, the GCS export job, and the ADF pipeline. Four moving parts, indefinitely.
Use an ETL platform
Fivetran, Airbyte, or Azure Data Factory connectors extract your GA4 data and load it somewhere. They handle movement but not transformation — you still get the same nested structure at the destination, and now your GA4 data has left your Google Cloud project and passed through a third-party system. Most platforms also charge a monthly minimum regardless of usage.
Decode GA4 vs the common way
| Feature | Decode GA4 | DIY / ETL platform |
|---|---|---|
| Setup time | Under 5 minutes | Hours or days of App Registration and pipeline config |
| Compression | Automatic ZSTD — 75–90% size reduction | Manual scripting, often uncompressed |
| Schema drift | Auto-detected and handled | Manual schema updates when GA4 changes |
| Egress costs | Pre-compressed inside GCP before transfer | Full raw payload transferred — no savings |
| Data residency | Data never leaves your GCP project | Most ETL platforms process your data on their infrastructure |
| Maintenance | Zero — deploy once, run forever | Ongoing SQL updates, pipeline monitoring, schema fixes |
One deployment. Clean data in Blob. Zero maintenance.
Step 1
Subscribe via Google Cloud Marketplace
Decode GA4 is available on Google Cloud Marketplace. Usage-based pricing — no monthly minimum, no credit card required. The subscription takes under a minute and billing appears on your existing GCP invoice.
Step 2
Deploy with Blob export configured
The installer takes your GA4 properties, Azure storage account, container name, and the BigQuery connection ID you set up for Azure. The connection setup is a one-time ~5-minute step, and the docs have the exact commands. Decode GA4 itself installs entirely within your GCP project — no data touches any external system except your own Azure storage account.
Step 3
Clean Parquet files appear in Blob, daily
Decoded GA4 data is written to Blob Storage in compressed Parquet format, hive-partitioned by date. Query it with Synapse serverless, read it with Databricks, pick it up in Power BI via Direct Lake, or register it as a OneLake shortcut in Fabric. Each partition is processed once; if GA4 modifies it upstream, the change is detected and handled automatically.
HOW THE SETUP WORKS
Cross-cloud integration, done in four steps. Full commands in the docs.
Azure
Register an Azure AD application and note the Tenant ID and Client ID.
GCP
Create a BigQuery connection pointing at that application's tenant, client ID, and Azure region.
Azure
Add the Google identity BigQuery returns as a federated credential on the application.
Azure
Assign the Storage Blob Data Contributor role to the app on your storage account.
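The four steps above can be sketched as CLI commands. Treat this as orientation rather than copy-paste: all IDs and names are placeholders, and the flag set for the BigQuery connection follows the BigQuery Omni docs at time of writing, so verify against the current docs before running.

```shell
# 1. Azure: register the application and capture its IDs.
az ad app create --display-name "bigquery-blob-export"
# note the appId (client ID) and the app's object ID from the output

# 2. GCP: create the BigQuery connection for Azure.
bq mk --connection --connection_type=AZURE \
  --location=azure-eastus2 \
  --tenant_id=<TENANT_ID> \
  --federated_azure=true \
  --federated_app_client_id=<CLIENT_ID> \
  blob_export_conn
bq show --connection <PROJECT>.azure-eastus2.blob_export_conn
# note the Google identity in the output

# 3. Azure: link that Google identity as a federated credential.
az ad app federated-credential create --id <APP_OBJECT_ID> --parameters '{
  "name": "bigquery",
  "issuer": "https://accounts.google.com",
  "subject": "<GOOGLE_IDENTITY>",
  "audiences": ["api://AzureADTokenExchange"]
}'

# 4. Azure: grant the app write access to the storage account.
az role assignment create \
  --assignee <CLIENT_ID> \
  --role "Storage Blob Data Contributor" \
  --scope "/subscriptions/<SUB>/resourceGroups/<RG>/providers/Microsoft.Storage/storageAccounts/<ACCOUNT>"
```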
Authentication uses Workload Identity Federation rather than stored secrets: at runtime, BigQuery exchanges a short-lived Google OIDC token for an Azure AD access token, so no client secrets or connection strings are ever kept on the GCP side.
What you get in Blob
Clean, flat Parquet files
Every event parameter exposed as a direct column. No UNNEST. No correlated subqueries. Query with Synapse serverless by selecting columns by name, or read in Databricks without Spark schema gymnastics.
Hive-partitioned by date
Data lands in Blob in a standard hive-partitioned folder structure. Synapse OPENROWSET partition pruning works out of the box. So does Databricks Auto Loader, Fabric Lakehouse shortcuts, and most ADLS-aware query engines.
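A minimal sketch of what that looks like from Synapse serverless, assuming a hypothetical storage account and an `event_date=YYYYMMDD` partition layout:

```sql
-- filepath(1) resolves to the first wildcard in the BULK path, i.e. the
-- event_date partition value, so the WHERE clause prunes whole folders.
SELECT r.filepath(1) AS event_date, event_name, COUNT(*) AS events
FROM OPENROWSET(
    BULK 'https://mystorageacct.blob.core.windows.net/ga4/event_date=*/*.parquet',
    FORMAT = 'PARQUET'
) AS r
WHERE r.filepath(1) BETWEEN '20250101' AND '20250131'
GROUP BY r.filepath(1), event_name;
```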
Automatic schema evolution
When GA4 adds a new event parameter, it appears in the next export without any configuration change on your end. The pipeline does not break. You do not get paged at 2am.
Incremental, not full refreshes
Each partition is processed once. If GA4 modifies a historical partition — which it does, unpredictably — Decode GA4 detects the change and reprocesses only that date. Daily runs are cheap.
75–90% lower egress costs
ZSTD-compressed Parquet leaves GCP at a fraction of the size of raw BigQuery exports. Less data transferred means a proportionally smaller egress bill. The compression happens before transfer, inside your project.
The full Azure analytics stack
Flat Parquet in Blob unlocks Synapse for serverless SQL, Fabric and OneLake for unified analytics, Power BI Direct Lake for sub-second dashboards, and Databricks for ML. All from the same clean source files.
Power BI Direct Lake against GA4 event data
Direct Lake skips Import mode entirely: Power BI reads the Parquet data straight from OneLake (for Blob, via a Lakehouse shortcut), with no dataset refresh job and none of Import mode's dataset size limits. Point Direct Lake at the decoded GA4 partitions and users get sub-second query performance over years of event-level data without any intermediate modelling layer. The flat column structure means measures and dimensions map directly; no DAX gymnastics for nested parameters.
Historical data preservation
GA4 retains event-level data for at most 14 months (two by default). Move it to Blob and you own it indefinitely. Lifecycle policies automatically shift older partitions to the Cool, then Archive tier; Archive storage runs around $0.002 per GB per month, so years of GA4 event history fit in most teams' petty cash. Standard year-over-year analysis stops being a problem you have to engineer around.
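A lifecycle rule along these lines does the tiering (the prefix and day thresholds are illustrative; apply it with `az storage account management-policy create`):

```json
{
  "rules": [{
    "name": "age-out-ga4",
    "enabled": true,
    "type": "Lifecycle",
    "definition": {
      "filters": { "blobTypes": ["blockBlob"], "prefixMatch": ["ga4/"] },
      "actions": {
        "baseBlob": {
          "tierToCool":    { "daysAfterModificationGreaterThan": 90 },
          "tierToArchive": { "daysAfterModificationGreaterThan": 365 }
        }
      }
    }
  }]
}
```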
Attribution modelling in Databricks or Fabric
GA4's built-in attribution is last-click or Google's own opaque data-driven model. Clean event-level data in Blob gives your data science team the raw material for proper Markov chain or custom data-driven models in Databricks or Fabric Data Science, without touching BigQuery quotas. The event parameters are already flat. The path sequences are already queryable without subqueries.
Do I need to store an Azure client secret anywhere?
No. BigQuery authenticates to Azure using Workload Identity Federation. A federated credential on your Azure AD app links BigQuery's Google identity directly — no client secrets, no connection strings, no long-lived credentials on the GCP side. How auth works →
What GCP and Azure permissions do I need to install this?
On GCP, the BigQuery Connection Admin role. On Azure, permission to register an application, add a federated credential, and assign Storage Blob Data Contributor on the target storage account. Both are standard for a data platform owner. See prerequisites →
Which Azure regions are supported?
Any Azure region where BigQuery Omni operates. You set the region when you create the BigQuery connection, and it's fixed per connection — one connection per region if you need multiple destinations. Setup guide →
Deploy in under 5 minutes
Your GA4 data in Azure Blob,
today.
No credit card required. Install via Google Cloud Marketplace, configure your Blob container, and have clean Parquet files appearing in Azure before the end of the day.
Google Cloud Marketplace · Usage-based · No monthly minimum