Decoded GA4 as a stable
SQLMesh external model —
no schema-drift surprises.

SQLMesh's whole value proposition is safe deploys: virtual environments, column-level lineage, breaking-change detection. None of that helps when the source itself changes shape every few weeks. The raw GA4 export does exactly that. Decode GA4 turns the events table into a flat, predictable external model so SQLMesh can do what it is designed to do.

Connection: BigQuery external table · Adapter: sqlmesh[bigquery] · Template: events_external · Maintenance: zero schema SQL

SQLMesh promises safe, virtual deploys. Schema drift in the GA4 export breaks that promise. Decoding the source upstream restores it.

The shape of the problem

The GA4 BigQuery export stores every event parameter inside a repeated record. To reference page_location in a SQLMesh model you write a correlated subquery against event_params. Multiply that across every parameter your team cares about and you end up with a base model that is several hundred lines of unnesting before any actual transformation happens. SQLMesh's incremental models, virtual data environments, and breaking-change detection still work, but they are working very hard to compensate for a source that should not have looked like this in the first place.
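The correlated-subquery pattern the raw export forces looks like this; the project, dataset, and date suffixes below are placeholders for your own export:

```sql
-- Illustrative: pulling two parameters out of the raw GA4 export.
-- Each parameter needs its own subquery against event_params.
SELECT
  event_timestamp,
  event_name,
  (SELECT value.string_value
     FROM UNNEST(event_params)
    WHERE key = 'page_location') AS page_location,
  (SELECT value.string_value
     FROM UNNEST(event_params)
    WHERE key = 'page_referrer') AS page_referrer
FROM `my-project.analytics_123456.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'
```

Repeat that subquery for every parameter you need and the base model grows by roughly a dozen lines each time.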

Why this hurts SQLMesh projects specifically

Two reasons. First, virtual environments assume the source schema is stable enough to compare two model versions side by side. When the underlying event_params record sprouts a new key, the comparison goes one of two ways: the upstream UNNEST silently drops the column, or your CI catches a breaking change that is really an upstream surprise. Either way SQLMesh ends up apologising for someone else's instability. Second, the audit checks SQLMesh runs against your sources are designed for tables, not for nested-record acrobatics — they get noisy fast.

What changes with Decode GA4

Decode writes a flat events table into a BigQuery dataset of your choice. You declare it in external_models.yaml as an external model. SQLMesh treats it like any other source, and your transformations reference event parameters as direct columns. Schema evolution lives outside SQLMesh entirely — when GA4 adds a new event parameter, it appears in the next decode run and shows up in the next external model refresh. SQLMesh's virtual environments and breaking-change detection are then operating on a source that does not change shape behind their back.
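Against the decoded table the same lookup collapses to a plain projection. A sketch, assuming the project and dataset names from the setup steps and a daily `event_date` partition column:

```sql
-- Illustrative: same two parameters, now direct columns.
SELECT
  event_timestamp,
  event_name,
  page_location,
  page_referrer
FROM `my-project.sqlmesh_sources.events`
WHERE event_date = '2024-01-31'
```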

Option A · UNNEST inside a SQLMesh base model

Write a base model that flattens event_params with a CROSS JOIN UNNEST per parameter. Every refresh re-scans the raw export. Every new GA4 parameter requires a code change. Your virtual data environment now contains a model whose entire purpose is to compensate for the source.

Hundreds of lines of glue SQL

Option B · Translate a public GA4 package

Lift staging models from a community dbt-for-GA4 package and rewrite them in SQLMesh. They flatten more events than you need, materialise intermediate tables, and lag behind whatever Google last changed. You inherit modelling decisions you did not make.

Opinionated, slow to update

Option C · Run a separate flattener before SQLMesh

Stand up Airbyte, Cloud Functions, or a Python job to flatten GA4 and write back to BigQuery, then point SQLMesh at that. Two pipelines, two failure modes, two places where the schema can drift. SQLMesh's lineage view looks clean. The stack underneath has just become harder to reason about.

Two pipelines to maintain
| Feature | Decode GA4 external model | Hand-built SQLMesh base model |
| --- | --- | --- |
| Lines of SQL to flatten a page_view | 3 | ~18 per parameter |
| New GA4 parameter handling | Auto-detected, appears in external model | Manual SQL update required |
| Refresh cost | External table, no scan on plan/apply | Full UNNEST scan on every backfill |
| Virtual environment safety | Source schema stable across versions | Schema drift breaks comparison |
| Onboarding time for engineers | Treat events like any other external model | Must learn UNNEST patterns first |
| Maintenance over a year | Zero | Recurring, every GA4 release |

One install. A clean external model. SQLMesh handles the rest.

  1. Subscribe via Google Cloud Marketplace

    Decode GA4 is a Marketplace listing. Usage-based pricing, no monthly minimum. The subscription takes under a minute and billing appears on your existing GCP invoice.

  2. Deploy with the events_external template

    Pick a BigQuery dataset that SQLMesh will use as a source — for example sqlmesh_sources. Set destination_dataset_id to that dataset. Decode writes a Parquet-backed external table called events into it.

  3. Install sqlmesh[bigquery] and declare the external model

    pip install "sqlmesh[bigquery]". Add the table to external_models.yaml as your-gcp-project-id.sqlmesh_sources.events. SQLMesh now knows the schema without you having to enumerate every column.

  4. Run sqlmesh plan as normal

    Models reference the external model with a regular FROM clause. Virtual environments, audit checks, and breaking-change detection all work against a source that does not silently change shape. The decoded events table updates daily and your SQLMesh runs pick up the new partitions on the next plan/apply.
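The declaration from step 3 can be sketched in external_models.yaml like this. The column list is abbreviated and the types are assumptions based on the standard GA4 export; SQLMesh can also generate this file from the live table with `sqlmesh create_external_models`:

```yaml
# Sketch of an external model entry; project and dataset names
# are placeholders, and the column list is deliberately partial.
- name: your-gcp-project-id.sqlmesh_sources.events
  columns:
    event_date: DATE
    event_timestamp: INT64
    event_name: STRING
    user_pseudo_id: STRING
    page_location: STRING
    ga_session_id: INT64
```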

Wire decoded GA4 into a SQLMesh project in four small steps. Nothing here is SQLMesh-specific magic — it is the same external model pattern you already use for any source SQLMesh does not own.

01 · GCP: Run the Decode GA4 installer with destination_dataset_id pointing at your SQLMesh sources dataset.

02 · GCP: Grant the BigQuery service account Storage Object Viewer on the Decode GCS bucket.

03 · SQLMesh: Configure the config.yaml gateway and add the events table to external_models.yaml.

04 · SQLMesh: Reference your-gcp-project-id.sqlmesh_sources.events in models. Run sqlmesh plan.

The events table is a BigQuery external table backed by Parquet files in GCS. SQLMesh reads it natively through the BigQuery adapter, but the underlying storage stays in your project. No data leaves your perimeter, and there is no duplicate dataset for SQLMesh to keep in sync.

01 · A first-class external model

The events table is declared once in external_models.yaml and referenced through a normal FROM clause. Lineage, audit checks, and breaking-change detection all see a stable schema.

02 · Direct event parameter columns

page_location, page_referrer, page_title, ga_session_id, ga_session_number — every standard parameter is a direct column. No correlated subqueries inside your SQLMesh models.

03 · External table economics

The source reads from Parquet files in GCS. There is no duplicate storage in BigQuery, and SQLMesh runs against decoded data without paying to scan the raw GA4 export.

04 · Virtual environments that stay safe

Schema evolution is handled upstream. SQLMesh's virtual environments compare model versions against a source that does not reshape itself between plan and apply. Breaking-change detection becomes signal rather than noise.

05 · Cleaner DAGs, fewer base models

The base model that used to flatten events shrinks from hundreds of lines of UNNEST to a thin renaming layer — or disappears. Downstream models become readable.

06 · Works with open-source SQLMesh and Tobiko Cloud

The setup is identical whether you run open-source SQLMesh, Tobiko Cloud, or any orchestrator on top. pip install "sqlmesh[bigquery]", point at the dataset, and you are done. No platform-specific features required.

01 · Marketing attribution models with safe rollbacks

Build a session_facts model on top of the decoded events external model. Source-medium, campaign, and landing page are direct columns. SQLMesh's virtual environment lets you preview changes against production data without touching the live tables — and the source does not move under your feet while you do it.
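A session_facts model over the decoded source might start like the sketch below. The model name, time column, and first-touch logic are illustrative assumptions, not a prescribed schema:

```sql
-- Sketch of an incremental SQLMesh model over the decoded events.
MODEL (
  name marts.session_facts,
  kind INCREMENTAL_BY_TIME_RANGE (
    time_column event_date
  )
);

SELECT
  event_date,
  user_pseudo_id,
  ga_session_id,
  MIN(event_timestamp) AS session_start,
  ANY_VALUE(page_location) AS landing_page  -- first-touch logic simplified for the sketch
FROM `your-gcp-project-id.sqlmesh_sources.events`
WHERE event_date BETWEEN @start_ds AND @end_ds
GROUP BY event_date, user_pseudo_id, ga_session_id
```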

02 · Funnel and conversion analysis with audits

Standard ecommerce funnels — view_item, add_to_cart, begin_checkout, purchase — become readable case statements over event_name. SQLMesh audits run against a stable schema, so unique-per-session and not-null checks behave predictably across deploys.
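The funnel flags can be computed in a single pass over event_name; the grouping keys match the decoded columns, and the output column names are illustrative:

```sql
-- Illustrative session-level funnel flags over the decoded table.
SELECT
  user_pseudo_id,
  ga_session_id,
  MAX(CASE WHEN event_name = 'view_item'      THEN 1 ELSE 0 END) AS viewed_item,
  MAX(CASE WHEN event_name = 'add_to_cart'    THEN 1 ELSE 0 END) AS added_to_cart,
  MAX(CASE WHEN event_name = 'begin_checkout' THEN 1 ELSE 0 END) AS began_checkout,
  MAX(CASE WHEN event_name = 'purchase'       THEN 1 ELSE 0 END) AS purchased
FROM `your-gcp-project-id.sqlmesh_sources.events`
GROUP BY user_pseudo_id, ga_session_id
```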

03 · Product analytics marts with breaking-change detection

Custom event parameters that your product team adds — feature flags, plan tier, A/B variant — show up as direct columns the moment they start firing. Breaking-change detection catches genuine model-level breakages, not source-shape surprises.

Does this work with open-source SQLMesh or Tobiko Cloud?

Both. The integration is at the BigQuery adapter layer — open-source SQLMesh, Tobiko Cloud, and any orchestrator that drives SQLMesh see the decoded events table the same way. No cloud-only features required. See setup →

How does this interact with SQLMesh's virtual environments?

Cleanly. The decoded events table is an external model, so SQLMesh does not try to version it. Your transformations get virtual environments and breaking-change detection. The source itself is treated as a stable external table that updates daily.

What permissions does my BigQuery service account need?

The standard BigQuery Data Viewer and BigQuery Job User roles, plus Storage Object Viewer on the Decode GA4 GCS bucket — required because the events table is an external table backed by Parquet files. Full prerequisites →

Can I still write SQLMesh audits against the decoded events?

Yes. The events table behaves like any other BigQuery external model — built-in audits like unique_values, not_null, and accepted_values all work, as do custom SQL audits.
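Inline audits on a downstream model might be declared like this sketch; the model name and the accepted_values list are illustrative assumptions:

```sql
-- Sketch: built-in SQLMesh audits against the decoded source.
MODEL (
  name marts.events_clean,
  kind FULL,
  audits (
    not_null(columns := (event_name, event_timestamp)),
    accepted_values(
      column := event_name,
      is_in := ('page_view', 'view_item', 'add_to_cart', 'begin_checkout', 'purchase')
    )
  )
);

SELECT * FROM `your-gcp-project-id.sqlmesh_sources.events`
```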

Deploy in under 5 minutes

Your SQLMesh project,
without the schema-drift tax.

Subscribe via Google Cloud Marketplace, point destination_dataset_id at your SQLMesh sources dataset, and have a stable external model available to your transformations before the end of the day.

Get Started on Marketplace → Read the documentation

Google Cloud Marketplace · Usage-based · No monthly minimum