Decoded GA4 as a clean Bruin source — flat YAML, flat columns, no UNNEST.

Bruin's pitch is simple: define SQL and Python assets as YAML, manage dependencies, run against BigQuery. The cleanness of that model falls apart the moment your source is the GA4 export, because every asset that touches it has to flatten event_params first. Decode GA4 keeps Bruin honest. The decoded events table reads like a regular BigQuery source, and your assets stay short.

Connection: BigQuery external table · CLI: bruin · Template: events_external · Maintenance: zero schema SQL

Bruin's appeal is short YAML and short SQL. The GA4 export breaks both of those if you point Bruin at it directly. Decoding upstream is what makes the YAML-first model actually work for GA4.

The shape of the problem

The GA4 BigQuery export stores every event parameter inside a repeated record. To pull a single page_location into a Bruin SQL asset you write a correlated subquery against event_params. To get page_referrer, another. To get ga_session_id, another. The asset SQL grows long while the YAML metadata stays short, and the imbalance is awkward: Bruin is supposed to be the place where the SQL stays small and the orchestration stays declarative.
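To make that concrete, here is a sketch of the per-parameter flattening pattern against the raw export. The `analytics_XXXXXX` dataset name stands in for your GA4 property's export dataset; every additional parameter costs another correlated subquery.

```sql
-- Raw-export flattening: one correlated subquery per parameter.
-- `analytics_XXXXXX` is a placeholder for your GA4 export dataset.
SELECT
  event_date,
  event_name,
  (SELECT value.string_value
   FROM UNNEST(event_params)
   WHERE key = 'page_location') AS page_location,
  (SELECT value.string_value
   FROM UNNEST(event_params)
   WHERE key = 'page_referrer') AS page_referrer,
  (SELECT value.int_value
   FROM UNNEST(event_params)
   WHERE key = 'ga_session_id') AS ga_session_id
FROM `your-gcp-project-id.analytics_XXXXXX.events_*`
WHERE event_name = 'page_view'
```

Multiply this by every parameter and every asset that touches the export, and the SQL-to-YAML ratio inverts.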

Why this hurts Bruin projects specifically

Two reasons. First, Bruin's AI-assisted authoring works far better when source columns are already named clearly. UNNEST scaffolding is the worst possible context for an LLM-assisted edit. Second, Bruin's strength is dependency management across SQL and Python assets. Pointing those dependencies at a source that silently drops new GA4 parameters means downstream Python assets can run successfully against incomplete data, and you only find out when the dashboard numbers stop matching the GA4 UI.

What changes with Decode GA4

Decode writes a flat events table into a BigQuery dataset of your choice. You add a BigQuery connection to .bruin.yml, then create assets that select directly from your-gcp-project-id.bruin_sources.events. There is no UNNEST in any asset, anywhere. When GA4 adds a new event parameter, the next decode run picks it up and your downstream assets see it on the next bruin run.
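For contrast, a sketch of the same pull against the decoded table, assuming the flat column names it exposes (page_location, page_referrer, ga_session_id) match your deployment:

```sql
-- Same result, no UNNEST: parameters are direct columns.
SELECT
  event_date,
  event_name,
  page_location,
  page_referrer,
  ga_session_id
FROM `your-gcp-project-id.bruin_sources.events`
WHERE event_name = 'page_view'
```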

Option A

UNNEST inside a Bruin SQL asset

Write a base events asset with a CROSS JOIN UNNEST per parameter. Refresh costs scale with the number of parameters. Every new GA4 parameter requires a code change. Bruin's promise of small, focused SQL assets goes out the window for the one source most analytics work depends on.

Hundreds of lines of glue SQL
Option B

Adapt a public GA4 package

Lift staging models from a community dbt-for-GA4 package and rewrite them as Bruin SQL assets. They flatten more events than you need, materialise intermediate tables, and lag behind whatever Google last shipped. You inherit modelling decisions you did not make and a release cadence you do not control.

Opinionated, slow to update
Option C

Run a separate flattener before Bruin

Stand up Cloud Functions or a Python job to flatten GA4 and write back to BigQuery, then point Bruin at that. Two pipelines, two failure modes, two places where the schema can drift. Bruin's dependency graph looks tidy. The thing it depends on is now a separate maintenance project.

Two pipelines to maintain
| Feature | Decode GA4 source | Hand-built Bruin staging asset |
| --- | --- | --- |
| Lines of SQL to flatten a page_view | 3 | ~18 per parameter |
| New GA4 parameter handling | Auto-detected, appears in source | Manual SQL update required |
| Refresh cost | External table, no scan on bruin run | Full UNNEST scan on every refresh |
| YAML asset readability | Short SQL, clean dependencies | Long SQL, noisy diffs |
| AI-assisted authoring quality | Clear column names, predictable shape | UNNEST scaffolding confuses suggestions |
| Maintenance over a year | Zero | Recurring, every GA4 release |

One install. A clean BigQuery source. Bruin handles the rest.

  1. Subscribe via Google Cloud Marketplace

    Decode GA4 is a Marketplace listing. Usage-based pricing, no monthly minimum. The subscription takes under a minute and billing appears on your existing GCP invoice.

  2. Deploy with the events_external template

    Pick a BigQuery dataset that Bruin will use as a source — for example bruin_sources. Set destination_dataset_id to that dataset. Decode writes a Parquet-backed external table called events into it.

  3. Add a BigQuery connection to .bruin.yml

    Use Application Default Credentials or a service account file. Bruin now knows how to talk to the project. No additional warehouse setup is needed for the decoded events table — it is already a normal BigQuery table from Bruin's perspective.
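A minimal sketch of what that connection block can look like, assuming Bruin's google_cloud_platform connection type; the connection name and file path are illustrative, and omitting service_account_file falls back to Application Default Credentials:

```yaml
# .bruin.yml (sketch; names and paths are illustrative)
default_environment: default
environments:
  default:
    connections:
      google_cloud_platform:
        - name: gcp-default
          project_id: "your-gcp-project-id"
          # Remove this line to use Application Default Credentials.
          service_account_file: "/path/to/service-account.json"
```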

  4. Run bruin run as normal

    Create SQL assets that select from your-gcp-project-id.bruin_sources.events. Bruin resolves dependencies, runs assets in order, and reports status. The decoded events table updates daily and your assets pick up the new partitions on the next run.
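A sketch of one such asset file, assuming Bruin's embedded-YAML asset header; the asset name, materialization, and columns are illustrative:

```sql
/* @bruin
name: marts.page_views
type: bq.sql
materialization:
  type: table
@bruin */

SELECT
  event_date,
  page_location,
  COUNT(*) AS views
FROM `your-gcp-project-id.bruin_sources.events`
WHERE event_name = 'page_view'
GROUP BY event_date, page_location
```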

Wire decoded GA4 into a Bruin pipeline in four small steps. Nothing here is Bruin-specific magic — it is the same source pattern you already use for any other BigQuery table.

01 · GCP: Run the Decode GA4 installer with destination_dataset_id pointing at your Bruin sources dataset.

02 · GCP: Grant the BigQuery service account or user Storage Object Viewer on the Decode GCS bucket.

03 · Bruin: Install the Bruin CLI and add a BigQuery connection block to .bruin.yml.

04 · Bruin: Create SQL assets that select from your-gcp-project-id.bruin_sources.events. Run bruin run.

The events table is a BigQuery external table backed by Parquet files in GCS. Bruin reads it natively through BigQuery, but the underlying storage stays in your project. No data leaves your perimeter, and there is no extra ingestion step for Bruin to schedule.

01 · A first-class BigQuery source

The events table is referenced through a normal FROM clause inside any SQL asset. Bruin's dependency resolver picks it up like any other source. No special handling required.

02 · Direct event parameter columns

page_location, page_referrer, page_title, ga_session_id, ga_session_number — every standard parameter is a direct column. No correlated subqueries inside your SQL assets.

03 · External table economics

The source reads from Parquet files in GCS. There is no duplicate storage in BigQuery, and Bruin runs against decoded data without paying to scan the raw GA4 export.

04 · Schema evolution that just works

When GA4 adds a new event parameter, the next decode run picks it up. Bruin assets that need the new column can reference it immediately. Assets that do not are unaffected.

05 · Short SQL, clean YAML

Asset SQL stays short and readable. Bruin's YAML metadata stays declarative. The diff between two pipeline versions stops being a wall of UNNEST changes and starts being actual business logic.

06 · Better AI-assisted authoring

Bruin's AI-assisted authoring works far better against clearly named columns than against nested-record UNNEST patterns. Suggestions stop trying to recreate scaffolding you no longer need.

01 · Marketing attribution pipelines

Build a session_facts SQL asset on top of the decoded events table without writing a single UNNEST. Source-medium, campaign, and landing page are direct columns. The path from raw event to attribution mart shrinks from five intermediate assets to one.
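As a sketch of that session_facts asset, using event_timestamp from the standard export plus decoded columns; here ga_session_id and page_location are documented, while any further attribution columns would depend on your deployment:

```sql
-- session_facts sketch: one row per session, landing page derived
-- from the earliest page_location in the session.
SELECT
  ga_session_id,
  MIN(event_timestamp) AS session_start_ts,
  ARRAY_AGG(page_location IGNORE NULLS
            ORDER BY event_timestamp
            LIMIT 1)[SAFE_OFFSET(0)] AS landing_page,
  COUNT(*) AS events_in_session
FROM `your-gcp-project-id.bruin_sources.events`
GROUP BY ga_session_id
```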

02 · Funnel analysis with Python downstream

Standard ecommerce funnels — view_item, add_to_cart, begin_checkout, purchase — become readable case statements in the SQL asset. The Python asset that runs cohort analysis on top reads from a clean intermediate table rather than re-doing the flattening.
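Those case statements can be sketched as one session-level funnel query; the event names are GA4's standard ecommerce events, and the table reference assumes the bruin_sources.events deployment described above:

```sql
-- One row per session with a reached/not-reached flag per funnel step.
SELECT
  ga_session_id,
  MAX(CASE WHEN event_name = 'view_item' THEN 1 ELSE 0 END) AS viewed_item,
  MAX(CASE WHEN event_name = 'add_to_cart' THEN 1 ELSE 0 END) AS added_to_cart,
  MAX(CASE WHEN event_name = 'begin_checkout' THEN 1 ELSE 0 END) AS began_checkout,
  MAX(CASE WHEN event_name = 'purchase' THEN 1 ELSE 0 END) AS purchased
FROM `your-gcp-project-id.bruin_sources.events`
GROUP BY ga_session_id
```

A downstream Python asset can then read this intermediate table directly for cohort work instead of re-flattening events.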

03 · Product analytics marts driven by YAML

Custom event parameters that your product team adds — feature flags, plan tier, A/B variant — show up as direct columns the moment they start firing. New asset YAML stays small. New asset SQL stays small. The whole pipeline scales as the data does, not as the unnesting does.

Does this work with the open-source Bruin CLI?

Yes. The integration is at the BigQuery layer — the open-source CLI sees the decoded events table the same way any BigQuery client would. No platform-specific features required. See setup →

Do I need to delete my existing GA4 staging assets?

Not immediately. You can run the decoded events table alongside an existing flattening asset and migrate downstream assets one at a time. Most teams find that the flattening asset can be deleted entirely once Decode is in place.

What permissions does my BigQuery connection need?

The standard BigQuery Data Viewer and BigQuery Job User roles, plus Storage Object Viewer on the Decode GA4 GCS bucket — required because the events table is an external table backed by Parquet files. Full prerequisites →

Can I use Bruin's quality checks against the decoded events?

Yes. The events table behaves like any other BigQuery source — Bruin's column-level checks for not-null, unique, accepted values, and custom SQL all work exactly as they would against a regular table.
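As a sketch, such checks could be declared in an asset's @bruin header like this; the check names follow Bruin's built-in column checks, and the accepted value list is illustrative:

```yaml
# Column checks on an asset reading the decoded events table
# (part of the asset's @bruin metadata block; values are examples).
columns:
  - name: event_name
    type: string
    checks:
      - name: not_null
      - name: accepted_values
        value: ["page_view", "add_to_cart", "purchase"]
  - name: ga_session_id
    type: integer
    checks:
      - name: not_null
```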

Deploy in under 5 minutes

Your Bruin pipelines,
without the UNNEST tax.

Subscribe via Google Cloud Marketplace, point destination_dataset_id at your Bruin sources dataset, and have a clean events table available to your assets before the end of the day.

Get Started on Marketplace → · Read the documentation →

Google Cloud Marketplace · Usage-based · No monthly minimum