Google Analytics 4

Motivation

The Google Analytics 4 export to BigQuery has a complex structure which is extremely difficult to query in SQL, making analytics tasks time-consuming and SQL code extremely long, difficult to understand and error-prone.

Objective

The Google Analytics 4 (GA4) decoder simplifies and augments the GA4 BigQuery export, making subsequent data operations simpler and quicker.

It enables automatic pre-modelling of the GA4 BigQuery events_YYYYMMDD export, providing flattened, date-partitioned events and sessions tables containing all standard and observed event_params and user_properties. It also provides mechanisms to detect new event_params and user_properties values and to incrementally update the schema based on inbound data detection.

It can be installed using a one-line command in the BigQuery console by permitted users on registered datasets, and has no dependencies on any external platforms or API calls.

Access

The Google Analytics 4 Decoder is in pre-launch with selected agencies and enterprises. Apply for access here.

Quickstart

Follow these instructions to get started as quickly as possible.

Installation

The GA4 decoder can be deployed in the same dataset as the inbound GA4 dataset (ga4_dataset_id), or a different dataset if desired. Note that the installation function needs to be called in the same region in which the GA4 dataset is located (in the examples below, for the us multi-region).

Basic Installation

In the following installation examples, the event_options JSON variable is NULL, which means that default installation options are used. The event_options is the mechanism via which custom installation configurations are set.

Basic installation in the GA4 datasetBasic installation in a different datasetConcise syntax

  DECLARE ga4_dataset_id, decoder_dataset_id STRING;
  DECLARE event_options JSON DEFAULT JSON '{}';

  SET ga4_dataset_id = 'project_id.analytics_##########';
  SET decoder_dataset_id = ga4_dataset_id;

  CALL decodedata.us.install_ga4_decoder(ga4_dataset_id, decoder_dataset_id, event_options);

  DECLARE ga4_dataset_id, decoder_dataset_id STRING;
  DECLARE event_options JSON DEFAULT JSON '{}';

  SET ga4_dataset_id = 'project_id.analytics_##########';
  SET decoder_dataset_id = 'project_id.decoder_dataset_name';

  CALL decodedata.us.install_ga4_decoder(ga4_dataset_id, decoder_dataset_id, event_options);

CALL decodedata.us.install_ga4_decoder('project_id.analytics_##########', 'project_id.decoder_dataset_name' NULL);

Output Resources

Output Tables

Following installation, these output tables are built in the decoder_dataset_id dataset:

Resource Name	Resource Type	Partitioning Column	Row Granularity
`events`	`TABLE`	`event_date`	One row per event
`sessions`	`TABLE`	`event_date`	One row per session

Detailed information regarding the transformation, augmentation and schema is available in the events and sessions docs.

These date-partitioned tables will be built from the latest data upon installation.

Functions

Following installation, these execution functions are also deployed in the decoder_dataset_id dataset:

Resource Name	Resource Type	Arguments	Action
`INSTALL_DECODER`	`PROCEDURE`	none	Reinstall decoder with original deployment options
`RUN_DECODER`	`PROCEDURE`	`execution_options JSON`	Execute the decoder and refresh output tables

The decoder_dataset_id.INSTALL_DECODER() function is executed to reinstall the decoder with the original installation configuration. It also rebuilds output tables with the full historic data.

The decoder_dataset_id.RUN_DECODER(execution_options) function is executed to refresh output tables with the transformed latest arriving data.

Automation

The RUN_DECODER function has two different execution modes: incremental and full. Execution mode is set using the execution_options JSON arguments.

In order to efficiently automate the decoder, a simple BigQuery scheduled query can be used to call the RUN_DECODER function periodically (typically every hour). Since it is querying metadata to identify newly arriving date partitions, the underlying data is never queried, meaning that the compute required for the automation is predictable and low (see the Compute Estimate section).

Incremental Refresh

The incremental execution mode is the optimal mode to keep the output tables up-to-date in a cost-efficient manner. It compares inbound and outbound table metadata to determine when new date partitions arrive, and runs on those partitions only (plus an additional n days, due to late arriving data in the GA4 export). Note that since the source data can change up to 72 hours after the table is created, it is recommended to set this to 4 days.

Incremental refreshConcise syntax

DECLARE execution_options JSON DEFAULT JSON '{"mode": "incremental", "date_partitions_to_replace": 4}';

CALL `decoder_dataset_id.us.RUN_DECODER`(execution_options);

CALL `decoder_dataset_id.us.RUN_DECODER`(JSON '{"mode": "incremental", "date_partitions_to_replace": 4}');

Full Refresh

To execute a full refresh of output tables between the first observed source shard data and the CURRENT_DATE, use the following execution_options:

Full refreshConcise syntax

DECLARE execution_options JSON DEFAULT JSON '{"mode": "full"}';

CALL `decoder_dataset_id.us.RUN_DECODER`(execution_options);

CALL `decoder_dataset_id.us.RUN_DECODER`(JSON '{"mode": "full"}');

Default mode

If the execution_options JSON argument is NULL, the execution defaults to mode = incremental and date_partitions_to_replace = 4.

Default incremental cxecutionConcise syntax

DECLARE execution_options JSON DEFAULT NULL;

CALL `decoder_dataset_id.us.RUN_DECODER`(execution_options);

CALL `decoder_dataset_id.us.RUN_DECODER`(NULL);

Compute Estimate

The compute automation component to run the GA4 decoder hourly is approximately $0.135 per property per month. This consumption will not increase over time, regardless of inbound data volumes.

Component	Logic	Value	Unit
Execution Processing	3 x 10MB Metadata Queries	30	MB
Monthly Executions	30 days * 24hrs	720	Executions
Monthly Compute	720 Executions * 30MB	21.6	GB
Compute Unit Cost	1TB On-Demand Compute = $6.25	0.00625	$/GB

Total Automation Monthly Compute	*21.6GB $0.00625**	0.135	$