Google Analytics 4
Objective
The Google Analytics 4 (GA4) decoder enables automatic pre-modelling of the GA4 BigQuery events_YYYYMMDD
export, providing flattened, date-partitioned events
and sessions
tables containing all standard and observed event_params
and user_properties
. It also provides mechanisms to detect new event_params
and user_properties
values and to incrementally update data based on inbound data detection.
It can be installed by permitted users on registered datasets by executing a single BigQuery function, without dependencies on any external platforms or API calls.
Access
The Google Analytics 4 decoder is currently open to private alpha registration. Apply for access here.
Installation
The GA4 decoder can be deployed in the same dataset as the inbound GA4 dataset (ga4_dataset_id
), or a different dataset if desired. Note that the installation function needs to be called in the same region in which the GA4 dataset is located (in the examples below, for the us
multi-region).
Basic Installation
In the following installation examples, the event_options
JSON
variable is NULL
, which means that default installation options are used. This object is the mechanism via which custom installation configurations are set.
--DECLARATION
DECLARE ga4_dataset_id, decoder_dataset_id STRING;
DECLARE event_options JSON DEFAULT JSON '{}';
--CONFIGURATION
SET ga4_dataset_id = 'project_id.analytics_##########';
SET decoder_dataset_id = ga4_dataset_id;
--EXECUTION
CALL decodedata.us.install_ga4_decoder(ga4_dataset_id, decoder_dataset_id, event_options);
--DECLARATION
DECLARE ga4_dataset_id, decoder_dataset_id STRING;
DECLARE event_options JSON DEFAULT JSON '{}';
--CONFIGURATION
SET ga4_dataset_id = 'project_id.analytics_##########';
SET decoder_dataset_id = 'another_project_id.analytics_##########';
--EXECUTION
CALL decodedata.us.install_ga4_decoder(ga4_dataset_id, decoder_dataset_id, event_options);
CALL decodedata.us.install_ga4_decoder('project_id.analytics_##########', 'another_project_id.analytics_##########', NULL);
Output Resources
Following default installation, these output tables are built in the decoder_dataset_id
dataset:
Resource Name | Resource Type | Partitioning Column | Row Granularity |
---|---|---|---|
events |
TABLE |
event_date |
One row per event |
sessions |
TABLE |
event_date |
One row per session |
These will be built from the latest data upon installation, but you will need to execute the decoder_dataset_id.RUN_DECODER(start_date, end_date)
function to refresh output table with the transformed latest arriving data.
Automation
In order to efficiently automate the decoder, a simple BigQuery scheduled query can be used to call the RUN_DECODER
function periodically (typically every hour). Since it is querying metadata to identify newly arriving date partitions, the underlying data is never queried, meaning that the compute required for the automation is predictable and low (see the Compute Estimate section).
Since Google indicates that inbound data can change up to 72 hours after arrival, we typically refresh the past 4 days of data on each identified arrival of new data.
The scheduled query required to execute this is then simply:
CALL `decoder_dataset_id.us.RUN_DECODER`(CURRENT_DATE - 4, NULL);
SET @@query_label = "scheduled_query_id:decoder_dataset_id";
CALL `decoder_dataset_id.us.RUN_FLOW`(CURRENT_DATE - 4, NULL)
Note that it is good practice to add a unique scheduled_query_id
query label to the scheduled query. This will enable job-based cost tracking across your GA4 properties to support robust cost management processes.
Compute Estimate
The compute automation component to run the GA4 decoder hourly is approximately $0.135 per property per month. This consumption will not increase over time, regardless of inbound data volumes.
Component | Logic | Value | Unit |
---|---|---|---|
Execution Processing | 3 x 10MB Metadata Queries | 30 | MB |
Monthly Executions | 30 days * 24hrs | 720 | Executions |
Monthly Compute | 720 Executions * 30MB | 21.6 | GB |
Compute Unit Cost | 1TB On-Demand Compute = $6.25 | 0.00625 | $/GB |
Total Automation Monthly Compute |
21.6GB * $0.00625 | 0.135 | $ |