Quickstart
Access
Decode GA4 is distributed via BigQuery Analytics Hub and is available in all regions and multi-regions. Access is currently offered on a private-offer basis only, with billing via your existing linked billing account at a flat rate per TiB processed.
Contact ga4@decodedata.io to request private offer terms.
Prerequisites
To deploy Decode GA4, you first need a Google Cloud Storage (GCS) bucket, which will be the storage location for compressed, transformed events data. A new bucket can be created through the Google Cloud console, the command line or the client libraries, and the executing user or service account must have `storage.objects.create` permission on the bucket.
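For example, a new bucket can be created from the command line with the gcloud CLI. The bucket name and location below are illustrative; choose a location compatible with your BigQuery dataset:

```
gcloud storage buckets create gs://your_gcs_bucket_name --location=EU
```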
Installation
The minimum configuration required is simply the location of your inbound GA4 BigQuery dataset (`ga4_dataset_id`) and the name of the GCS bucket (`gcs_bucket_name`) to be used for transformed base events data.
Configuration
The following options are required at a minimum in order to deploy Decode GA4.

| NAME | TYPE | DETAILS |
| --- | --- | --- |
| `ga4_dataset_id` | STRING | The ID of the dataset containing your GA4 BigQuery export, in the format `project_id.analytics_XXXXXXX`. |
| `gcs_bucket_name` | STRING | The name of the Google Cloud Storage bucket in which to store compressed, transformed data. |
Deployment
Decode GA4 is installed by copying the following script to your BigQuery console, updating the configuration and running the updated script. Note that all configuration options are set using the `options` JSON variable.
```
-- Deployment configuration: set your GA4 export dataset and GCS bucket.
DECLARE options JSON;
DECLARE deployment_script STRING;

SET options = JSON """
{
  "ga4_dataset_id": "project_id.dataset_id",
  "gcs_bucket_name": "your_gcs_bucket_name"
}
""";

-- Generate the deployment script for this configuration, then execute it.
SET deployment_script = (
  SELECT * FROM `decode-ga4.eu.generate_deployment_script`(options));
EXECUTE IMMEDIATE (deployment_script);
```
Execution
This will create the BigQuery dataset `decode_analytics_XXXXXXX`, which will contain the stored procedure `deploy_decode_ga4()`, including the deployment options. Run this deployment procedure using:

```
CALL `project_id.decode_analytics_XXXXXXX.deploy_decode_ga4`();
```
This will result in the following resources being deployed in your destination dataset and bucket.
Outputs
BigQuery Dataset
The `decode_analytics_XXXXXXX` dataset will be created, containing configuration, functions, logs and the base events external table. Note that the destination dataset can be overridden by setting `decode_ga4_dataset_id` in the configuration `options` JSON, as shown below.
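For example, to deploy into a custom destination dataset, add the override to the deployment options before generating the deployment script. A minimal sketch; the destination dataset name `project_id.decode_ga4` is illustrative:

```
SET options = JSON """
{
  "ga4_dataset_id": "project_id.analytics_XXXXXXX",
  "gcs_bucket_name": "your_gcs_bucket_name",
  "decode_ga4_dataset_id": "project_id.decode_ga4"
}
""";
```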
Google Cloud Storage Bucket Folders
The `/decode-ga4/events` folder hierarchy will be created in the specified GCS bucket, with separate date-segregated subfolders containing compressed, transformed base events data. The file structure is as follows.
```
decode-ga4
└── events
    ├── partition_date=2025-02-01
    │   ├── events_0000000000.gzip
    │   ├── events_0000000001.gzip
    │   └── events_0000000002.gzip
    ├── partition_date=2025-02-02
    │   ├── events_0000000000.gzip
    │   ├── events_0000000001.gzip
    │   ├── events_0000000002.gzip
    │   └── events_0000000003.gzip
    └── ...
```
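To verify the output, the generated objects can be listed with the gcloud CLI (bucket name illustrative):

```
gcloud storage ls --recursive gs://your_gcs_bucket_name/decode-ga4/events/
```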
BigQuery Table
The base events BigQuery external table will be created, which is hive-partitioned to enable efficient access to the compressed data in GCS, using `partition_date` as the partitioning column.
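Queries against this table should filter on the partitioning column so that only the relevant date folders in GCS are read. A minimal sketch, assuming the external table is named `base_events` and exposes the standard GA4 `event_name` column; the exact table and column names in your deployment may differ:

```
SELECT
  event_name,
  COUNT(*) AS event_count
FROM `project_id.decode_analytics_XXXXXXX.base_events`
WHERE partition_date BETWEEN '2025-02-01' AND '2025-02-02'  -- prunes hive partitions
GROUP BY event_name
ORDER BY event_count DESC;
```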
Execution Procedure
The `run_decode_ga4` stored procedure will be deployed in your destination `decode_ga4_dataset_id` dataset, containing the specific deployment options defined upon initial installation. Upon invocation, it executes an incremental update on the transformed event data. It can be invoked manually, on a schedule or on an event-driven basis (see the scheduling sketch below) by executing:

```
CALL `decode_ga4_dataset_id.run_decode_ga4`();
```

Note that if this is invoked and no new data is detected, only negligible metadata query costs are incurred.
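One way to run the incremental update on a schedule is a BigQuery scheduled query wrapping the CALL statement. A minimal sketch using the bq CLI; the display name, schedule, location and fully-qualified procedure name are illustrative:

```
bq --location=EU mk \
  --transfer_config \
  --data_source=scheduled_query \
  --display_name="Decode GA4 incremental update" \
  --schedule="every 24 hours" \
  --params='{"query":"CALL `project_id.decode_analytics_XXXXXXX.run_decode_ga4`();"}'
```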