Quickstart

Access

Decode GA4 is distributed via BigQuery Analytics Hub and is available in all regions and multi-regions. Access is currently on a private-offer basis only, billed through your existing linked billing account at a flat rate per TiB processed.

Contact ga4@decodedata.io to request private offer terms.

Pre-requisites

In order to deploy Decode GA4, you first need to create a Google Cloud Storage (GCS) bucket, which will be the storage location for the compressed, transformed events data. A new bucket can be created through the Google Cloud console, the command line (for example, gcloud storage buckets create) or the client libraries, and the executing user or service account must have the storage.objects.create permission on the bucket.

Installation

The minimum configuration required is simply the location of your inbound GA4 BigQuery dataset (ga4_dataset_id) and the name of the GCS bucket (gcs_bucket_name) to be used for transformed base events data.

Configuration

The following options are required at a minimum in order to deploy Decode GA4.

NAME             TYPE    DETAILS
ga4_dataset_id   STRING  The ID of the dataset containing your GA4 BigQuery export, in the format project_id.analytics_XXXXXXX.
gcs_bucket_name  STRING  The name of the Google Cloud Storage bucket in which to store compressed, transformed data.

Deployment

Decode GA4 is installed by copying the following script into your BigQuery console, updating the configuration and running the updated script.

Note that all configuration options are set using the options JSON variable.

DECLARE options JSON;
DECLARE deployment_script STRING;

SET options = JSON """
    {
        "ga4_dataset_id": "project_id.dataset_id",
        "gcs_bucket_name": "your_gcs_bucket_name"
    }
    """;

SET deployment_script = (
    SELECT * FROM `decode-ga4.eu.generate_deployment_script`(options));
EXECUTE IMMEDIATE (deployment_script);

Execution

This will create the BigQuery dataset decode_analytics_XXXXXXX, containing the stored procedure deploy_decode_ga4() together with the deployment options. Run this deployment procedure using:

CALL `project_id.decode_analytics_XXXXXXX`.deploy_decode_ga4();

This will result in the following resources being deployed in your destination dataset and bucket.

Outputs

BigQuery Dataset

The decode_analytics_XXXXXXX dataset will be created, containing configuration, functions, logs and the base events external table. Note that the destination dataset can be overridden by setting decode_ga4_dataset_id in the configuration options JSON.
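
For example, to deploy into a custom destination dataset, set the override in the options JSON of the deployment script above. This is a minimal sketch; the decode_ga4_dataset_id value shown is a placeholder:

SET options = JSON """
    {
        "ga4_dataset_id": "project_id.dataset_id",
        "gcs_bucket_name": "your_gcs_bucket_name",
        "decode_ga4_dataset_id": "project_id.custom_dataset_name"
    }
    """;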

Google Cloud Storage Bucket Folders

The /decode-ga4/events folder hierarchy will be created in the specified GCS bucket, with separate date-segregated subfolders containing the compressed, transformed base events data. The file structure is as follows.

decode-ga4
└── events
    ├── partition_date=2025-02-01
    │   ├── events_0000000000.gzip
    │   ├── events_0000000001.gzip
    │   └── events_0000000002.gzip
    ├── partition_date=2025-02-02
    │   ├── events_0000000000.gzip
    │   ├── events_0000000001.gzip
    │   ├── events_0000000002.gzip
    │   └── events_0000000003.gzip
    └── ...

BigQuery Table

The base events BigQuery external table will be created, which is hive-partitioned on the partition_date column to enable efficient access to the compressed data in GCS.
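
Filtering on partition_date restricts a query to the matching date subfolders in GCS. The following is a minimal sketch, assuming the default destination dataset and that the external table is named events (check your deployment for the actual table name):

-- Count events for a single day; the partition_date filter prunes the
-- scan to one GCS subfolder. Dataset and table names are placeholders.
SELECT
    event_name,
    COUNT(*) AS event_count
FROM `project_id.decode_analytics_XXXXXXX.events`
WHERE partition_date = '2025-02-01'
GROUP BY event_name;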

Execution Function

The run_decode_ga4 stored procedure will be deployed in your destination decode_ga4_dataset_id dataset, containing the specific deployment options defined at initial installation. When invoked, it executes an incremental update of the transformed events data. It can be invoked manually, on a schedule or on an event-driven basis by executing:

CALL `decode_ga4_dataset_id`.run_decode_ga4();

Note that if this is invoked and no new data is detected, only negligible metadata query costs are incurred.
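
For scheduled execution, the CALL statement can be used directly as the body of a BigQuery scheduled query. As a sketch, assuming the default destination dataset and an external table named events (both placeholders for your deployment):

-- Run the incremental update, then confirm the most recent partition.
-- Dataset and table names are placeholders.
CALL `project_id.decode_analytics_XXXXXXX`.run_decode_ga4();

SELECT MAX(partition_date) AS latest_partition
FROM `project_id.decode_analytics_XXXXXXX.events`;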