Transform

Overview

The transform configuration is generated automatically during installation, so no additional input is needed to run Decode GA4. It is deployed as the function transform_configs() in the deployment_dataset_id dataset, where it can be inspected and later updated.

The transform_configs JSON array holds the configuration for each transformation, together with its source and destination. You only need to edit it if you want to change how a transformation behaves, for example its run mode or its destination.
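As a rough illustration of what changing a configuration value involves, the sketch below edits the JSON array in Python. It assumes the array has already been obtained as text; how the updated array is written back to transform_configs() is not shown here and depends on the deployment. The helper name set_automation is hypothetical and not part of Decode GA4.

```python
import json


def set_automation(configs_json, transform_id, **changes):
    """Return the transform_configs JSON text with automation values changed.

    Hypothetical helper for illustration only; not part of Decode GA4.
    """
    configs = json.loads(configs_json)
    matched = False
    for cfg in configs:
        if cfg["config"]["transform_id"] == transform_id:
            cfg["automation"].update(changes)
            matched = True
    if not matched:
        raise KeyError(f"no transform with transform_id={transform_id!r}")
    return json.dumps(configs, indent=2)


# Trimmed-down example array, using only keys that appear in the generated configuration.
example = """
[
  {
    "automation": {"run_mode": "incremental", "window_days": 3},
    "config": {"transform_id": "events", "transform_index": 0}
  }
]
"""

updated = json.loads(set_automation(example, "events", window_days=7))
print(updated[0]["automation"])
```

Matching on transform_id rather than on array position keeps the edit stable if further transformations are added to the array later.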

External Table Configuration

This configuration processes all identified dates and exports them to the specified GCS bucket, in sub-folders segregated by project, dataset, transform and date. An external table is deployed so that the partitioned data can be queried in BigQuery.

[
  {
    "automation": {
      "run_mode": "incremental",
      "window_days": 3
    },
    "config": {
      "transform_config_template": "events_external",
      "transform_function": "events",
      "transform_id": "events",
      "transform_index": 0
    },
    "destination": {
      "compression": "GZIP",
      "dataset_id": "project_id.deployment_dataset_id",
      "description": "base events external table, one row is a single event",
      "format": "PARQUET",
      "gcs_bucket_name": "ugg-data/ga4",
      "granularity": [
        "event_id"
      ],
      "hive_partition_column_name": "partition_date",
      "hive_partitioned": true,
      "table_name": "events",
      "table_type": "external"
    },
    "source": {
      "dataset_id": "project_id.ga4_dataset_id",
      "source_data_name": "source_data",
      "table_prefix": "events_",
      "table_type": "date_sharded"
    }
  }
]
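To make the folder layout more concrete, the sketch below builds an object prefix from the values in this configuration. The description above gives only the nesting order (project, dataset, transform, date). The exact folder names, the choice of the destination dataset, the ISO date format, the key=value form of the date folder and the helper name partition_prefix are all assumptions, so compare the result against the actual bucket contents before relying on it.

```python
from datetime import date


def partition_prefix(transform_config, partition_date):
    """Build an ASSUMED GCS prefix for one date partition (illustrative only)."""
    dest = transform_config["destination"]
    project_id, dataset_id = dest["dataset_id"].split(".", 1)
    transform_id = transform_config["config"]["transform_id"]
    # Hive-style key=value folder, so the date can be read back as a partition column.
    date_folder = f'{dest["hive_partition_column_name"]}={partition_date.isoformat()}'
    return "/".join([
        f'gs://{dest["gcs_bucket_name"]}',
        project_id,
        dataset_id,
        transform_id,
        date_folder,
    ])


# Only the keys needed for the path are reproduced from the configuration above.
config = {
    "config": {"transform_id": "events"},
    "destination": {
        "dataset_id": "project_id.deployment_dataset_id",
        "gcs_bucket_name": "ugg-data/ga4",
        "hive_partition_column_name": "partition_date",
    },
}

print(partition_prefix(config, date(2024, 1, 15)))
```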

Partitioned Table Configuration

This configuration exports all data to a date-partitioned table in the specified deployment dataset.

[
  {
    "automation": {
      "run_mode": "incremental",
      "window_days": 3
    },
    "config": {
      "transform_index": 0,
      "transform_config_template": "events_partitioned",
      "transform_id": "events",
      "transform_function": "events"
    },
    "destination": {
      "dataset_id": "project_id.deployment_dataset_id",
      "table_type": "date_partitioned",
      "table_name": "events",
      "description": "base events date-partitioned table, one row is a single event",
      "granularity": ["event_id"],
      "partition_expression": "event_date",
      "clustering_column_list": ["event_name"]
    },
    "source": {
      "dataset_id": "project_id.ga4_dataset_id",
      "table_type": "date_sharded",
      "table_prefix": "events_",
      "source_id": "source_data"
    }
  }
]

Run Modes

Setting automation.run_mode in transform_configs before executing the RUN function controls which date partitions are refreshed:

RUN MODE      REFRESH DESCRIPTION
full          All date partitions are refreshed.
incremental   Only new date partitions, plus any modified date partitions within the past n days determined by automation.window_days.
test          Only the first n date partitions, determined by the automation.window_days in each transform configuration.
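The three run modes can be read as a selection rule over date partitions. The sketch below is one possible reading in Python, not the actual Decode GA4 implementation. It assumes that "new" means a source date with no processed partition yet, that modified dates are supplied by the caller, that "the past n days" is measured back from a given today value with an inclusive cutoff, and that "first" means earliest. The function name select_partitions is hypothetical.

```python
from datetime import date, timedelta


def select_partitions(run_mode, source_dates, processed_dates,
                      modified_dates, window_days, today):
    """Return the date partitions a run would refresh, under the stated assumptions."""
    ordered = sorted(source_dates)
    if run_mode == "full":
        # Every available date partition is refreshed.
        return ordered
    if run_mode == "test":
        # Only the earliest window_days partitions.
        return ordered[:window_days]
    if run_mode == "incremental":
        cutoff = today - timedelta(days=window_days)
        new_dates = set(ordered) - set(processed_dates)
        recently_modified = {
            d for d in modified_dates if d >= cutoff and d in set(ordered)
        }
        return sorted(new_dates | recently_modified)
    raise ValueError(f"unknown run_mode: {run_mode!r}")


source = [date(2024, 1, day) for day in range(1, 11)]   # 1 to 10 January
processed = source[:8]                                   # 1 to 8 January already exported
modified = {date(2024, 1, 2), date(2024, 1, 8)}          # changed since last export
today = date(2024, 1, 10)

for mode in ("full", "incremental", "test"):
    picked = select_partitions(mode, source, processed, modified, 3, today)
    print(mode, [d.isoformat() for d in picked])
```

With window_days set to 3, the incremental run in this example picks up 9 and 10 January as new dates and 8 January as a recent modification, while the change to 2 January falls outside the window and would need a full run.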

Extension

Additional transformations can be included by adding their configuration to the transform_configs JSON array, with the transformation logic defined in a date-bounded table-valued function that takes start_date and end_date as required arguments. Subsequent transformations can then be executed and orchestrated without any additional third-party tool.
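A minimal sketch of the configuration side of this, in Python, is shown below. The "sessions" transformation is purely illustrative, a table-valued function sessions(start_date, end_date) is assumed to exist already, and the idea that transform_index determines ordering is an assumption based on its name. The destination and source blocks a real entry would need are omitted for brevity, and the helper name add_transform is hypothetical.

```python
import json


def add_transform(configs, new_config):
    """Return a new array with new_config appended at the next transform_index."""
    next_index = max(
        (c["config"]["transform_index"] for c in configs), default=-1
    ) + 1
    # Deep copy through JSON so the caller's dictionary is left untouched.
    entry = json.loads(json.dumps(new_config))
    entry["config"]["transform_index"] = next_index
    return configs + [entry]


existing = [
    {
        "config": {
            "transform_id": "events",
            "transform_function": "events",
            "transform_index": 0,
        }
    }
]

# Hypothetical downstream transformation; all values here are illustrative.
sessions = {
    "automation": {"run_mode": "incremental", "window_days": 3},
    "config": {"transform_id": "sessions", "transform_function": "sessions"},
}

configs = add_transform(existing, sessions)
print(json.dumps(configs[-1]["config"], indent=2))
```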