Transform
Overview
The transform configuration is generated automatically during installation, so Decode GA4 runs without any additional input. The configuration is deployed as the function transform_configs() in the deployment_dataset_id dataset, where it can be referenced, updated and used.
The transform_configs JSON array contains the transformation-level configurations, together with information about the source and destination. You only need to update it if you want to change one of the transformation configuration values.
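As a rough illustration of changing a transformation configuration value, the sketch below edits a local copy of the transform_configs JSON array in Python. The set_config_value helper is hypothetical and not part of Decode GA4, and the step of writing the edited array back to the transform_configs() function in the deployment dataset is deliberately not shown, since it depends on the installation.

```python
import copy
import json

# A trimmed copy of the transform_configs array; only the keys used below are kept.
TRANSFORM_CONFIGS_JSON = """
[
  {
    "automation": {"run_mode": "incremental", "window_days": 3},
    "config": {"transform_id": "events", "transform_index": 0}
  }
]
"""


def set_config_value(configs, transform_id, section, key, value):
    """Return a copy of configs with one value changed for one transform.

    Hypothetical helper for illustration. Raises KeyError if the transform,
    section or key does not already exist, so a typo cannot silently add a key.
    """
    updated = copy.deepcopy(configs)
    for transform in updated:
        if transform["config"]["transform_id"] == transform_id:
            if key not in transform[section]:
                raise KeyError(f"{section}.{key} not found in {transform_id!r}")
            transform[section][key] = value
            return updated
    raise KeyError(f"no transform with transform_id {transform_id!r}")


configs = json.loads(TRANSFORM_CONFIGS_JSON)
widened = set_config_value(configs, "events", "automation", "window_days", 7)
print(json.dumps(widened, indent=2))
```

The helper returns a modified copy rather than editing in place, so the original array remains available for comparison before anything is redeployed.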
External Table Configuration
This configuration processes all identified dates and exports them to the specified GCS bucket, in sub-folders segregated by project, dataset, transform and date. An external table is deployed to give access to the partitioned data in BigQuery.
[
  {
    "automation": {
      "run_mode": "incremental",
      "window_days": 3
    },
    "config": {
      "transform_config_template": "events_external",
      "transform_function": "events",
      "transform_id": "events",
      "transform_index": 0
    },
    "destination": {
      "compression": "GZIP",
      "dataset_id": "project_id.deployment_dataset_id",
      "description": "base events external table, one row is a single event",
      "format": "PARQUET",
      "gcs_bucket_name": "ugg-data/ga4",
      "granularity": [
        "event_id"
      ],
      "hive_partition_column_name": "partition_date",
      "hive_partitioned": true,
      "table_name": "events",
      "table_type": "external"
    },
    "source": {
      "dataset_id": "project_id.ga4_dataset_id",
      "source_data_name": "source_data",
      "table_prefix": "events_",
      "table_type": "date_sharded"
    }
  }
]
Partitioned Table Configuration
This will export all data to a date-partitioned table in the specified deployment dataset.
[
  {
    "automation": {
      "run_mode": "incremental",
      "window_days": 3
    },
    "config": {
      "transform_index": 0,
      "transform_config_template": "events_partitioned",
      "transform_id": "events",
      "transform_function": "events"
    },
    "destination": {
      "dataset_id": "project_id.deployment_dataset_id",
      "table_type": "date_partitioned",
      "table_name": "events",
      "description": "base events date-partitioned table, one row is a single event",
      "granularity": ["event_id"],
      "partition_expression": "event_date",
      "clustering_column_list": ["event_name"]
    },
    "source": {
      "dataset_id": "project_id.ga4_dataset_id",
      "table_type": "date_sharded",
      "table_prefix": "events_",
      "source_id": "source_data"
    }
  }
]
Run Modes
Updating the run_mode in transform_configs before executing the RUN function gives fine-grained control over transformation behaviour:
RUN MODE | REFRESH DESCRIPTION |
---|---|
full | All date partitions are refreshed. |
incremental | Only new date partitions, plus any date partitions modified within the past n days, where n is set by automation.window_days. |
test | Only the first n date partitions, where n is set by automation.window_days in each transform configuration. |
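The three run modes in the table above can be sketched as a date-selection function. This is only an illustration of the described behaviour, not the Decode GA4 implementation: the select_partitions function, its arguments, and the way "modified" dates are supplied are all assumptions, as is treating the window as inclusive of the date exactly window_days before today.

```python
from datetime import date, timedelta


def select_partitions(run_mode, source_dates, processed_dates, modified_dates,
                      window_days, today):
    """Return the sorted date partitions to refresh for a given run mode.

    source_dates:    all dates found in the source
    processed_dates: dates already present in the destination
    modified_dates:  source dates changed since they were last processed
    (All names are hypothetical; they are not Decode GA4 parameters.)
    """
    all_dates = sorted(source_dates)
    if run_mode == "full":
        # Every date partition is refreshed.
        return all_dates
    if run_mode == "test":
        # Only the first n date partitions.
        return all_dates[:window_days]
    if run_mode == "incremental":
        # New dates, plus modified dates that fall inside the window.
        cutoff = today - timedelta(days=window_days)
        source_set = set(source_dates)
        new = source_set - set(processed_dates)
        recent_modified = {d for d in modified_dates
                           if d in source_set and d >= cutoff}
        return sorted(new | recent_modified)
    raise ValueError(f"unknown run_mode: {run_mode!r}")


today = date(2024, 1, 10)
source = [date(2024, 1, d) for d in range(1, 11)]   # 1 to 10 January
processed = source[:8]                              # 1 to 8 January
modified = [date(2024, 1, 2), date(2024, 1, 8)]

incremental = select_partitions("incremental", source, processed, modified, 3, today)
print([d.isoformat() for d in incremental])
# prints ['2024-01-08', '2024-01-09', '2024-01-10']
```

In this example the modification on 2 January is ignored because it falls outside the three-day window, which is the trade-off that incremental mode makes against a full refresh.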
Extension
Additional transformations can be included by adding their configuration to the transform_configs JSON array, with the transformation logic defined in a date-bounded table-valued function (with start_date and end_date as the required arguments). This enables subsequent transformations to be executed and orchestrated without any additional third-party tool.
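A minimal sketch of extending the array, assuming a local copy of transform_configs is edited in Python. The add_transform helper and the "sessions" transform are hypothetical, the entry is trimmed to a few keys taken from the examples above, and assigning transform_index as the next integer is an assumption about how ordering works rather than documented behaviour.

```python
import copy


def add_transform(configs, new_transform):
    """Return a copy of configs with new_transform appended.

    Hypothetical helper: rejects a duplicate transform_id and assigns the
    next transform_index (assumed here to be a simple running integer).
    """
    new_id = new_transform["config"]["transform_id"]
    if new_id in {t["config"]["transform_id"] for t in configs}:
        raise ValueError(f"transform_id {new_id!r} already exists")
    next_index = max((t["config"]["transform_index"] for t in configs),
                     default=-1) + 1
    entry = copy.deepcopy(new_transform)
    entry["config"]["transform_index"] = next_index
    return copy.deepcopy(configs) + [entry]


# Trimmed version of the existing configuration.
configs = [
    {
        "automation": {"run_mode": "incremental", "window_days": 3},
        "config": {"transform_id": "events", "transform_function": "events",
                   "transform_index": 0},
    }
]

# Hypothetical downstream transform. "sessions" would need to exist as a
# table-valued function taking start_date and end_date arguments.
sessions = {
    "automation": {"run_mode": "incremental", "window_days": 3},
    "config": {"transform_id": "sessions", "transform_function": "sessions"},
    "destination": {
        "dataset_id": "project_id.deployment_dataset_id",
        "table_type": "date_partitioned",
        "table_name": "sessions",
    },
}

extended = add_transform(configs, sessions)
print([t["config"]["transform_id"] for t in extended])
```

Rejecting duplicate transform_id values up front is a design choice of this sketch; it avoids two entries competing for the same identifier in the array.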