Events
Overview
The events
table is a flattened transformation of the raw inbound events_YYYYMMDD
table, with the following structural and semantic changes:
- Simplified flat output structure for optimal query syntax
- Simple event count metrics for comparison and calculations
- Standard and custom event parameters in property-specific flat schema
- Custom user properties in property-specific flat schema
- Geo-based local timestamps for time-of-day analyses
- Observation-based data type coalescence to prevent type-based historic data loss
- Session and event ids to support robust data quality assurance processes
One row represents one single discrete event.
Schema
New Columns (Metadata)
The following metadata columns are added to the output events
table.
COLUMN NAME | DATA TYPE | DESCRIPTION |
---|---|---|
project_id | STRING | project_id containing the ga4_dataset_id |
dataset_name | STRING | dataset_name of the GA4 dataset |
analytics_property_id | STRING | analytics_property_id of the GA4 dataset |
event_id | STRING | Unique ID per event |
session_id | STRING | Unique ID per session |
Existing Columns
The following columns are passed through directly from the raw events_YYYYMMDD
, with no changes to structure or data type:
COLUMN NAME | DATA TYPE |
---|---|
event_name | STRING |
event_value_in_usd | FLOAT |
event_bundle_sequence_id | INTEGER |
event_server_timestamp_offset | INTEGER |
user_id | STRING |
user_pseudo_id | STRING |
privacy_info | STRUCT |
user_ltv | STRUCT |
device | STRUCT |
geo | STRUCT |
app_info | STRUCT |
traffic_source | STRUCT |
stream_id | STRING |
platform | STRING |
event_dimensions | STRUCT |
collected_traffic_source | STRUCT |
is_active_user | BOOLEAN |
privacy_info | STRING |
user_ltv | FLOAT |
device | STRING |
geo | STRING |
app_info | STRING |
traffic_source | STRING |
event_dimensions | STRING |
collected_traffic_source | STRING |
Data Type Changes
The following columns are included in the events
output table, with appropriate data type changes.
COLUMN NAME | FROM DATA TYPE | TO DATA TYPE | DESCRIPTION |
---|---|---|---|
event_date | STRING | DATE | Inbound shard date |
event_timestamp | INTEGER | TIMESTAMP | UTC timestamp of event occurrence |
event_previous_timestamp | INTEGER | TIMESTAMP | UTC timestamp of previous event occurrence |
user_first_touch_timestamp | INTEGER | TIMESTAMP | UTC timestamp of user first touch |
New Columns (Augmentation)
The following augmentation columns are added to the output events
table:
COLUMN NAME | DATA TYPE | DESCRIPTION |
---|---|---|
local | STRUCT | NEW local timezone and geolocation STRUCT |
local.timezone_id | STRING | NEW timezone id |
local.timezone_name | STRING | NEW timezone name |
local.country_code | STRING | NEW ISO country code |
local.timezone_source | STRING | NEW source of timezone allocation (city/region/country) |
local.latitude | STRING | NEW latitude |
local.longitude | STRING | NEW longitude |
local.date | DATE | NEW geo-adjusted event_date |
local.timestamp | DATETIME | NEW geo-adjusted event_timestamp |
local.time | TIME | NEW geo-adjusted time |
local.hour | INTEGER | NEW geo-adjusted hour |
local.hour_decimal | FLOAT | NEW geo-adjusted decimal hour |
local.previous_timestamp | DATETIME | NEW geo-adjusted previous_timestamp |
local.first_touch_timestamp | DATETIME | NEW geo-adjusted first_touch_timestamp |
New Columns (Restructuring)
The following flat STRUCT
columns are added, replacing the nested source columns which are excluded from the output events
table. The event_name
column is still included in the output events
table, but is supplemented with the count
metrics STRUCT
.
SOURCE COLUMN NAME | SOURCE DATA TYPE | TRANSFORMATION |
---|---|---|
event_name | STRING | Transformed to count metric STRUCT |
event_params | ARRAY | Transformed to parameter STRUCT |
user_properties | ARRAY | Transformed to property STRUCT |