Unified Data Platform Guide

Marketing Data Warehouse & Multi-Touch Attribution

Most businesses run marketing with fragmented data — GA4 in one place, CRM in another, ad platforms each telling a different story. The result is blind budget allocation, false attribution, and zero cross-sell visibility. A unified data platform fixes this by connecting every source into one warehouse, enabling true multi-touch attribution and ML-powered audience activation.

~20%
Spend Reallocation
Ad budget shifted from duplicate-credited to undervalued channels
6+
Sources Unified
GA4, CRM, Google Ads, Meta, DV360, more
100%
Data Ownership
Your BigQuery, your infrastructure, no lock-in
ML
Ready
BigQuery ML for propensity & audience activation

The Problem: Fragmented Marketing Data

Why siloed data costs you revenue

Every ad platform tells its own story. Google Ads claims credit for conversions that Meta also claims. Your CRM shows different revenue than GA4. Nobody can answer the simple question: "Which channel actually drove this sale?"

Duplicate attribution

Google, Meta, and TikTok all claim the same conversion. Total reported ROAS is 3x actual.

No cross-sell visibility

You can't see that a customer who bought Product A also bought Product B two weeks later via a different channel.

Last-click blindness

Upper-funnel campaigns (display, video, social) look unprofitable because last-click gives all credit to branded search.

Manual reporting chaos

Teams spend days pulling data from 5 platforms into spreadsheets. Numbers never match. Decisions are delayed.

No customer journey

You see sessions, not people. You can't connect anonymous website visits to known CRM customers.

Zero ML capability

Without unified data, you can't build propensity models, LTV predictions, or activate audiences in ad platforms.

Medallion Architecture

Bronze → Silver → Gold data layers

We use a Medallion architecture in BigQuery — the same pattern used by Databricks and Netflix — to organize data from raw ingestion to business-ready analytics.

Bronze Layer — Raw Landing Zone

Data lands exactly as received from each source. No transformations. Full audit trail.

GA4 BigQuery ExportCRM SFTP DumpsGoogle Ads APIMeta Marketing APIDV360 Reports

Silver Layer — Cleaned & Standardized

Deduplication, data type casting, schema alignment, null handling. Consistent naming conventions across all sources.

DeduplicationType CastingSchema AlignmentIncremental MERGEPartitioned Tables

Gold Layer — Business Logic Marts

Attribution models, customer LTV, cross-sell metrics, campaign performance. Ready for dashboards and ML.

Attribution ModelCustomer LTVCross-Sell MartCampaign PerformanceAudience Segments

Data Sources & Pipelines

Connecting every data source to BigQuery

GA4 → BigQuery

Native BigQuery Export (free)Daily (streaming on 360)

Automatic export of all GA4 events including user properties, event parameters, and e-commerce data. No pipeline code needed — just enable in GA4 settings.

CRM / CDP → BigQuery

API / SFTP Connector on Cloud RunDaily incremental

Custom Python pipeline pulling customer records, transactions, and segments. Supports Salesforce, HubSpot, Ascent360, Klaviyo, and custom CRMs. Incremental loads with deduplication.

Google Ads → BigQuery

Google Ads API via Cloud RunDaily

Campaign, ad group, keyword, and conversion data. Cost, clicks, impressions, and conversion values at the granularity you need for attribution.

Meta Ads → BigQuery

Meta Marketing API via Cloud RunDaily

Campaign, ad set, ad-level performance data. Spend, reach, frequency, and conversion data including attribution windows.

DV360 / SA360 → BigQuery

Native BigQuery Transfer or APIDaily

Display & Video 360 and Search Ads 360 data for enterprise advertisers. Native BigQuery Data Transfer Service available.

Shopify / E-Commerce → BigQuery

Shopify API via Cloud RunDaily incremental

Orders, customers, products, and inventory data. Multi-store support with tenant isolation and multi-currency normalization.

Orchestration: Cloud Run + Cloud Scheduler

All pipelines run as containerized Python services on Cloud Run, triggered by Cloud Scheduler on configurable cron schedules. Cloud Logging provides monitoring and alerting. Retry logic handles transient API failures. Total infrastructure cost: typically $20-50/month.

Privacy-Safe Identity Resolution

Connecting website behaviour to CRM customers without exposing PII

The critical challenge: how do you join anonymous GA4 sessions with known CRM customers? Our approach uses SHA-256 hashed User IDs — the same cryptographic standard used by Google and Meta for their conversion APIs.

-- Identity resolution in BigQuery (Gold layer)

SELECT

ga4.hashed_user_id,

ga4.event_name,

ga4.event_date,

ga4.source_medium,

crm.customer_segment,

crm.lifetime_value,

crm.last_purchase_date

FROM silver.ga4_events ga4

LEFT JOIN silver.crm_customers crm

ON ga4.hashed_user_id = crm.hashed_user_id

WHERE ga4.hashed_user_id IS NOT NULL

Privacy-Safe

SHA-256 hashing with client-controlled salt. No raw PII in the warehouse.

GDPR / CCPA Compliant

Hashed IDs are pseudonymized data — satisfies privacy regulations when combined with proper consent.

Deterministic Matching

No probabilistic guessing. Same user = same hash. 100% match accuracy on logged-in users.

Multi-Touch Attribution

Moving beyond last-click to true customer journey attribution

With unified data, you can finally see the complete customer journey — from first ad impression through to purchase and repeat buy. This enables attribution models that give proper credit to every touchpoint.

Last-Click (default)

Low accuracy

All credit to the final touchpoint. Overvalues branded search, undervalues awareness.

First-Click

Low accuracy

All credit to the first touchpoint. Overvalues discovery channels, ignores nurture.

Linear

Medium accuracy

Equal credit to every touchpoint. Simple but doesn't reflect reality.

Position-Based (U-shaped)

High accuracy

40% to first touch, 40% to last touch, 20% distributed to middle. Good default.

Data-Driven (BigQuery ML)

Highest accuracy

ML model learns which touchpoints actually influence conversion for your business. Best accuracy.

Dashboards & Reporting

Looker Studio dashboards powered by Gold layer data

With clean, unified data in the Gold layer, dashboards become simple. No more pulling from 5 platforms. One source of truth, refreshed daily.

Campaign Performance

True ROAS by channel, cost per acquisition, attributed revenue, budget pacing

For: Marketing managers, media buyers

Customer Journey

Touchpoint sequences, time-to-conversion, channel interaction paths, assisted conversions

For: CMO, strategy team

Cross-Sell / Share of Wallet

Product affinity, bundle attach rates, upsell conversion rates, category penetration

For: Product marketing, merchandising

Geo & Market Analysis

Performance by region, market-level ROAS, geo-targeted campaign efficiency

For: Regional managers, expansion planning

ML & Audience Activation

BigQuery ML for propensity modeling and ad platform activation

The ultimate payoff of a unified data platform: using BigQuery ML to predict customer behaviour and push those audiences directly into ad platforms for smarter bidding.

Propensity Modeling

BigQuery ML logistic regression or XGBoost model trained on historical CRM + GA4 data. Scores every user as high / medium / low propensity to convert.

-- BigQuery ML propensity model

CREATE MODEL gold.propensity_model

OPTIONS(model_type='LOGISTIC_REG') AS

SELECT sessions, pages_viewed, days_since_last_visit,

channel_grouping, device_category, converted

FROM gold.user_features

Audience Activation Loop

High-propensity segments are exported from BigQuery → uploaded to Google Ads as Customer Match audiences → used for Smart Bidding bid adjustments. Close the loop between data and ad spend.

ROI Calculator

Estimate the value of unifying your marketing data

This calculator estimates savings from eliminating duplicate attribution (reallocating misspent budget) and reducing manual reporting time. It does not assume more conversions — GA4 only tracks click-through. The value comes from spending smarter, not spending more.

$5,000
Monthly savings from smarter allocation
$60,000
Annual ad spend reallocated
$28,800
Annual reporting hours saved
$88,800
Total annual benefit

Real Implementation

Ski Resort Group — Unified Data Platform

6+
Data sources unified
Deduped
Conversion path journeys
H/M/L
User value segmentation
Churn
At-risk user identification

A multi-resort ski group running Google Ads, Meta, and DV360 campaigns had no way to connect ad spend to actual bookings. Every platform claimed credit for the same conversions. Zero cross-sell visibility across lift tickets, rentals, ski school, and lodging.

We built a full unified data platform in BigQuery integrating GA4, CRM (Ascent360), and all three ad platforms. The key deliverables: deduplicated conversion path journeys by source/medium showing which campaigns drive awareness vs. which actually convert. User value segmentation (high/medium/low) by joining CRM lifetime value with GA4 behaviour. Churn-risk user lists combining CRM purchase gaps with GA4 drop-off signals. Looker Studio dashboards for campaign performance, geo reporting, and cross-sell analysis. Server-side Enhanced Conversions + Meta CAPI for improved signal quality.

BigQueryCloud RunGoogle Ads APIMeta Ads APIDV360Ascent360 CRMLooker StudioBigQuery MLServer-Side GTMMeta CAPI

Frequently Asked Questions

Ready to Unify Your Marketing Data?

Book a free data engineering consultation. We'll assess your current data stack, identify integration opportunities, and outline a custom architecture for your business.