Unified Data Platform Guide
Marketing Data Warehouse & Multi-Touch Attribution
Most businesses run marketing with fragmented data — GA4 in one place, CRM in another, ad platforms each telling a different story. The result is blind budget allocation, false attribution, and zero cross-sell visibility. A unified data platform fixes this by connecting every source into one warehouse, enabling true multi-touch attribution and ML-powered audience activation.
The Problem: Fragmented Marketing Data
Why siloed data costs you revenue
Every ad platform tells its own story. Google Ads claims credit for conversions that Meta also claims. Your CRM shows different revenue than GA4. Nobody can answer the simple question: "Which channel actually drove this sale?"
Duplicate attribution
Google, Meta, and TikTok all claim the same conversion. Total reported ROAS is 3x actual.
No cross-sell visibility
You can't see that a customer who bought Product A also bought Product B two weeks later via a different channel.
Last-click blindness
Upper-funnel campaigns (display, video, social) look unprofitable because last-click gives all credit to branded search.
Manual reporting chaos
Teams spend days pulling data from 5 platforms into spreadsheets. Numbers never match. Decisions are delayed.
No customer journey
You see sessions, not people. You can't connect anonymous website visits to known CRM customers.
Zero ML capability
Without unified data, you can't build propensity models, LTV predictions, or activate audiences in ad platforms.
Medallion Architecture
Bronze → Silver → Gold data layers
We use a Medallion architecture in BigQuery — the same pattern used by Databricks and Netflix — to organize data from raw ingestion to business-ready analytics.
Bronze Layer — Raw Landing Zone
Data lands exactly as received from each source. No transformations. Full audit trail.
Silver Layer — Cleaned & Standardized
Deduplication, data type casting, schema alignment, null handling. Consistent naming conventions across all sources.
Gold Layer — Business Logic Marts
Attribution models, customer LTV, cross-sell metrics, campaign performance. Ready for dashboards and ML.
Data Sources & Pipelines
Connecting every data source to BigQuery
GA4 → BigQuery
Automatic export of all GA4 events including user properties, event parameters, and e-commerce data. No pipeline code needed — just enable in GA4 settings.
CRM / CDP → BigQuery
Custom Python pipeline pulling customer records, transactions, and segments. Supports Salesforce, HubSpot, Ascent360, Klaviyo, and custom CRMs. Incremental loads with deduplication.
Google Ads → BigQuery
Campaign, ad group, keyword, and conversion data. Cost, clicks, impressions, and conversion values at the granularity you need for attribution.
Meta Ads → BigQuery
Campaign, ad set, ad-level performance data. Spend, reach, frequency, and conversion data including attribution windows.
DV360 / SA360 → BigQuery
Display & Video 360 and Search Ads 360 data for enterprise advertisers. Native BigQuery Data Transfer Service available.
Shopify / E-Commerce → BigQuery
Orders, customers, products, and inventory data. Multi-store support with tenant isolation and multi-currency normalization.
Orchestration: Cloud Run + Cloud Scheduler
All pipelines run as containerized Python services on Cloud Run, triggered by Cloud Scheduler on configurable cron schedules. Cloud Logging provides monitoring and alerting. Retry logic handles transient API failures. Total infrastructure cost: typically $20-50/month.
Privacy-Safe Identity Resolution
Connecting website behaviour to CRM customers without exposing PII
The critical challenge: how do you join anonymous GA4 sessions with known CRM customers? Our approach uses SHA-256 hashed User IDs — the same cryptographic standard used by Google and Meta for their conversion APIs.
-- Identity resolution in BigQuery (Gold layer)
SELECT
ga4.hashed_user_id,
ga4.event_name,
ga4.event_date,
ga4.source_medium,
crm.customer_segment,
crm.lifetime_value,
crm.last_purchase_date
FROM silver.ga4_events ga4
LEFT JOIN silver.crm_customers crm
ON ga4.hashed_user_id = crm.hashed_user_id
WHERE ga4.hashed_user_id IS NOT NULL
Privacy-Safe
SHA-256 hashing with client-controlled salt. No raw PII in the warehouse.
GDPR / CCPA Compliant
Hashed IDs are pseudonymized data — satisfies privacy regulations when combined with proper consent.
Deterministic Matching
No probabilistic guessing. Same user = same hash. 100% match accuracy on logged-in users.
Multi-Touch Attribution
Moving beyond last-click to true customer journey attribution
With unified data, you can finally see the complete customer journey — from first ad impression through to purchase and repeat buy. This enables attribution models that give proper credit to every touchpoint.
Last-Click (default)
Low accuracyAll credit to the final touchpoint. Overvalues branded search, undervalues awareness.
First-Click
Low accuracyAll credit to the first touchpoint. Overvalues discovery channels, ignores nurture.
Linear
Medium accuracyEqual credit to every touchpoint. Simple but doesn't reflect reality.
Position-Based (U-shaped)
High accuracy40% to first touch, 40% to last touch, 20% distributed to middle. Good default.
Data-Driven (BigQuery ML)
Highest accuracyML model learns which touchpoints actually influence conversion for your business. Best accuracy.
Dashboards & Reporting
Looker Studio dashboards powered by Gold layer data
With clean, unified data in the Gold layer, dashboards become simple. No more pulling from 5 platforms. One source of truth, refreshed daily.
Campaign Performance
True ROAS by channel, cost per acquisition, attributed revenue, budget pacing
For: Marketing managers, media buyers
Customer Journey
Touchpoint sequences, time-to-conversion, channel interaction paths, assisted conversions
For: CMO, strategy team
Cross-Sell / Share of Wallet
Product affinity, bundle attach rates, upsell conversion rates, category penetration
For: Product marketing, merchandising
Geo & Market Analysis
Performance by region, market-level ROAS, geo-targeted campaign efficiency
For: Regional managers, expansion planning
ML & Audience Activation
BigQuery ML for propensity modeling and ad platform activation
The ultimate payoff of a unified data platform: using BigQuery ML to predict customer behaviour and push those audiences directly into ad platforms for smarter bidding.
Propensity Modeling
BigQuery ML logistic regression or XGBoost model trained on historical CRM + GA4 data. Scores every user as high / medium / low propensity to convert.
-- BigQuery ML propensity model
CREATE MODEL gold.propensity_model
OPTIONS(model_type='LOGISTIC_REG') AS
SELECT sessions, pages_viewed, days_since_last_visit,
channel_grouping, device_category, converted
FROM gold.user_features
Audience Activation Loop
High-propensity segments are exported from BigQuery → uploaded to Google Ads as Customer Match audiences → used for Smart Bidding bid adjustments. Close the loop between data and ad spend.
ROI Calculator
Estimate the value of unifying your marketing data
This calculator estimates savings from eliminating duplicate attribution (reallocating misspent budget) and reducing manual reporting time. It does not assume more conversions — GA4 only tracks click-through. The value comes from spending smarter, not spending more.
Real Implementation
Ski Resort Group — Unified Data Platform
A multi-resort ski group running Google Ads, Meta, and DV360 campaigns had no way to connect ad spend to actual bookings. Every platform claimed credit for the same conversions. Zero cross-sell visibility across lift tickets, rentals, ski school, and lodging.
We built a full unified data platform in BigQuery integrating GA4, CRM (Ascent360), and all three ad platforms. The key deliverables: deduplicated conversion path journeys by source/medium showing which campaigns drive awareness vs. which actually convert. User value segmentation (high/medium/low) by joining CRM lifetime value with GA4 behaviour. Churn-risk user lists combining CRM purchase gaps with GA4 drop-off signals. Looker Studio dashboards for campaign performance, geo reporting, and cross-sell analysis. Server-side Enhanced Conversions + Meta CAPI for improved signal quality.
Frequently Asked Questions
Ready to Unify Your Marketing Data?
Book a free data engineering consultation. We'll assess your current data stack, identify integration opportunities, and outline a custom architecture for your business.