Unlocking True ROAS: Combining Server-Side GA4, Google Ads & Facebook Ads in BigQuery for Custom Attribution
You've invested heavily in building a robust server-side Google Analytics 4 (GA4) pipeline, leveraging Google Tag Manager (GTM) Server Container on Cloud Run for advanced data collection, transformations, and granular consent management. This architecture provides unparalleled control and data quality, forming the backbone of your modern analytics strategy. You're effectively capturing online interactions, enriching events, and perhaps even unifying your raw data with GA4 exports in BigQuery.
However, a persistent and critical challenge for data-driven marketing teams is achieving a truly unified view of marketing performance and accurately calculating Return On Ad Spend (ROAS) across disparate advertising platforms. Relying solely on the attribution models and reporting interfaces provided by Google Ads, Facebook Ads, or even GA4 often leads to:
- Fragmented Data: Cost data lives in ad platforms, conversion data in analytics, and blending them manually is time-consuming and error-prone.
- Inconsistent Attribution: Each platform has its own default attribution model and conversion window (e.g., data-driven or last-click in Google Ads, 7-day click/1-day view for Facebook), leading to discrepancies and making it impossible to compare apples to apples.
- Limited Customization: You can't apply custom, business-specific attribution models (e.g., a blended model that prioritizes certain touchpoints) directly within platform UIs.
- Elusive True ROAS: Without a consolidated view of costs and revenue attributed under a consistent model, calculating accurate, blended ROAS across all channels is a guessing game.
The problem, then, is the need for a centralized, extensible platform to combine server-side GA4 conversion data with advertising cost data, enabling custom attribution modeling and precise ROAS calculations that reflect your business's unique understanding of customer value.
The Challenge: Beyond Platform Silos
Imagine a customer journey where a user sees a Facebook ad, clicks a Google Search ad, browses your site, and then converts.
- Facebook attributes the conversion to its ad.
- Google Ads attributes it to its ad.
- GA4 (depending on its default model) attributes it to the last non-direct channel.
Each platform claims credit, and none provide the holistic picture needed for optimal budget allocation. You need to pull all the pieces together into your own data warehouse, BigQuery, where you have the flexibility to define your own rules.
Our Solution: A BigQuery-Centric Marketing Performance Hub
Our solution leverages Google Cloud's powerful data capabilities to build a unified marketing performance data model directly in BigQuery. This hub will ingest GA4 conversion data (from your existing BigQuery export), pull in cost and impression data from Google Ads and Facebook Ads, and then enable you to apply custom attribution logic and calculate blended ROAS.
This approach ensures:
- Unified Data Source: A single, comprehensive dataset for all your marketing performance analysis.
- Custom Attribution: Apply any attribution model you define in SQL, tailored to your business needs.
- Accurate ROAS: Calculate true Return On Ad Spend by blending costs and revenues consistently.
- Data Ownership: You own and control your core marketing performance data.
- Flexibility: Easily integrate new ad platforms or refine attribution models as your business evolves.
- Actionable Insights: Drive smarter budget allocation and campaign optimization decisions.
Architecture: Blending GA4 Conversions with Ad Costs in BigQuery
The core idea is to bring all relevant marketing and analytics data into BigQuery and use powerful SQL to stitch them together and apply custom logic.
graph TD
subgraph Data Sources
A[Client-Side Events] -- HTTP Request (to GTM SC) --> B(GTM Server Container on Cloud Run);
B -- Dispatch to GA4 MP (with gclid, fbclid) --> C[Google Analytics 4];
C -- Daily/Near Real-time Export --> D[BigQuery GA4 Export (events_*)];
E[Google Ads Account] -- Daily Sync --> F(BigQuery Data Transfer Service: Google Ads);
F --> G[BigQuery: Google Ads Cost Data];
H[Facebook Ads Account] -- API Call (from Cloud Run) --> I(Facebook Marketing API);
S(Cloud Scheduler) -- Daily Trigger --> J(Cloud Run: Facebook Ads ETL);
I --> J;
J --> K[BigQuery: Facebook Ads Cost Data];
end
subgraph BigQuery Marketing Performance Hub
D -- SQL Join (on gclid, fbclid, transaction_id) --> L(BigQuery: Unified Conversions & Costs);
G -- SQL Join --> L;
K -- SQL Join --> L;
end
subgraph Analysis & Activation
L -- Custom Attribution Logic (SQL) --> M[BigQuery: Attributed Conversions];
M -- Calculate ROAS (SQL) --> N[BigQuery: ROAS Metrics];
N --> O[Looker Studio (Dashboards)];
N --> P[Custom BI Tools/Data Science];
end
Key Flow:
- GA4 Data Ingestion: Your existing server-side GA4 pipeline sends conversion events to GA4. Crucially, your GTM Server Container ensures gclid (for Google Ads clicks) and fbclid (for Facebook Ads clicks) are captured and sent as event_parameters to GA4. GA4 then exports this data to BigQuery.
- Google Ads Cost Data: BigQuery Data Transfer Service (BQ DTS) automatically imports daily cost, impression, and click data from your Google Ads accounts into BigQuery.
- Facebook Ads Cost Data: A scheduled Cloud Run service periodically calls the Facebook Marketing API to fetch campaign-level cost and impression data, then loads it into BigQuery.
- Unified Data Model: SQL queries in BigQuery join these datasets on common identifiers (gclid, fbclid, transaction_id, event_timestamp) to create a comprehensive table of conversions and their associated costs.
- Custom Attribution: Apply SQL-based attribution models (e.g., first-click, last-click, linear, custom weighted) to distribute credit for conversions across various marketing touchpoints.
- ROAS Calculation: Calculate ROAS metrics based on your attributed revenue and actual ad costs.
- Reporting: Visualize the results in Looker Studio or other BI tools for actionable insights.
Core Components Deep Dive & Implementation Steps
1. Server-Side GA4: Capturing gclid and fbclid
For accurate attribution, your server-side GA4 pipeline must capture gclid (Google Click ID) and fbclid (Facebook Click ID) from the URL query parameters. These are the crucial join keys.
If your GTM Server Container uses the built-in GA4 Client, it often captures these automatically. However, it's good practice to ensure they are explicitly mapped as event_parameters. For instance, the page_location parameter typically contains these. If not, a custom variable can extract them from the URL:
GTM SC Custom Variable to extract gclid or fbclid:
- Variable Type: URL
- Component Type: Query
- Query Key: gclid (create {{URL Query - gclid}})
- Query Key: fbclid (create {{URL Query - fbclid}})
Then, in your GA4 Event Tags (especially for purchase, generate_lead):
- Add an Event Parameter: gclid, value {{URL Query - gclid}}
- Add an Event Parameter: fbclid, value {{URL Query - fbclid}}
GA4 will then export these as event_params in BigQuery, and you can access them via UNNEST(event_params).
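To verify the capture end to end, a quick spot check against yesterday's export partition (using the same placeholder project and property ID as the queries later in this post) might look like this:

SELECT
  event_name,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'gclid') AS gclid,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'fbclid') AS fbclid
FROM `your_gcp_project.analytics_YOUR_GA4_PROPERTY_ID.events_*`
WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
  AND event_name IN ('purchase', 'generate_lead')
LIMIT 100;

If gclid and fbclid come back mostly NULL for paid traffic, revisit the variable mapping above before building the downstream joins.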
2. Ingesting Google Ads Cost Data: BigQuery Data Transfer Service (BQ DTS)
BQ DTS provides a native, automated way to import your Google Ads data into BigQuery.
Setup Steps:
- Enable BQ Data Transfer Service API: In your GCP Project, search for "BigQuery Data Transfer API" and enable it.
- Create a New Transfer: Navigate to BigQuery -> Data transfers in the GCP Console.
- Click CREATE TRANSFER.
- Source: Select Google Ads.
- Transfer config name: Google Ads Daily Sync
- Schedule: Daily (or your preferred frequency).
- Destination dataset: Choose an existing dataset or create a new one (e.g., marketing_raw).
- Data source specific parameters:
  - Customer ID: Enter your Google Ads Customer ID(s).
  - Preferences: Configure which tables to transfer (e.g., Campaigns, AdGroups, KeywordStats, ClickPerformance, Impressions). ClickPerformance is crucial for gclid and cost data.
- Service Account: Ensure the service account associated with the transfer has Viewer access to your Google Ads account.
- Save.
BQ DTS will now automatically create and populate a set of tables in your marketing_raw dataset (named with a p_ or ads_ prefix depending on the connector version, e.g., p_Campaign_XXXXXX for campaign metadata, p_CampaignBasicStats_XXXXXX for cost and impression metrics, and p_ClickStats_XXXXXX for click-level data including gclid).
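Once the first transfer has completed, a quick sanity check confirms cost data is arriving. The table and column names below assume the legacy p_-prefixed schema, where cost fields are reported in micros; adjust them to whatever tables your transfer actually created:

-- Yesterday's spend per campaign from the Google Ads transfer (legacy schema assumed)
SELECT
  Date,
  CampaignId,
  SUM(Cost) / 1e6 AS cost -- Cost is reported in micros
FROM `your_gcp_project.marketing_raw.p_CampaignBasicStats_XXXXXX`
WHERE Date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
GROUP BY Date, CampaignId
ORDER BY cost DESC;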
3. Ingesting Facebook Ads Cost Data: Cloud Run ETL
Facebook Ads data doesn't have a direct BQ DTS integration. We'll build a lightweight Python Cloud Run service to fetch this data via the Facebook Marketing API and stream it to BigQuery.
a. BigQuery Table for Facebook Ads Data:
Create a table to store your Facebook Ads campaign performance data.
CREATE TABLE `your_gcp_project.marketing_raw.facebook_ads_performance` (
report_date DATE NOT NULL,
account_id STRING,
account_name STRING,
campaign_id STRING,
campaign_name STRING,
adset_id STRING,
adset_name STRING,
ad_id STRING,
ad_name STRING,
spend NUMERIC,
impressions INTEGER,
clicks INTEGER,
unique_clicks INTEGER,
reach INTEGER,
-- fbclid (Facebook Click ID) is usually retrieved from URL/landing page,
-- not directly available in aggregate performance reports from API.
-- For linking, you'd rely on matching by campaign/date for cost, and fbclid in GA4 for conversions.
gcp_insert_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP()
)
PARTITION BY report_date
CLUSTER BY campaign_id, ad_id
OPTIONS (
  description = 'Daily performance data from Facebook Ads Marketing API.'
);
b. Python Facebook Ads ETL Service (Cloud Run):
This Flask app will use the facebook_business SDK to fetch data and insert it into BigQuery.
facebook-ads-etl/main.py:
import os
import datetime
import logging
from flask import Flask, request, jsonify
from google.cloud import bigquery, secretmanager
from facebook_business.api import FacebookAdsApi
from facebook_business.adobjects.adaccount import AdAccount
from facebook_business.adobjects.adsinsights import AdsInsights
app = Flask(__name__)
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
# --- BigQuery Configuration ---
BIGQUERY_PROJECT_ID = os.environ.get('GCP_PROJECT_ID')
BIGQUERY_DATASET_ID = os.environ.get('BIGQUERY_DATASET_ID', 'marketing_raw')
BIGQUERY_TABLE_ID = os.environ.get('BIGQUERY_TABLE_ID', 'facebook_ads_performance')
TABLE_FULL_ID = f"{BIGQUERY_PROJECT_ID}.{BIGQUERY_DATASET_ID}.{BIGQUERY_TABLE_ID}"
bigquery_client = bigquery.Client(project=BIGQUERY_PROJECT_ID)
secret_manager_client = secretmanager.SecretManagerServiceClient()
# --- Facebook Ads API Configuration (use Secret Manager for tokens!) ---
# These environment variables hold the Secret Manager resource names
FB_APP_ID_SECRET_NAME = os.environ.get('FB_APP_ID_SECRET_NAME', 'FB_APP_ID')
FB_APP_SECRET_SECRET_NAME = os.environ.get('FB_APP_SECRET_SECRET_NAME', 'FB_APP_SECRET')
FB_ACCESS_TOKEN_SECRET_NAME = os.environ.get('FB_ACCESS_TOKEN_SECRET_NAME', 'FB_ACCESS_TOKEN')
FB_AD_ACCOUNT_ID = os.environ.get('FB_AD_ACCOUNT_ID', 'act_XXXXXXXXXXXXXXX') # Your Ad Account ID
def get_secret_value(secret_name):
try:
secret_path = secret_manager_client.secret_version_path(
BIGQUERY_PROJECT_ID, secret_name, 'latest'
)
response = secret_manager_client.access_secret_version(request={"name": secret_path})
return response.payload.data.decode('UTF-8')
except Exception as e:
logger.error(f"Error accessing secret {secret_name}: {e}")
return None
def init_facebook_api():
app_id = get_secret_value(FB_APP_ID_SECRET_NAME)
app_secret = get_secret_value(FB_APP_SECRET_SECRET_NAME)
access_token = get_secret_value(FB_ACCESS_TOKEN_SECRET_NAME)
if not all([app_id, app_secret, access_token, FB_AD_ACCOUNT_ID]):
logger.error("Missing Facebook Ads API credentials. Cannot initialize API.")
return None
FacebookAdsApi.init(app_id=app_id, app_secret=app_secret, access_token=access_token)
logger.info("Facebook Ads API initialized.")
return True
@app.route('/fetch-facebook-ads-data', methods=['POST'])
def fetch_facebook_ads_data():
if not init_facebook_api():
return jsonify({'status': 'failed', 'message': 'Failed to initialize Facebook Ads API'}), 500
try:
# Determine report date. For scheduled jobs, usually fetch yesterday's data.
# Can be overridden by request payload if needed for backfill.
report_date_str = request.get_json().get('report_date') if request.is_json else None
if report_date_str:
report_date = datetime.datetime.fromisoformat(report_date_str).date()
else:
report_date = datetime.date.today() - datetime.timedelta(days=1) # Default: yesterday
logger.info(f"Fetching Facebook Ads data for date: {report_date.isoformat()}")
# Fetch insights at the AD level
ad_account = AdAccount(FB_AD_ACCOUNT_ID)
params = {
'time_range': {
'since': report_date.isoformat(),
'until': report_date.isoformat()
},
'level': 'ad', # Fetch at Ad level for granularity
'limit': 1000 # Max limit per API call
}
fields = [
AdsInsights.Field.account_id,
AdsInsights.Field.account_name,
AdsInsights.Field.campaign_id,
AdsInsights.Field.campaign_name,
AdsInsights.Field.adset_id,
AdsInsights.Field.adset_name,
AdsInsights.Field.ad_id,
AdsInsights.Field.ad_name,
AdsInsights.Field.spend,
AdsInsights.Field.impressions,
AdsInsights.Field.clicks,
AdsInsights.Field.unique_clicks,
AdsInsights.Field.reach,
AdsInsights.Field.date_start # This will be the report_date
]
insights = ad_account.get_insights(fields=fields, params=params)
rows_to_insert = []
for insight in insights:
row = {
'report_date': insight['date_start'],
'account_id': insight.get('account_id'),
'account_name': insight.get('account_name'),
'campaign_id': insight.get('campaign_id'),
'campaign_name': insight.get('campaign_name'),
'adset_id': insight.get('adset_id'),
'adset_name': insight.get('adset_name'),
'ad_id': insight.get('ad_id'),
'ad_name': insight.get('ad_name'),
'spend': float(insight.get('spend', 0.0)),
'impressions': int(insight.get('impressions', 0)),
'clicks': int(insight.get('clicks', 0)),
'unique_clicks': int(insight.get('unique_clicks', 0)),
'reach': int(insight.get('reach', 0)),
}
rows_to_insert.append(row)
if rows_to_insert:
errors = bigquery_client.insert_rows_json(TABLE_FULL_ID, rows_to_insert)
if errors:
logger.error(f"BigQuery insert errors for Facebook Ads data: {errors}")
return jsonify({'status': 'failed', 'message': 'Partial success, some rows failed to insert', 'errors': errors}), 500
else:
logger.info(f"Successfully ingested {len(rows_to_insert)} rows of Facebook Ads data for {report_date.isoformat()}.")
return jsonify({'status': 'success', 'rows_ingested': len(rows_to_insert)}), 200
else:
logger.info(f"No Facebook Ads data found for {report_date.isoformat()}.")
return jsonify({'status': 'success', 'rows_ingested': 0}), 200
except Exception as e:
logger.error(f"Error fetching/ingesting Facebook Ads data: {e}", exc_info=True)
return jsonify({'status': 'failed', 'message': str(e)}), 500
if __name__ == '__main__':
app.run(debug=True, host='0.0.0.0', port=int(os.environ.get('PORT', 8080)))
facebook-ads-etl/requirements.txt:
Flask
google-cloud-bigquery
google-cloud-secret-manager
facebook-business
requests # facebook-business depends on requests
c. Deploy the Facebook Ads ETL Service to Cloud Run:
# First, create secrets in Secret Manager for Facebook API credentials
echo "YOUR_FB_APP_ID" | gcloud secrets create FB_APP_ID --data-file=- --project YOUR_GCP_PROJECT_ID
echo "YOUR_FB_APP_SECRET" | gcloud secrets create FB_APP_SECRET --data-file=- --project YOUR_GCP_PROJECT_ID
echo "YOUR_FB_ACCESS_TOKEN" | gcloud secrets create FB_ACCESS_TOKEN --data-file=- --project YOUR_GCP_PROJECT_ID
# Grant the Cloud Run service account access to these secrets
# Use the service account your Cloud Run service runs as. The default is
# PROJECT_NUMBER-compute@developer.gserviceaccount.com, but a dedicated service account is recommended.
gcloud secrets add-iam-policy-binding FB_APP_ID --role="roles/secretmanager.secretAccessor" --member="serviceAccount:YOUR_CLOUD_RUN_SERVICE_ACCOUNT_EMAIL" --project YOUR_GCP_PROJECT_ID
gcloud secrets add-iam-policy-binding FB_APP_SECRET --role="roles/secretmanager.secretAccessor" --member="serviceAccount:YOUR_CLOUD_RUN_SERVICE_ACCOUNT_EMAIL" --project YOUR_GCP_PROJECT_ID
gcloud secrets add-iam-policy-binding FB_ACCESS_TOKEN --role="roles/secretmanager.secretAccessor" --member="serviceAccount:YOUR_CLOUD_RUN_SERVICE_ACCOUNT_EMAIL" --project YOUR_GCP_PROJECT_ID
gcloud run deploy facebook-ads-etl-service \
--source ./facebook-ads-etl \
--platform managed \
--region YOUR_GCP_REGION \
--no-allow-unauthenticated \
--set-env-vars \
GCP_PROJECT_ID="YOUR_GCP_PROJECT_ID",\
BIGQUERY_DATASET_ID="marketing_raw",\
BIGQUERY_TABLE_ID="facebook_ads_performance",\
FB_APP_ID_SECRET_NAME="FB_APP_ID",\
FB_APP_SECRET_SECRET_NAME="FB_APP_SECRET",\
FB_ACCESS_TOKEN_SECRET_NAME="FB_ACCESS_TOKEN",\
FB_AD_ACCOUNT_ID="act_YOUR_FACEBOOK_AD_ACCOUNT_ID" \
--memory 512Mi \
--cpu 1 \
--timeout 300s # Allow ample time for API calls and BQ inserts
# Grant the Cloud Run service account 'roles/bigquery.dataEditor' on your marketing_raw dataset
d. Schedule with Cloud Scheduler:
gcloud scheduler jobs create http facebook-ads-daily-etl \
--location YOUR_GCP_REGION \
--schedule "0 2 * * *" \
--uri "https://facebook-ads-etl-service-YOUR_SERVICE_HASH-YOUR_GCP_REGION.a.run.app/fetch-facebook-ads-data" \
--http-method POST \
--headers "Content-Type=application/json" \
--oidc-service-account-email YOUR_CLOUD_RUN_SERVICE_ACCOUNT_EMAIL \
--oidc-token-audience "https://facebook-ads-etl-service-YOUR_SERVICE_HASH-YOUR_GCP_REGION.a.run.app" \
--message-body "{}" \
--project YOUR_GCP_PROJECT_ID
Important: Replace YOUR_CLOUD_RUN_SERVICE_ACCOUNT_EMAIL with the email of the service account your facebook-ads-etl-service runs as. This ensures secure, authenticated invocation. The schedule 0 2 * * * means daily at 2:00 AM UTC.
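After the first scheduled run, a quick row count against the target table confirms the ETL is writing what you expect:

SELECT
  report_date,
  COUNT(*) AS rows_ingested,
  SUM(spend) AS total_spend
FROM `your_gcp_project.marketing_raw.facebook_ads_performance`
GROUP BY report_date
ORDER BY report_date DESC
LIMIT 7;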
4. Unifying Data in BigQuery: unified_marketing_performance
Now, let's create a view that combines GA4 conversions with Google Ads and Facebook Ads data.
-- Replace with your actual project IDs and GA4 property ID
CREATE OR REPLACE VIEW `your_gcp_project.marketing_analytics.unified_marketing_performance` AS
WITH GA4_Conversions AS (
SELECT
DATE(TIMESTAMP_MICROS(event_timestamp)) AS event_date,
user_pseudo_id,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'gclid') AS gclid,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'fbclid') AS fbclid,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'transaction_id') AS transaction_id,
(SELECT value.double_value FROM UNNEST(event_params) WHERE key = 'value') AS revenue,
event_name,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'campaign') AS ga4_campaign,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'source') AS ga4_source,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'medium') AS ga4_medium,
-- Add other relevant GA4 event parameters or user properties
FROM `your_gcp_project.analytics_YOUR_GA4_PROPERTY_ID.events_*`
WHERE _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)) AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
AND event_name IN ('purchase', 'generate_lead') -- Focus on conversion events
QUALIFY transaction_id IS NULL OR ROW_NUMBER() OVER(PARTITION BY transaction_id ORDER BY event_timestamp DESC) = 1 -- Deduplicate purchases on transaction_id; leads without one pass through
),
GoogleAds_Costs AS (
  SELECT
    DATE(ad.date) AS report_date,
    ad.campaign_id AS google_campaign_id,
    ad.campaign_name AS google_campaign_name,
    ad.source AS google_source, -- e.g., 'google'
    ad.medium AS google_medium, -- e.g., 'cpc'
    ad.gclid AS gclid, -- join key; sourced from the click-performance table, adjust to your actual DTS schema
    SUM(ad.cost) AS google_ad_cost
  FROM `your_gcp_project.marketing_raw.p_Ad_XXXXXX` AS ad -- Replace with your actual BQ DTS Ad/click table
  WHERE ad.date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY) AND CURRENT_DATE()
  GROUP BY 1,2,3,4,5,6
),
FacebookAds_Costs AS (
SELECT
report_date,
campaign_id AS facebook_campaign_id,
campaign_name AS facebook_campaign_name,
'facebook' AS facebook_source,
'cpc' AS facebook_medium, -- Or other relevant medium
SUM(spend) AS facebook_ad_cost,
FROM `your_gcp_project.marketing_raw.facebook_ads_performance`
WHERE report_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY) AND CURRENT_DATE()
GROUP BY 1,2,3,4,5
)
SELECT
  COALESCE(gac.event_date, gca.report_date, fba.report_date) AS event_date,
  gac.user_pseudo_id,
  gac.gclid,
  gac.fbclid,
  gac.transaction_id,
  gac.revenue,
  gac.event_name,
  gac.ga4_campaign,
  gac.ga4_source,
  gac.ga4_medium,
  gca.google_campaign_name,
  gca.google_ad_cost,
  fba.facebook_campaign_name,
  fba.facebook_ad_cost
FROM
GA4_Conversions AS gac
LEFT JOIN
GoogleAds_Costs AS gca
ON gac.event_date = gca.report_date
AND gac.gclid = gca.gclid -- Note: BQ DTS click performance table will have gclid to join on
LEFT JOIN
FacebookAds_Costs AS fba
ON gac.event_date = fba.report_date
-- For fbclid, you'd typically match by date/campaign (if fbclid is not in FB Ads data directly)
-- More complex stitching might involve window functions or a separate user_journey table
WHERE
gac.event_date IS NOT NULL
;
Important Considerations for Joining:
- gclid and fbclid: These are the primary keys for connecting conversions to specific ad clicks. Ensure your GA4 events always capture these.
- _TABLE_SUFFIX: When querying GA4 export tables (events_*), always filter by _TABLE_SUFFIX to limit the data scanned and reduce costs.
- Time Zones: Ensure all timestamps are standardized (e.g., UTC) before joining to avoid discrepancies, especially around midnight. The GA4 export event_timestamp is in UTC microseconds, while Google Ads and Facebook Ads report by the ad account's time zone; ideally normalize everything to UTC (see the snippet below).
- Deduplication: The QUALIFY ROW_NUMBER() OVER(PARTITION BY transaction_id ORDER BY event_timestamp DESC) = 1 is crucial for purchase events in GA4 to handle potential duplicates (from retries or parallel client/server-side firing).
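For example, if reporting should align to a single reporting time zone rather than UTC, derive the conversion date from event_timestamp explicitly (the 'Europe/Berlin' zone below is only an illustration; substitute your own):

SELECT
  DATE(TIMESTAMP_MICROS(event_timestamp)) AS event_date_utc, -- matches how the view above derives event_date
  DATE(TIMESTAMP_MICROS(event_timestamp), 'Europe/Berlin') AS event_date_local -- shifted into a reporting time zone
FROM `your_gcp_project.analytics_YOUR_GA4_PROPERTY_ID.events_*`
WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
LIMIT 10;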
5. Implementing Custom Attribution Models
With unified_marketing_performance, you can now apply various attribution models. This typically involves window functions to analyze user touchpoints leading up to a conversion.
First, let's create a hypothetical user_journey_events table (or view) that captures all relevant touchpoints (page views, clicks, conversions) for a user, enriched with marketing channel information. This might involve complex joins with your raw event data lake for richer referrer information, or using GA4's traffic_source fields.
-- This is a simplified example. A real user journey model is more complex.
CREATE OR REPLACE VIEW `your_gcp_project.marketing_analytics.user_journey_events` AS
SELECT
user_pseudo_id,
TIMESTAMP_MICROS(event_timestamp) AS event_time,
event_name,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'gclid') AS gclid,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'fbclid') AS fbclid,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'source') AS source,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'medium') AS medium,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'campaign') AS campaign,
(SELECT value.double_value FROM UNNEST(event_params) WHERE key = 'value') AS revenue,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'transaction_id') AS transaction_id,
ROW_NUMBER() OVER(PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS touchpoint_sequence_num
FROM
`your_gcp_project.analytics_YOUR_GA4_PROPERTY_ID.events_*`
WHERE _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)) AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
-- Consider only relevant events that constitute a touchpoint or conversion
AND event_name IN ('page_view', 'session_start', 'purchase', 'generate_lead')
;
-- Now, using `user_journey_events` to model attribution for 'purchase' events.
-- Example 1: Last-Click Attribution
CREATE OR REPLACE VIEW `your_gcp_project.marketing_analytics.last_click_attributed_conversions` AS
WITH Purchases AS (
  -- One row per purchase, deduplicated on transaction_id
  SELECT user_pseudo_id, transaction_id, revenue, event_time AS conversion_time
  FROM `your_gcp_project.marketing_analytics.user_journey_events`
  WHERE event_name = 'purchase' AND transaction_id IS NOT NULL
  QUALIFY ROW_NUMBER() OVER(PARTITION BY transaction_id ORDER BY event_time DESC) = 1
)
SELECT
  p.transaction_id,
  p.revenue,
  DATE(p.conversion_time) AS event_date,
  uj.source AS last_click_source,
  uj.medium AS last_click_medium,
  uj.campaign AS last_click_campaign
FROM Purchases AS p
JOIN `your_gcp_project.marketing_analytics.user_journey_events` AS uj
  ON uj.user_pseudo_id = p.user_pseudo_id
  AND uj.event_time <= p.conversion_time
WHERE uj.source IS NOT NULL -- only touchpoints with channel information
-- Keep the latest qualifying touchpoint per transaction
QUALIFY ROW_NUMBER() OVER(PARTITION BY p.transaction_id ORDER BY uj.event_time DESC) = 1
;
-- Example 2: First-Click Attribution
CREATE OR REPLACE VIEW `your_gcp_project.marketing_analytics.first_click_attributed_conversions` AS
WITH Purchases AS (
  SELECT user_pseudo_id, transaction_id, revenue, event_time AS conversion_time
  FROM `your_gcp_project.marketing_analytics.user_journey_events`
  WHERE event_name = 'purchase' AND transaction_id IS NOT NULL
  QUALIFY ROW_NUMBER() OVER(PARTITION BY transaction_id ORDER BY event_time DESC) = 1
)
SELECT
  p.transaction_id,
  p.revenue,
  DATE(p.conversion_time) AS event_date,
  uj.source AS first_click_source,
  uj.medium AS first_click_medium,
  uj.campaign AS first_click_campaign
FROM Purchases AS p
JOIN `your_gcp_project.marketing_analytics.user_journey_events` AS uj
  ON uj.user_pseudo_id = p.user_pseudo_id
  AND uj.event_time <= p.conversion_time
WHERE uj.source IS NOT NULL
-- Keep the earliest qualifying touchpoint per transaction
QUALIFY ROW_NUMBER() OVER(PARTITION BY p.transaction_id ORDER BY uj.event_time ASC) = 1
;
-- Example 3: Linear Attribution (simplified, distribute revenue evenly across all touchpoints for a transaction)
-- This is often done at a session level or user level, so the example focuses on distributing revenue
-- for a single purchase event across all touchpoints that happened before it within a reasonable window.
CREATE OR REPLACE VIEW `your_gcp_project.marketing_analytics.linear_attributed_conversions` AS
WITH TouchpointsBeforePurchase AS (
SELECT
uj.user_pseudo_id,
uj.event_time AS touchpoint_time,
uj.source,
uj.medium,
uj.campaign,
uj_conv.transaction_id,
uj_conv.revenue,
COUNT(DISTINCT uj.touchpoint_sequence_num) OVER(PARTITION BY uj_conv.transaction_id) AS total_touchpoints_for_conversion
FROM
`your_gcp_project.marketing_analytics.user_journey_events` AS uj
INNER JOIN
`your_gcp_project.marketing_analytics.user_journey_events` AS uj_conv
ON uj.user_pseudo_id = uj_conv.user_pseudo_id
AND uj_conv.event_name = 'purchase'
AND uj_conv.transaction_id IS NOT NULL
AND uj.event_time <= uj_conv.event_time -- Touchpoint happened before or at conversion
AND uj.event_time >= TIMESTAMP_SUB(uj_conv.event_time, INTERVAL 30 DAY) -- Within a 30-day lookback window
)
SELECT
transaction_id,
source,
medium,
campaign,
SUM(revenue / total_touchpoints_for_conversion) AS attributed_revenue
FROM
TouchpointsBeforePurchase
GROUP BY
transaction_id, source, medium, campaign
;
Note: Real-world attribution modeling can be significantly more complex, involving:
- Markov Chains or Shapley values for data-driven attribution.
- Sessionization logic to define specific touchpoint windows.
- Handling of direct traffic or unidentifiable sources.
- Dealing with (not set) values.
These SQL examples provide a starting point.
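As one illustration of the custom weighting mentioned earlier, here is a sketch of a position-based (40/20/40) model built on the same user_journey_events view and the same 30-day lookback as the linear example. The view name, weights, and window are assumptions to adapt to your business, and it shares the linear example's simplifications:

-- Position-based (U-shaped) attribution: 40% to the first touch, 40% to the last,
-- and the remaining 20% split evenly across middle touchpoints.
CREATE OR REPLACE VIEW `your_gcp_project.marketing_analytics.position_based_attributed_conversions` AS
WITH TouchpointsBeforePurchase AS (
  SELECT
    uj_conv.transaction_id,
    uj_conv.revenue,
    uj.source,
    uj.medium,
    uj.campaign,
    ROW_NUMBER() OVER(PARTITION BY uj_conv.transaction_id ORDER BY uj.event_time ASC) AS position,
    COUNT(*) OVER(PARTITION BY uj_conv.transaction_id) AS total_touchpoints
  FROM `your_gcp_project.marketing_analytics.user_journey_events` AS uj
  INNER JOIN `your_gcp_project.marketing_analytics.user_journey_events` AS uj_conv
    ON uj.user_pseudo_id = uj_conv.user_pseudo_id
    AND uj_conv.event_name = 'purchase'
    AND uj_conv.transaction_id IS NOT NULL
    AND uj.event_time <= uj_conv.event_time
    AND uj.event_time >= TIMESTAMP_SUB(uj_conv.event_time, INTERVAL 30 DAY)
  WHERE uj.source IS NOT NULL -- only touchpoints with channel information
)
SELECT
  transaction_id,
  source,
  medium,
  campaign,
  SUM(
    revenue * CASE
      WHEN total_touchpoints = 1 THEN 1.0
      WHEN total_touchpoints = 2 THEN 0.5
      WHEN position = 1 THEN 0.4
      WHEN position = total_touchpoints THEN 0.4
      ELSE 0.2 / (total_touchpoints - 2)
    END
  ) AS attributed_revenue
FROM TouchpointsBeforePurchase
GROUP BY transaction_id, source, medium, campaign
;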
6. Calculating Blended ROAS
Now, combine your attributed conversions with the ad cost data.
CREATE OR REPLACE VIEW `your_gcp_project.marketing_analytics.blended_roas_by_channel` AS
WITH AttributedRevenue AS (
-- Choose your preferred attribution model here. Let's use Last-Click for simplicity
-- In a real scenario, you might blend multiple models or use a data-driven one.
SELECT
event_date,
last_click_source AS attributed_source,
last_click_medium AS attributed_medium,
last_click_campaign AS attributed_campaign,
SUM(revenue) AS total_attributed_revenue
FROM
`your_gcp_project.marketing_analytics.last_click_attributed_conversions`
GROUP BY 1,2,3,4
),
CombinedCosts AS (
  -- Pull costs from the raw cost tables rather than the conversion-level unified view,
  -- so spend is not repeated once per joined conversion row.
  SELECT
    DATE(ad.date) AS event_date,
    'google' AS source,
    'cpc' AS medium,
    ad.campaign_name AS campaign,
    SUM(ad.cost) AS total_cost
  FROM `your_gcp_project.marketing_raw.p_Ad_XXXXXX` AS ad -- Replace with your actual BQ DTS cost table
  WHERE ad.date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY) AND CURRENT_DATE()
  GROUP BY 1,2,3,4
  UNION ALL
  SELECT
    report_date AS event_date,
    'facebook' AS source,
    'cpc' AS medium, -- Or other relevant medium
    campaign_name AS campaign,
    SUM(spend) AS total_cost
  FROM `your_gcp_project.marketing_raw.facebook_ads_performance`
  WHERE report_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY) AND CURRENT_DATE()
  GROUP BY 1,2,3,4
)
SELECT
COALESCE(ar.event_date, cc.event_date) AS report_date,
COALESCE(ar.attributed_source, cc.source) AS source,
COALESCE(ar.attributed_medium, cc.medium) AS medium,
COALESCE(ar.attributed_campaign, cc.campaign) AS campaign,
SUM(COALESCE(ar.total_attributed_revenue, 0)) AS total_revenue,
SUM(COALESCE(cc.total_cost, 0)) AS total_cost,
CASE
WHEN SUM(COALESCE(cc.total_cost, 0)) > 0 THEN SUM(COALESCE(ar.total_attributed_revenue, 0)) / SUM(COALESCE(cc.total_cost, 0))
ELSE 0
END AS roas
FROM
AttributedRevenue AS ar
FULL OUTER JOIN -- Use FULL OUTER JOIN to ensure all costs and revenues are included
CombinedCosts AS cc
ON ar.event_date = cc.event_date
AND ar.attributed_source = cc.source
AND ar.attributed_medium = cc.medium
AND ar.attributed_campaign = cc.campaign -- requires UTM campaign names in GA4 to match ad platform campaign names
GROUP BY 1,2,3,4
ORDER BY report_date DESC, total_revenue DESC
;
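With the view in place, a typical question such as "which campaigns drove the best blended ROAS last week?" becomes a short query:

SELECT
  source,
  campaign,
  SUM(total_revenue) AS revenue,
  SUM(total_cost) AS cost,
  SAFE_DIVIDE(SUM(total_revenue), SUM(total_cost)) AS roas
FROM `your_gcp_project.marketing_analytics.blended_roas_by_channel`
WHERE report_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
GROUP BY source, campaign
ORDER BY roas DESC;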
7. Visualization: Looker Studio (formerly Google Data Studio)
Once your blended_roas_by_channel view is ready, connect it to Looker Studio:
- In Looker Studio, create a new report.
- Add a data source and select BigQuery.
- Choose your GCP project, dataset (marketing_analytics), and then select the blended_roas_by_channel view.
- You can now build interactive dashboards showing:
- ROAS by Source/Medium/Campaign over time.
- Total Revenue vs. Total Cost by channel.
- Attribution model comparisons (if you create views for multiple models; a comparison sketch follows below).
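For that last point, a simple pattern is to stack the per-model views into one long table that Looker Studio can pivot by model. The sketch below assumes only the last-click and first-click views defined earlier; extend it with any additional models you build:

CREATE OR REPLACE VIEW `your_gcp_project.marketing_analytics.attribution_model_comparison` AS
SELECT
  'last_click' AS attribution_model,
  event_date,
  last_click_source AS source,
  last_click_medium AS medium,
  last_click_campaign AS campaign,
  SUM(revenue) AS attributed_revenue
FROM `your_gcp_project.marketing_analytics.last_click_attributed_conversions`
GROUP BY 1,2,3,4,5
UNION ALL
SELECT
  'first_click' AS attribution_model,
  event_date,
  first_click_source AS source,
  first_click_medium AS medium,
  first_click_campaign AS campaign,
  SUM(revenue) AS attributed_revenue
FROM `your_gcp_project.marketing_analytics.first_click_attributed_conversions`
GROUP BY 1,2,3,4,5
;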
Benefits of This Approach
- Holistic Marketing View: Gain a single source of truth for all your online marketing performance, integrating data that traditionally lives in silos.
- Accurate ROAS: Move beyond platform-biased reporting to calculate true Return On Ad Spend based on your custom attribution models.
- Data-Driven Decisions: Allocate marketing budget more effectively by understanding the true impact of each channel across the customer journey.
- Custom Attribution Flexibility: Experiment with different attribution models in SQL to find what best reflects your business's understanding of value.
- Scalability & Resilience: Leverage BigQuery's power to handle large datasets, and Cloud Run for reliable, scheduled data ingestion.
- Data Ownership: You maintain full control over your valuable marketing data and the logic applied to it.
- Agility: Easily adjust attribution windows, add new ad platforms, or refine models without complex vendor integrations.
Important Considerations
- Data Latency: While GA4 provides near real-time data for some operations, its BigQuery export can have a daily delay. Ad platform data also syncs daily. Your ROAS reports will reflect data up to yesterday.
- Cost Management: BigQuery costs scale with data scanned and stored. Partition and cluster your tables effectively. Optimize your SQL queries to avoid full table scans. Monitor BQ DTS and Cloud Run costs.
- PII & Data Governance: Ensure PII handling in GA4 events is consistent (hashed user_ids, no raw PII). Ad platform data can also contain sensitive information. Apply BigQuery RLS, CLS, and Data Masking for secure access.
- Attribution Model Choice: The "best" attribution model is highly business-dependent. Start simple (last-click, linear) and iterate. Data-driven models (e.g., using BigQuery ML) are the most advanced but require significant data history.
- Cross-Device Stitching: This solution primarily relies on GA4's user_pseudo_id and user_id for stitching. For advanced cross-device tracking, ensure your user_id implementation is robust and consistent.
- gclid and fbclid Capture: Double-check your server-side GTM setup to confirm these identifiers are reliably captured and passed as event parameters to GA4 conversions. Without them, linking to ad clicks is impossible.
- Offline Conversions: For a truly comprehensive ROAS, consider incorporating offline conversions into your unified_marketing_performance model (a minimal union sketch follows below).
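As a minimal illustration of the offline conversions point, additional conversion sources can be stacked into the same shape as the online conversions before costs are joined. The offline_conversions table and its columns below are hypothetical placeholders for whatever your CRM export actually provides:

-- Hypothetical CRM export: your_gcp_project.marketing_raw.offline_conversions
-- (conversion_date DATE, order_id STRING, revenue NUMERIC, gclid STRING)
SELECT
  event_date,
  transaction_id,
  revenue,
  gclid
FROM `your_gcp_project.marketing_analytics.unified_marketing_performance`
UNION ALL
SELECT
  conversion_date AS event_date,
  order_id AS transaction_id,
  CAST(revenue AS FLOAT64) AS revenue,
  gclid
FROM `your_gcp_project.marketing_raw.offline_conversions`;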
Conclusion
Achieving a holistic, accurate view of your marketing performance and calculating true ROAS is fundamental for optimizing ad spend and driving business growth. By building a BigQuery-centric data hub that combines server-side GA4 conversion data with Google Ads and Facebook Ads cost data, you empower your organization with unparalleled analytical capabilities. This robust, custom solution transcends the limitations of platform-specific reporting, enabling precise attribution modeling and actionable insights that will transform your marketing strategy. Embrace this server-side data engineering approach to unlock the full potential of your marketing investments.