Data Engineering 11/5/2024 5 min read

Capturing and Utilizing Crucial Client-Side Context: Referrer & User-Agent in Server-Side GA4 with GTM & Cloud Run

You've harnessed the power of server-side Google Analytics 4 (GA4), leveraging Google Tag Manager (GTM) Server Container on Cloud Run to centralize data collection, apply transformations, enrich events, and enforce granular consent. This architecture provides robust control, accuracy, and compliance, forming the backbone of your modern analytics strategy.

However, a fundamental challenge with server-side tracking often involves reliably capturing and utilizing client-side contextual information that is critical for understanding user behavior and attributing conversions. Specifically, the HTTP Referer header and User-Agent header provide invaluable insights into where users are coming from and what device/browser they are using.

While client-side JavaScript automatically captures document.referrer and navigator.userAgent, relying solely on this can be problematic in a privacy-first, ad-blocker-laden world. Client-side scripts can be blocked, altered, or simply fail to execute, leading to:

  • Incomplete Attribution: Missing or incorrect referrer data breaks attribution models, making it difficult to understand marketing channel performance.
  • Bot Traffic Skew: Without reliable User-Agent data, identifying and filtering bot traffic from your analytics becomes challenging, leading to inflated user counts and skewed engagement metrics.
  • Limited Device Insights: Inability to accurately determine device type, operating system, and browser can hinder user experience optimization and segmentation.
  • Data Consistency Issues: Values captured client-side can disagree with what the server actually received, leaving you with two conflicting sources for the same field.

The problem, then, is how to reliably capture and leverage the raw Referer and User-Agent HTTP headers within your server-side GA4 pipeline to ensure accurate attribution, effective bot filtering, and richer device-level insights, even when client-side mechanisms fall short.

Why Server-Side for Referrer & User-Agent?

Capturing these headers on the server-side offers significant advantages:

  1. Reliability: HTTP headers are part of the fundamental web request and are generally more resilient to client-side interference (ad-blockers, ITP) than JavaScript execution.
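  2. Completeness: You get the raw HTTP Referer and User-Agent strings exactly as the browser sent them. Note that on a hit sent to the tagging server, Referer identifies the page that issued the request, not the previously visited page that document.referrer reports.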
  3. Centralized Processing: Process and standardize these headers within your GTM Server Container, making clean data available for all downstream platforms (GA4, CRM, ad platforms).
  4. Enhanced Bot Detection: Use server-side logic or external services (e.g., Cloud Run + bot detection APIs) to analyze User-Agent strings for more sophisticated bot filtering.
  5. Attribution Resilience: Implement server-side logic to derive richer attribution parameters, less reliant on client-side state.

Our Solution Architecture: Server-Side Header Capture & Utilization

We'll integrate custom logic within your GTM Server Container to extract the raw Referer and User-Agent headers. This information will then be used to:

  • Enrich the event data.
  • Send as custom dimensions to GA4 for reporting.
  • Optionally, send to a dedicated Cloud Run service for advanced bot detection or more complex referrer parsing.

graph TD
    A["Browser/Client-Side"] -->|"1. HTTP Request (with Referer, User-Agent headers)"| B("GTM Web Container")
    B -->|"2. HTTP Request to GTM Server Container Endpoint"| C("GTM Server Container on Cloud Run")

    subgraph GTM Server Container Processing
        C --> D{"3. GTM SC Client Processes Event"}
        D --> E["4. Custom Variable: Extract HTTP Headers"]
        E -->|"5. Raw Referer & User-Agent Strings"| D
        D --> F["6. Custom Tag: Enrich Event Data (Add Headers to payload)"]
        F -->|"7. Enriched Event Data (with incoming_http_user_agent, incoming_http_referrer)"| G["GTM SC Event Data (Internal)"]
        G --> H["8. Data Quality, PII Scrubbing, Consent Evaluation, Enrichment"]
        H -->|"9. Dispatch to GA4 Measurement Protocol"| I["Google Analytics 4"]
        G -->|"Optional: 10. Call Bot Detection Service"| J["Bot Detection Service (Python on Cloud Run)"]
        J --> K["Bot Detection Database/Logic"]
        J -->|"11. Return Bot Status (is_bot: true/false)"| H
        H -->|"12. Conditional Processing (e.g., exclude bots from GA4)"| I
    end

Key Steps in the GTM Server Container:

  1. Ingest Event: The GTM SC receives the HTTP request from the client-side.
  2. Extract Headers: Custom GTM SC variables or tags call getRequestHeader() to pull the Referer and User-Agent values directly.
  3. Enrich Event Data: Store these extracted headers under meaningful keys within the GTM SC's eventData (e.g., incoming_http_referrer, incoming_http_user_agent).
  4. Utilize in GA4: Send these enriched fields to GA4 as custom event parameters, which can then be registered as Custom Dimensions for reporting.
  5. Advanced Logic (Optional):
    • Call a Cloud Run service with the User-Agent for sophisticated bot detection.
    • Call a Cloud Run service to parse the Referer string and extract more granular attribution details (e.g., source, medium, campaign parameters) for platforms that don't auto-parse these from raw URLs.
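
The referrer-parsing service mentioned in the last step could start from something like the following. This is a minimal Python sketch, not a complete attribution model: the function name parse_referrer, the own_host parameter, the SEARCH_HOST_HINTS list, and the "(direct)" / "(internal)" labels are all assumptions made for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# Illustrative, non-exhaustive host fragments treated as search engines (assumption).
SEARCH_HOST_HINTS = ("google.", "bing.", "duckduckgo.", "yahoo.")

DIRECT = {"source": "(direct)", "medium": "(none)", "campaign": None}


def parse_referrer(referrer, own_host=None):
    """Derive coarse source/medium/campaign fields from a referrer URL."""
    if not referrer:
        return dict(DIRECT)
    try:
        parts = urlsplit(referrer)
    except ValueError:
        # urlsplit rejects some malformed inputs (e.g. broken IPv6 literals).
        return dict(DIRECT)

    host = (parts.hostname or "").lower()
    if not host:
        return dict(DIRECT)
    query = parse_qs(parts.query)

    # Explicit UTM parameters, when present, win over host-based guesses.
    if "utm_source" in query:
        return {
            "source": query["utm_source"][0],
            "medium": query.get("utm_medium", ["(not set)"])[0],
            "campaign": query.get("utm_campaign", [None])[0],
        }

    if own_host and host == own_host.lower():
        return {"source": "(internal)", "medium": "(none)", "campaign": None}

    is_search = any(hint in host for hint in SEARCH_HOST_HINTS)
    return {
        "source": host,
        "medium": "organic" if is_search else "referral",
        "campaign": None,
    }
```

Calling parse_referrer("https://www.google.com/search?q=shoes") yields an organic classification with the host as source. A production version would need a maintained search-engine and social-network list rather than the handful of hints used here.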

Core Components Deep Dive & Implementation Steps

1. GTM Server Container: Extracting HTTP Headers

The getRequestHeader API in GTM Server Container custom templates is your primary tool here.

GTM SC Custom Variable Template: Extract HTTP Request Header

const getRequestHeader = require('getRequestHeader');
const logToConsole = require('logToConsole');

// This custom variable template returns one header from the incoming request.
// Configuration fields:
//   - headerName: Text input for the name of the HTTP header to read (e.g., 'referer', 'user-agent')

const headerName = data.headerName;

if (!headerName) {
    logToConsole('Header name not configured for extraction.');
    return undefined;
}

// Header names are case-insensitive; look them up in lowercase for consistency.
const headerValue = getRequestHeader(headerName.toLowerCase());

if (headerValue) {
    // Log only a prefix so long values do not flood the logs.
    logToConsole('Extracted header ' + headerName + ': ' + headerValue.substring(0, 100));
} else {
    logToConsole('Header ' + headerName + ' not found in request.');
}

// Variable templates return their value; gtmOnSuccess/gtmOnFailure exist only in tag templates.
return headerValue;

Implementation in GTM SC:

  1. Create a Custom Variable Template named Extract HTTP Request Header.
  2. Paste the code. On the Permissions tab, grant Reads request for the headers you intend to read, plus Logs to console for the logging calls.
  3. Create two Custom Variables using this template:
    • {{Incoming HTTP Referer}}: Configure headerName to referer.
    • {{Incoming HTTP User-Agent}}: Configure headerName to user-agent.
  4. No trigger is needed. Variables in GTM do not fire on triggers; each one is resolved on demand whenever a tag, trigger, or another variable references it.

These variables will now contain the raw header values and can be used in other tags and templates.

2. Enriching Event Data in GTM Server Container

To make these headers easily accessible for GA4 and other tags, we'll store them in the eventData context.

GTM SC Custom Tag Template: Add Incoming Headers to EventData

const setInEventData = require('setInEventData');
const logToConsole = require('logToConsole');

// NOTE: setInEventData is assumed to be available here. It is not listed among the
// documented server-side sandboxed JavaScript APIs, so confirm it resolves in your
// container; otherwise reference the two header variables directly in the GA4 tag's
// event parameters.

// Configuration fields for the template:
//   - referrerVariable: Text input set to {{Incoming HTTP Referer}}
//   - userAgentVariable: Text input set to {{Incoming HTTP User-Agent}}
// GTM resolves variable references before the template runs, so these fields
// already hold the header values themselves, not event data keys.

const referrer = data.referrerVariable;
const userAgent = data.userAgentVariable;

if (referrer) {
    setInEventData('incoming_http_referrer', referrer, true); // True for ephemeral
    logToConsole('Added incoming_http_referrer to eventData.');
} else {
    logToConsole('No incoming_http_referrer found to add.');
}

if (userAgent) {
    setInEventData('incoming_http_user_agent', userAgent, true); // True for ephemeral
    logToConsole('Added incoming_http_user_agent to eventData.');
} else {
    logToConsole('No incoming_http_user_agent found to add.');
}

data.gtmOnSuccess();

Implementation in GTM SC:

  1. Create a Custom Tag Template named Add Incoming Headers to EventData.
  2. Paste the code. Add permission: Access event data.
  3. Create a Custom Tag using this template.
  4. Configure:
    • referrerVariable: {{Incoming HTTP Referer}}
    • userAgentVariable: {{Incoming HTTP User-Agent}}
  5. Set the trigger for this tag to All Events with a higher tag firing priority than your GA4 tags. The header variables need no ordering of their own; they are resolved when this tag references them.

Now, your eventData will contain incoming_http_referrer and incoming_http_user_agent for every event.

3. Utilizing in GA4: Custom Dimensions for Attribution & Device Insights

For these new fields to be useful in GA4 reports, you must register them as Custom Dimensions.

Steps in GA4 Admin:

  1. Navigate to Admin (gear icon) -> Custom definitions.
  2. Click Create custom dimensions.
  3. For Referrer:
    • Dimension name: HTTP Referrer (or Incoming Referrer URL)
    • Scope: Event
    • Description: Raw HTTP Referer header from server-side.
    • Event parameter: incoming_http_referrer
    • Click Save.
  4. For User-Agent:
    • Dimension name: HTTP User Agent (or Raw User-Agent String)
    • Scope: Event
    • Description: Raw HTTP User-Agent header from server-side.
    • Event parameter: incoming_http_user_agent
    • Click Save.

Update Your GA4 Event Tags in GTM SC: For any GA4 event tags (especially page_view, session_start, first_visit), ensure they send these new parameters.

  1. In your GA4 event tag, navigate to Event Parameters.
  2. Add a row:
    • Parameter Name: incoming_http_referrer
    • Value: {{Event Data - incoming_http_referrer}}
  3. Add another row:
    • Parameter Name: incoming_http_user_agent
    • Value: {{Event Data - incoming_http_user_agent}}

These will now flow into GA4, and once enough data is collected, you can see them in GA4 Explorations to analyze traffic sources, device breakdowns, and more granular user paths.
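
One practical caveat: standard GA4 properties truncate event parameter values at 100 characters, and raw User-Agent strings and full URLs are frequently longer. A possible workaround, sketched here in Python as it might run in a Cloud Run helper, is to split a long value across numbered parameters. The function name split_for_ga4, the _1/_2/_3 suffix scheme, and the three-part cap are assumptions for illustration, and each resulting parameter would need its own custom dimension.

```python
GA4_PARAM_VALUE_LIMIT = 100  # value length limit on standard GA4 properties


def split_for_ga4(value, base_name, max_parts=3, limit=GA4_PARAM_VALUE_LIMIT):
    """Split a long string into numbered parameters that each fit the limit.

    Values that already fit are returned under base_name unchanged.
    Anything beyond max_parts * limit characters is dropped.
    """
    if not value:
        return {}
    chunks = [value[i:i + limit] for i in range(0, len(value), limit)][:max_parts]
    if len(chunks) == 1:
        return {base_name: chunks[0]}
    return {f"{base_name}_{n}": chunk for n, chunk in enumerate(chunks, start=1)}
```

If splitting feels heavy, sending a derived field (for example a parsed browser family) instead of the raw string avoids the limit altogether.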

4. Advanced: Server-Side Bot Detection with Cloud Run (Optional)

Instead of sending the raw User-Agent to GA4, you might want to process it for bot detection and send only an is_bot_traffic: true/false flag, or even filter out bot events entirely.

a. Python Bot Detection Service (Cloud Run)

A simple Python service could analyze User-Agent strings. For production, you might integrate with a commercial bot detection API or a more sophisticated open-source library.

bot-detection-service/main.py example:

```python
import logging
import os

from flask import Flask, request, jsonify

app = Flask(__name__)
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Simple list of known bot/crawler keywords (illustrative, not exhaustive)
BOT_KEYWORDS = [
    'bot', 'crawler', 'spider', 'archiver', 'monitor', 'slurp', 'mediapartners-google',
    'adsbot', 'bingbot', 'yandexbot', 'duckduckbot', 'semrushbot', 'ahrefsbot'
]


def is_known_bot(user_agent_string):
    if not user_agent_string or not isinstance(user_agent_string, str):
        return False
    lowered = user_agent_string.lower()  # Lowercase once rather than once per keyword
    for keyword in BOT_KEYWORDS:
        if keyword in lowered:
            return True
    # Add more complex regex patterns or external API calls here
    return False


@app.route('/detect-bot', methods=['POST'])
def detect_bot():
    if not request.is_json:
        logger.warning("Bot Detection: Request is not JSON. Content-Type: %s", request.headers.get('Content-Type'))
        return jsonify({'error': 'Request must be JSON'}), 400

    try:
        data = request.get_json()
        user_agent = data.get('user_agent')

        if not user_agent:
            logger.warning("Bot Detection: No user_agent provided.")
            return jsonify({'is_bot': False, 'reason': 'no_user_agent'}), 200

        bot_status = is_known_bot(user_agent)
        reason = 'known_signature' if bot_status else 'human_like'

        logger.info(f"Bot detection for User-Agent '{user_agent[:50]}...': is_bot={bot_status}, reason={reason}")
        return jsonify({'is_bot': bot_status, 'reason': reason}), 200

    except Exception as e:
        logger.error(f"Error during bot detection: {e}", exc_info=True)
        # On error, default to not a bot to avoid filtering legitimate traffic
        return jsonify({'is_bot': False, 'reason': 'detection_error'}), 500


if __name__ == '__main__':
    # Local entry point only; debug stays off so the interactive debugger is never exposed
    app.run(debug=False, host='0.0.0.0', port=int(os.environ.get('PORT', 8080)))
```
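
The keyword loop above leaves a placeholder for regex patterns. One way that step might look is a single precompiled, case-insensitive pattern that also reports which signature matched, which could feed the `reason` field. This is a sketch under assumptions: the signature list, the `BOT_PATTERN` name, and `match_bot_signature` are illustrative and not part of the service above.

```python
import re

# Illustrative signatures only; a real deployment needs a maintained list.
BOT_SIGNATURES = [
    "googlebot", "bingbot", "yandexbot", "duckduckbot", "ahrefsbot",
    "semrushbot", "crawler", "spider", "headlesschrome",
    "python-requests", "curl/",
]

# re.escape keeps characters such as "/" and "-" literal inside the pattern.
BOT_PATTERN = re.compile(
    "|".join(re.escape(sig) for sig in BOT_SIGNATURES), re.IGNORECASE
)


def match_bot_signature(user_agent):
    """Return the matched signature in lowercase, or None if nothing matches."""
    if not isinstance(user_agent, str):
        return None
    match = BOT_PATTERN.search(user_agent)
    return match.group(0).lower() if match else None
```

Returning the matched token rather than a bare boolean makes false positives easier to audit later, since each blocked event carries the signature that triggered it.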

`bot-detection-service/requirements.txt`:

Flask


**Deploy the Python service to Cloud Run:**
```bash
gcloud run deploy bot-detection-service \
    --source ./bot-detection-service \
    --platform managed \
    --region YOUR_GCP_REGION \
    --allow-unauthenticated \
    --memory 256Mi \
    --cpu 1 \
    --timeout 10s
```

Important: Ensure the Cloud Run service account has minimal necessary permissions. Note down the URL.
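
Before wiring the service into GTM, a quick smoke test can confirm the `/detect-bot` contract. The following is a minimal Python sketch using only the standard library; the helper names `build_detect_request` and `detect_bot` are illustrative, and the service URL in the usage note is a placeholder, not a real endpoint.

```python
import json
import urllib.request


def build_detect_request(service_url, user_agent):
    """Build (but do not send) the POST request that /detect-bot expects."""
    body = json.dumps({"user_agent": user_agent}).encode("utf-8")
    return urllib.request.Request(
        service_url.rstrip("/") + "/detect-bot",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def detect_bot(service_url, user_agent, timeout=5):
    """Send the request and return the decoded JSON verdict."""
    req = build_detect_request(service_url, user_agent)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the URL noted from the deploy step, `detect_bot("https://YOUR_SERVICE_URL", "Googlebot/2.1")` should come back with `is_bot` set to true if the service is running the keyword list shown earlier.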

b. GTM Server Container Custom Tag for Bot Detection

```javascript
const sendHttpRequest = require('sendHttpRequest');
const JSON = require('JSON');
const logToConsole = require('logToConsole');
const getEventData = require('getEventData');
// Assumed API: setInEventData is not in the documented server-side API list; verify it in your container.
const setInEventData = require('setInEventData');
// `data` is injected into every template automatically and must not be require()'d.

// Configuration fields for the template:
//   - botDetectionServiceUrl: Text input for your Cloud Run Bot Detection service URL
//   - userAgentVariable: Text input for the event data key holding user-agent (e.g., 'incoming_http_user_agent')
//   - excludeBotsFromGA4: Boolean checkbox to control if bots should be excluded from GA4

const botDetectionServiceUrl = data.botDetectionServiceUrl;
const userAgent = getEventData(data.userAgentVariable);
const excludeBotsFromGA4 = data.excludeBotsFromGA4 === true;

// Record a "not a bot" default and let processing continue.
const continueAsHuman = () => {
    setInEventData('is_bot_traffic', false, true);
    data.gtmOnSuccess();
};

if (!botDetectionServiceUrl || !userAgent) {
    logToConsole('Bot Detection Service URL or User-Agent not available. Skipping detection.');
    continueAsHuman();
    return;
}

// Server-side sendHttpRequest takes (url, options, body) and returns a promise.
sendHttpRequest(botDetectionServiceUrl + '/detect-bot', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    timeout: 3000 // 3 seconds timeout
}, JSON.stringify({ user_agent: userAgent })).then((result) => {
    // Sandboxed JSON.parse returns undefined for malformed input instead of throwing.
    const ok = result.statusCode >= 200 && result.statusCode < 300;
    const response = ok ? JSON.parse(result.body) : undefined;
    if (!response) {
        logToConsole('Bot detection service call failed:', result.statusCode, result.body);
        continueAsHuman(); // Default to not bot on error
        return;
    }

    const isBot = response.is_bot === true;
    const reason = response.reason || 'unknown';
    logToConsole('Bot detection result: is_bot=' + isBot + ', reason=' + reason);

    setInEventData('is_bot_traffic', isBot, true); // Store result in eventData
    setInEventData('bot_detection_reason', reason, true); // Store reason

    if (excludeBotsFromGA4 && isBot) {
        logToConsole('Excluding bot traffic from GA4 due to detection.');
        data.gtmOnFailure(); // Signal failure so tag sequencing can block dependent GA4 tags
    } else {
        data.gtmOnSuccess();
    }
}, () => {
    // The promise rejects when the request times out or cannot be sent.
    logToConsole('Bot detection request did not complete.');
    continueAsHuman();
});
```

**Implementation in GTM SC:**
1.  **Create a Custom Tag Template** named `Server-Side Bot Detector`.
2.  Paste the code. Add permissions: `Reads event data`, `Sends HTTP requests` (allowing your Cloud Run service URL), and `Logs to console`.
3.  Create a **Custom Tag** using this template.
4.  Configure `botDetectionServiceUrl` with your Cloud Run service URL.
5.  Configure `userAgentVariable` to `incoming_http_user_agent`.
6.  Set `excludeBotsFromGA4` to `true` if you want to filter out bot traffic entirely from GA4.
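7.  **Trigger:** Fire this tag on `All Events`. Because the detection call is asynchronous, firing priority alone does not make your GA4 tags wait for the verdict. To gate them, add this tag as a setup tag in each GA4 tag's **Tag Sequencing** settings and enable the option not to fire the GA4 tag if the setup tag fails. With that in place, when `excludeBotsFromGA4` is `true` and a bot is detected, `data.gtmOnFailure()` blocks the dependent GA4 tag for that event.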

You can also send `is_bot_traffic` as an event-scoped custom dimension to GA4 for reporting and filtering within the GA4 UI if you don't want to exclude them entirely server-side.

### Benefits of This Server-Side Approach

*   **Reliable Data Capture:** Obtain `Referer` and `User-Agent` consistently, even with client-side limitations.
*   **Enhanced Attribution:** More accurate source/medium reporting and custom attribution models in GA4 and your data warehouse.
*   **Cleaner Analytics:** Proactively identify and filter bot traffic, leading to more accurate user counts and engagement metrics.
*   **Deeper Device Insights:** Granular User-Agent data allows for detailed analysis of browser, OS, and device characteristics for audience segmentation and UX optimization.
*   **Centralized Control:** Manage and standardize critical client context from a single, server-side environment.
*   **Flexibility:** Implement custom parsing or integrate advanced third-party services for sophisticated analysis without modifying client-side code.
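
To make the device-insight benefit concrete, here is a minimal Python sketch of coarse User-Agent classification. It is substring-based and deliberately rough; the function name `classify_user_agent` and the label vocabulary are assumptions, and a maintained parsing library would be a better fit for production.

```python
import re


def classify_user_agent(ua):
    """Return coarse browser / OS / device labels for a raw User-Agent string."""
    ua = ua or ""

    # Order matters: Edge and Opera also advertise "Chrome",
    # and Chrome advertises "Safari".
    if "Edg/" in ua:
        browser = "Edge"
    elif "OPR/" in ua:
        browser = "Opera"
    elif "Firefox/" in ua:
        browser = "Firefox"
    elif "Chrome/" in ua or "CriOS/" in ua:
        browser = "Chrome"
    elif "Safari/" in ua:
        browser = "Safari"
    else:
        browser = "Other"

    # Android UAs also contain "Linux" and iOS UAs contain "Mac OS X",
    # so the more specific platforms are checked first.
    if "Android" in ua:
        os_name = "Android"
    elif re.search(r"iPhone|iPad|iPod", ua):
        os_name = "iOS"
    elif "Windows NT" in ua:
        os_name = "Windows"
    elif "Mac OS X" in ua:
        os_name = "macOS"
    elif "Linux" in ua:
        os_name = "Linux"
    else:
        os_name = "Other"

    if "iPad" in ua or "Tablet" in ua:
        device = "tablet"
    elif "Mobi" in ua:
        device = "mobile"
    else:
        device = "desktop"

    return {"browser": browser, "os": os_name, "device": device}
```

Sending these three short labels to GA4 instead of the raw string keeps each value well within parameter length limits and gives you low-cardinality dimensions that are easier to report on.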

### Important Considerations

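*   **Referrer Policy:** Be aware of the page's `Referrer-Policy`. Some policies (e.g., `no-referrer`, `same-origin`) restrict the `Referer` header the browser sends, and the common browser default, `strict-origin-when-cross-origin`, reduces it to the origin alone whenever the tagging server is on a different origin than the page. The server captures only what is sent; it cannot make the browser send more than the policy allows.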
*   **User-Agent Client Hints:** Modern browsers are moving towards User-Agent Client Hints, which provide more structured and privacy-preserving information than the monolithic User-Agent string. Your Cloud Run service should eventually be updated to parse and leverage these if they are consistently passed.
*   **Cost:** Cloud Run invocations for bot detection or referrer parsing services will incur costs. Monitor usage, especially for high-volume sites.
*   **Latency:** Calling external Cloud Run services introduces additional latency. Balance the value of real-time bot detection/referrer parsing against this latency. For simple cases, basic `User-Agent` string matching could be done directly in GTM SC, but it would be less maintainable.
*   **Complexity of Bot Detection:** The provided bot detection is rudimentary. Real-world bot detection is highly complex and often requires machine learning, IP reputation databases, and behavioral analysis. Consider specialized services for robust protection.
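
On the Client Hints point, Chromium-based browsers send the low-entropy `Sec-CH-UA`, `Sec-CH-UA-Mobile`, and `Sec-CH-UA-Platform` headers on secure requests. A Cloud Run service could read them roughly as follows. This is a hedged sketch: the function names and the returned dictionary shape are assumptions, and the regex handles the common header layout rather than the full structured-field grammar.

```python
import re

# Matches one `"Brand";v="version"` entry of a Sec-CH-UA header value.
_BRAND_RE = re.compile(r'"([^"]*)"\s*;\s*v="([^"]*)"')


def _is_grease(brand):
    """Detect the deliberately fake "Not A Brand"-style entry Chromium adds."""
    lowered = brand.lower()
    return "not" in lowered and "brand" in lowered


def parse_client_hints(headers):
    """Extract brands, mobile flag, and platform from UA Client Hints headers."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    brands = [
        {"brand": brand, "version": version}
        for brand, version in _BRAND_RE.findall(lowered.get("sec-ch-ua", ""))
        if not _is_grease(brand)
    ]

    mobile_raw = lowered.get("sec-ch-ua-mobile")
    # "?1" / "?0" are the structured-field booleans; None means the header was absent.
    mobile = None if mobile_raw is None else mobile_raw.strip() == "?1"

    platform = lowered.get("sec-ch-ua-platform", "").strip().strip('"') or None

    return {"brands": brands, "mobile": mobile, "platform": platform}
```

Because browsers that do not implement Client Hints omit these headers entirely, the `None` values let downstream logic fall back to the classic `User-Agent` string when needed.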

### Conclusion

Reliably capturing and intelligently utilizing the HTTP `Referer` and `User-Agent` headers is paramount for comprehensive web analytics. By integrating server-side mechanisms within your GTM Server Container on Cloud Run, you transform these crucial pieces of client-side context into actionable data. This approach not only fortifies your attribution models and bot detection capabilities but also provides deeper insights into your audience, ensuring your GA4 data is cleaner, more accurate, and ultimately more valuable for driving informed business decisions. Embrace server-side header management to elevate your analytics data quality to the next level.