Data Engineering 12/3/2024 5 min read

Unlocking Full User Journeys: Server-Side Session & User Stitching for GA4 with GTM & Cloud Run

You've harnessed the power of server-side Google Analytics 4 (GA4), leveraging Google Tag Manager (GTM) Server Container on Cloud Run to centralize data collection, apply transformations, enrich events, and enforce granular consent. This architecture provides robust control, accuracy, and compliance, forming the backbone of your modern analytics strategy.

However, a fundamental challenge remains for a complete understanding of your customer journey: robustly managing user sessions and stitching together anonymous (client_id) and authenticated (user_id) user data. In a privacy-first world, client-side limitations (Safari's Intelligent Tracking Prevention, ad-blockers, inconsistent cookie handling) often lead to:

  • Broken Sessions: Shortened cookie lifespans or inconsistent client-side session management can prematurely end GA4 sessions, inflating session counts and fragmenting user behavior.
  • Fragmented User Journeys: A user browsing anonymously (identified by client_id) before logging in (identified by user_id) is often treated as two distinct users in analytics, making it difficult to understand their full journey.
  • Inaccurate Attribution: Without consistent session and user identifiers, attributing conversions to the correct touchpoints becomes unreliable.
  • Limited Personalization: An inability to consistently identify a user (whether anonymous or authenticated) across interactions hampers personalization efforts.

The problem, then, is the need for a server-side mechanism that can reliably manage GA4 session boundaries, bridge the gap between anonymous and authenticated user identities, and ensure a persistent user view across your entire analytics and marketing ecosystem. Relying solely on client-side methods for these critical functions is increasingly insufficient and unreliable.

Why Server-Side for Session & User Stitching?

Moving session and user identity management to your GTM Server Container on Cloud Run offers significant advantages:

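  1. Resilience: Session and identity state lives in server-side storage instead of depending solely on browser cookies that ITP and similar restrictions can shorten, so session and user identification is more stable. (Requests still originate in the browser, so events blocked before they reach your server cannot be recovered.)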
  2. Consistency: Centralized logic ensures that ga_session_id and user_id are managed uniformly across all events, regardless of client-side variations.
  3. Unified Identity: A single server-side service can map anonymous client_ids to authenticated user_ids, providing a holistic view of the customer journey.
  4. Enhanced Control: Programmatic control over session timeouts and user_id prioritization allows for analytics that more closely align with your business definitions of a session and a user.
  5. Data Quality: Cleaner, more accurate ga_session_id and user_id parameters lead to more reliable GA4 reports and deeper insights.

Our Solution: Server-Side Identity & Session Management with GTM SC, Cloud Run & Firestore

Our solution introduces a dedicated Identity & Session Management Service built on Cloud Run and Firestore. This service will be called early in your GTM Server Container's processing flow to:

  1. Resolve user_id: Prioritize an authenticated user_id if available, and if not, maintain an anonymous client_id. It will also store a persistent mapping between client_id and user_id in Firestore.
  2. Manage ga_session_id: Determine the current session's ID based on the event timestamp and previous session activity stored in Firestore, respecting GA4's default 30-minute session timeout (or a custom one).
  3. Return Resolved Identifiers: The service returns the most appropriate user_id and ga_session_id for the event, which are then injected into the GTM SC's eventData.
  4. Dispatch to GA4: These resolved identifiers are then used in your GA4 tags to send robust, stitched data to GA4.

This pattern empowers you to build a comprehensive, privacy-aware, and accurate view of every customer's journey, from their first anonymous visit to their authenticated interactions.
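The identity rule in step 1 is small enough to state as code. The sketch below is a hypothetical pure helper (resolve_user_id is not part of any library), assuming the stored mapping for the client_id has already been read; it returns the user_id to report plus a flag saying whether the client_id-to-user_id map needs a write.

```python
from typing import Optional, Tuple


def resolve_user_id(incoming_user_id: Optional[str],
                    stored_user_id: Optional[str]) -> Tuple[Optional[str], bool]:
    """Hypothetical helper: pick the user_id to report and flag a map update.

    - An authenticated user_id on the current event always wins.
    - Otherwise fall back to a previously stitched user_id for this client_id.
    - Anonymous traffic returns None, so GA4 keeps using client_id alone.
    """
    if incoming_user_id:
        # Write only when the mapping is new or has changed.
        return incoming_user_id, incoming_user_id != stored_user_id
    return stored_user_id, False


# Usage: a returning visitor who logged in on an earlier visit
print(resolve_user_id(None, "user-42"))   # ('user-42', False)
print(resolve_user_id("user-42", None))   # ('user-42', True)
```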

Architecture: Server-Side Identity & Session Resolution

We'll integrate this new "Identity & Session Service" early in the GTM Server Container's processing flow.

graph TD
    A["User Browser/Client-Side"] -->|"1. Event (client_id, user_id if logged in)"| B("GTM Web Container");
    B -->|"2. HTTP Request to GTM Server Container Endpoint"| C("GTM Server Container on Cloud Run");

    subgraph SC_INIT["GTM Server Container Initial Processing"]
        C --> D{"3. GTM SC Client Processes Event"};
        D --> E["4. Custom Variable: Call Identity & Session Service"];
        E -->|"5. HTTP Request with client_id, user_id, event_timestamp"| F["Identity & Session Service (Python on Cloud Run)"];
        F -->|"6a. Look up client_id-user_id map & session state"| G["Firestore (User Map, Session State)"];
        F -->|"6b. Resolve user_id, determine/update ga_session_id, persist state"| G;
        G -->|"7. Return stored map & session state"| F;
        F -->|"8. Return resolved user_id, ga_session_id to GTM SC"| E;
        E -->|"9. Provide resolved identifiers to tags"| D;
    end

    D --> J["10. Other GTM SC Processing (Data Quality, Enrichment, Consent)"];
    J -->|"11. Dispatch to GA4 Measurement Protocol (using resolved IDs)"| K["Google Analytics 4"];
    K --> L["GA4 Reports & Explorations"];

Key Flow:

  1. Client-Side Event: A user interaction triggers an event. The GA4 tag in your GTM Web Container sends it to your GTM Server Container, including the client_id (from the _ga cookie) and, if the user is logged in, a user_id.
  2. GTM SC Ingestion: GTM SC receives the HTTP request.
  3. Identity & Session Resolution: A custom variable in your GTM Server Container reads the incoming client_id and user_id, takes an event timestamp, and makes an HTTP call to your Identity & Session Service (Cloud Run). GTM variables have no triggers or firing priority; the variable runs whenever a tag that references it is evaluated.
  4. Service Logic (Cloud Run):
    • User Stitching: It checks Firestore for an existing mapping between this client_id and a user_id. If a user_id is provided in the current event, it updates or creates this map. The service then returns the most consistent user_id (preferring authenticated over anonymous, if a link exists).
    • Session Management: It retrieves the last recorded event timestamp for the client_id. If the current event is more than 30 minutes after the last recorded event, a new ga_session_id is generated. Otherwise, the existing ga_session_id is reused. The session state (last event timestamp, ga_session_id) is updated in Firestore.
  5. GTM SC Uses the Resolved Identifiers: The custom variable returns the resolved user_id and ga_session_id, and your tags use them in place of the values the GA4 Client parsed from the request.
  6. Dispatch to GA4: The event, now enriched with robust user_id and ga_session_id parameters, proceeds through other GTM SC transformations, consent checks, and is dispatched to GA4 via the Measurement Protocol.
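The session rule in step 4 can be checked in isolation with a small state machine. This is a minimal sketch, not the service itself: SessionState and next_state are hypothetical names, the timeout is GA4's 30-minute default, and ga_session_id follows GA4's convention of using the session start time in Unix seconds.

```python
from dataclasses import dataclass
from typing import Optional

SESSION_TIMEOUT_MS = 30 * 60 * 1000  # GA4's default inactivity timeout


@dataclass(frozen=True)
class SessionState:
    session_start_ms: int
    last_event_ms: int
    session_number: int

    @property
    def ga_session_id(self) -> int:
        # GA4 uses the session start time in Unix seconds as ga_session_id.
        return self.session_start_ms // 1000


def next_state(prev: Optional[SessionState], event_ms: int,
               timeout_ms: int = SESSION_TIMEOUT_MS) -> SessionState:
    """Return the session state after an event arriving at event_ms."""
    if prev is not None and event_ms - prev.last_event_ms < timeout_ms:
        # Inside the inactivity window (late events too): keep the session.
        return SessionState(prev.session_start_ms,
                            max(prev.last_event_ms, event_ms),
                            prev.session_number)
    # First event for this client_id, or the gap reached the timeout.
    previous_number = prev.session_number if prev else 0
    return SessionState(event_ms, event_ms, previous_number + 1)


# Worked example: events at 12:00, 12:20 and 12:55 UTC on 2024-12-03
t0 = 1733227200000                        # 2024-12-03T12:00:00Z in milliseconds
s1 = next_state(None, t0)
s2 = next_state(s1, t0 + 20 * 60 * 1000)  # 20-minute gap: same session
s3 = next_state(s2, t0 + 55 * 60 * 1000)  # 35-minute gap: new session
print(s2.ga_session_id == s1.ga_session_id, s3.session_number)  # True 2
```

Keeping last_event_ms at the maximum seen so far means an out-of-order event cannot move the inactivity window backwards.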

Core Components Deep Dive & Implementation Steps

1. Firestore Setup: User Mapping & Session State

Firestore will store the persistent mapping between client_id and user_id, and maintain the session state (last event timestamp, current ga_session_id) for each client_id.

a. Create a Firestore Database:

  1. In the GCP Console, navigate to Firestore.
  2. Choose "Native mode" and select a region close to your Cloud Run services.

b. Structure Your Data:

We'll use two collections: user_identity_map and session_state.

user_identity_map collection:

  • Document ID: client_id (e.g., 1234567890.1733184000; the GA4 Client's client_id omits the GA1.1. prefix stored in the _ga cookie)
  • Fields:
    • user_id: The associated authenticated user ID (if known).
    • last_updated: Timestamp of the last update.
    • first_seen_at: Timestamp when this client_id was first seen.

session_state collection:

  • Document ID: client_id
  • Fields:
    • ga_session_id: The current ga_session_id for this client_id (GA4 uses the session start time as a Unix timestamp in seconds).
    • last_event_timestamp: The event_timestamp (in milliseconds) of the last event for this client_id.
    • session_start_timestamp: The event_timestamp (in milliseconds) when the current ga_session_id started.
    • session_number: The sequential session count for this client_id.
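If it helps to pin the two document shapes down, they can be written as Python TypedDicts. This is an illustrative sketch: the class names and the new_session_doc helper are hypothetical, and it assumes ga_session_id is stored the way GA4 forms it (session start time in Unix seconds).

```python
from typing import TypedDict


class UserIdentityMapDoc(TypedDict):
    """Document in user_identity_map; the document ID is the client_id."""
    user_id: str
    last_updated: object   # Firestore timestamp (written as SERVER_TIMESTAMP)
    first_seen_at: object  # Firestore timestamp (written as SERVER_TIMESTAMP)


class SessionStateDoc(TypedDict):
    """Document in session_state; the document ID is the client_id."""
    ga_session_id: int            # session start, Unix seconds
    last_event_timestamp: int     # milliseconds
    session_start_timestamp: int  # milliseconds
    session_number: int


def new_session_doc(event_ms: int, session_number: int = 1) -> SessionStateDoc:
    """Build the session_state document written when a session starts."""
    return SessionStateDoc(
        ga_session_id=event_ms // 1000,
        last_event_timestamp=event_ms,
        session_start_timestamp=event_ms,
        session_number=session_number,
    )
```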

2. Python Identity & Session Service (Cloud Run)

This Flask application will receive client_id, user_id, and event_timestamp, perform the lookup/update logic in Firestore, and return the resolved user_id and ga_session_id.

identity-session-service/main.py example:

import os
import logging

from flask import Flask, request, jsonify
from google.cloud import firestore

app = Flask(__name__)
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Fail fast at startup if Firestore credentials or project cannot be resolved,
# instead of failing on every request later.
db = firestore.Client()

# Session inactivity timeout (GA4 default is 30 minutes)
SESSION_TIMEOUT_MS = int(os.environ.get('SESSION_TIMEOUT_MINUTES', '30')) * 60 * 1000


@firestore.transactional
def resolve_in_transaction(transaction, client_id, incoming_user_id, event_timestamp_ms):
    """Resolve user_id and session atomically, so concurrent events for the
    same client_id cannot start two different sessions."""
    user_map_ref = db.collection('user_identity_map').document(client_id)
    session_state_ref = db.collection('session_state').document(client_id)

    # Firestore transactions require all reads before any writes.
    user_map_doc = user_map_ref.get(transaction=transaction)
    session_state_doc = session_state_ref.get(transaction=transaction)

    # --- 1. User Identity Resolution ---
    stored_user_id = (user_map_doc.to_dict() or {}).get('user_id')
    # An authenticated user_id on this event wins; otherwise reuse the stitched one.
    # Anonymous traffic resolves to None: sending client_id as user_id would make
    # every anonymous device look like a signed-in user in GA4.
    resolved_user_id = incoming_user_id or stored_user_id

    if incoming_user_id and incoming_user_id != stored_user_id:
        update = {'user_id': incoming_user_id, 'last_updated': firestore.SERVER_TIMESTAMP}
        if not user_map_doc.exists:
            update['first_seen_at'] = firestore.SERVER_TIMESTAMP
        transaction.set(user_map_ref, update, merge=True)
        logger.info("Updated user_id map for %s: %s -> %s", client_id, stored_user_id, incoming_user_id)

    # --- 2. Session Management ---
    state = session_state_doc.to_dict() or {}
    last_event_timestamp = state.get('last_event_timestamp', 0)

    if state and event_timestamp_ms - last_event_timestamp < SESSION_TIMEOUT_MS:
        # Session still active (late, out-of-order events also land here)
        session_start_ms = state.get('session_start_timestamp', event_timestamp_ms)
        session_number = state.get('session_number', 1)
    else:
        # First event for this client_id, or the previous session timed out
        session_start_ms = event_timestamp_ms
        session_number = state.get('session_number', 0) + 1
        logger.info("Session %d started for %s", session_number, client_id)

    # GA4's ga_session_id is the session start time as a Unix timestamp in seconds.
    ga_session_id = session_start_ms // 1000
    transaction.set(session_state_ref, {
        'ga_session_id': ga_session_id,
        'last_event_timestamp': max(last_event_timestamp, event_timestamp_ms),
        'session_start_timestamp': session_start_ms,
        'session_number': session_number,
    })
    return resolved_user_id, ga_session_id, session_number


@app.route('/resolve-identity-session', methods=['POST'])
def resolve_identity_session():
    """
    Receives client_id, user_id (optional) and event_timestamp (milliseconds).
    Returns the resolved user_id, ga_session_id and session_number.
    """
    data = request.get_json(silent=True)
    if not isinstance(data, dict):
        logger.warning("Request is not a JSON object. Content-Type: %s", request.headers.get('Content-Type'))
        return jsonify({'error': 'Request must be a JSON object'}), 400

    client_id = data.get('client_id')
    incoming_user_id = data.get('user_id') or None
    try:
        event_timestamp_ms = int(data.get('event_timestamp'))
    except (TypeError, ValueError):
        event_timestamp_ms = None

    if not client_id or not event_timestamp_ms:
        logger.error("Missing client_id or event_timestamp in request.")
        return jsonify({'error': 'Missing client_id or event_timestamp'}), 400

    try:
        resolved_user_id, ga_session_id, session_number = resolve_in_transaction(
            db.transaction(), client_id, incoming_user_id, event_timestamp_ms)
    except Exception:
        logger.exception("Identity/session resolution failed for client_id %s", client_id)
        # A non-2xx status makes the GTM template use its fallback values.
        return jsonify({'error': 'resolution_failed'}), 500

    return jsonify({
        'resolved_user_id': resolved_user_id,
        'resolved_ga_session_id': ga_session_id,
        'session_number': session_number,
    }), 200


if __name__ == '__main__':
    # Local development only; on Cloud Run, PORT is provided by the platform.
    app.run(host='0.0.0.0', port=int(os.environ.get('PORT', 8080)))

identity-session-service/requirements.txt:

Flask
google-cloud-firestore

Deploy the Python service to Cloud Run:

gcloud run deploy identity-session-service \
    --source ./identity-session-service \
    --platform managed \
    --region YOUR_GCP_REGION \
    --allow-unauthenticated \
    --set-env-vars GCP_PROJECT_ID=YOUR_GCP_PROJECT_ID,SESSION_TIMEOUT_MINUTES=30 \
    --memory 512Mi \
    --cpu 1 \
    --timeout 30s # Allow enough time for Firestore operations

Important:

  • Replace YOUR_GCP_PROJECT_ID and YOUR_GCP_REGION with your actual values.
  • The --allow-unauthenticated flag is for simplicity. In production, consider authenticated invocations as discussed in previous posts.
  • Ensure the Cloud Run service identity has the roles/datastore.user role (which covers Firestore read/write access) on your GCP project.
  • Note down the URL of this deployed Cloud Run service.
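Once the service is deployed, a quick smoke test from your workstation confirms it responds before you wire it into GTM. The sketch below uses only the Python standard library; build_resolve_request and resolve are hypothetical helper names, and the optional id_token parameter is an assumption for the case where you drop --allow-unauthenticated, in which case Cloud Run expects an identity token in an Authorization: Bearer header.

```python
import json
import urllib.request
from typing import Optional


def build_resolve_request(service_url: str, client_id: str, event_timestamp_ms: int,
                          user_id: Optional[str] = None,
                          id_token: Optional[str] = None) -> urllib.request.Request:
    """Build a POST to /resolve-identity-session for a manual smoke test."""
    payload = {"client_id": client_id, "event_timestamp": event_timestamp_ms}
    if user_id:
        payload["user_id"] = user_id
    headers = {"Content-Type": "application/json"}
    if id_token:
        # Only needed when the service requires authenticated invocations.
        headers["Authorization"] = f"Bearer {id_token}"
    return urllib.request.Request(
        service_url.rstrip("/") + "/resolve-identity-session",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def resolve(service_url: str, client_id: str, event_timestamp_ms: int, **kwargs) -> dict:
    """Send the request and return the decoded JSON response."""
    req = build_resolve_request(service_url, client_id, event_timestamp_ms, **kwargs)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling resolve twice for the same client_id with timestamps a few minutes apart should return the same resolved_ga_session_id. For an authenticated deployment, gcloud auth print-identity-token prints a token you can pass as id_token.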

3. GTM Server Container Custom Variable Template

Create a custom variable template in your GTM Server Container that calls the Identity & Session Service and returns one resolved identifier. A GTM variable returns a single value and its properties cannot be referenced with dot notation, so the template has an outputField setting and you create one variable per identifier.

GTM SC Custom Variable Template: Identity & Session Resolver

const sendHttpRequest = require('sendHttpRequest');
const JSON = require('JSON');
const logToConsole = require('logToConsole');
const getEventData = require('getEventData');
const getTimestampMillis = require('getTimestampMillis');
const makeString = require('makeString');

// Configuration fields for the template (text inputs):
//   - identityServiceUrl: your Cloud Run Identity & Session service URL
//   - clientIdKey: event data key holding the client ID (defaults to 'client_id')
//   - userIdKey: event data key holding the authenticated user ID (defaults to 'user_id')
//   - outputField: which value this variable returns:
//       'resolved_user_id', 'resolved_ga_session_id' or 'session_number'

const outputField = data.outputField || 'resolved_ga_session_id';
const clientId = getEventData(data.clientIdKey || 'client_id');
const userId = getEventData(data.userIdKey || 'user_id'); // undefined for anonymous users
const eventTimestampMs = getTimestampMillis(); // time the server received the event, in ms

// Fallback values: what the GA4 Client already parsed from the incoming request
const fallback = {
  resolved_user_id: userId,
  resolved_ga_session_id: getEventData('ga_session_id'),
  session_number: getEventData('ga_session_number')
};

if (!data.identityServiceUrl || !clientId) {
  logToConsole('Identity & Session Resolver: service URL or client ID missing, using fallback.');
  return fallback[outputField];
}

const payload = {client_id: makeString(clientId), event_timestamp: eventTimestampMs};
if (userId) {
  payload.user_id = makeString(userId);
}

// In server containers a variable may return a Promise; GTM waits for it to settle.
// sendHttpRequest(url, options, body) resolves with {statusCode, headers, body}.
return sendHttpRequest(data.identityServiceUrl + '/resolve-identity-session', {
  method: 'POST',
  headers: {'Content-Type': 'application/json'},
  timeout: 5000 // 5 seconds timeout for service call
}, JSON.stringify(payload)).then((result) => {
  const ok = result.statusCode >= 200 && result.statusCode < 300;
  const response = ok ? JSON.parse(result.body) : undefined;
  if (response && response[outputField] !== undefined) {
    return response[outputField];
  }
  logToConsole('Identity & Session service returned status ' + result.statusCode + ', using fallback.');
  return fallback[outputField];
}).catch((error) => {
  // Rejections carry {reason: 'failed'} or {reason: 'timed_out'}
  logToConsole('Identity & Session service request failed (' + (error && error.reason) + '), using fallback.');
  return fallback[outputField];
});

GTM SC Configuration:

  1. Create a new Custom Variable Template named Identity & Session Resolver.
  2. Paste the code. Add permissions: Reads event data (client_id, user_id, ga_session_id, ga_session_number), Sends HTTP requests (your Cloud Run service URL), and Logs to console.
  3. Create three variables from this template, for example {{Resolved User ID}} (outputField: resolved_user_id), {{Resolved Session ID}} (outputField: resolved_ga_session_id) and {{Resolved Session Number}} (outputField: session_number).
  4. Configure each variable:
    • identityServiceUrl: The URL of your Cloud Run service (https://identity-session-service-YOUR_HASH-YOUR_REGION.a.run.app).
    • clientIdKey: client_id (the event data key the GA4 Client fills from the incoming request).
    • userIdKey: user_id (or whichever event data key carries your authenticated user ID).
  5. Variables in GTM have no triggers or firing priority: a variable is evaluated when a tag, trigger or transformation that references it runs, so no "early" trigger is needed. Each variable calls the service when it is evaluated. Because the service reuses an active session and only rewrites the user map when the user_id changes, repeated calls for the same event should return the same values, but each call adds Cloud Run and Firestore usage, so monitor the call volume.

4. Using the Resolved Identifiers in Your GA4 Tag

Server containers have no GA4 Configuration tag. The Google Analytics: GA4 tag forwards the event data produced by the GA4 Client, and you override individual values in the tag itself.

Update Your GA4 Tag in GTM SC:

  1. In the Google Analytics: GA4 tag, under Event Parameters > Parameters to Add / Edit, add:
    • Name: user_id
    • Value: {{Resolved User ID}}
    • Name: ga_session_id
    • Value: {{Resolved Session ID}}
  2. You can also send the session number as an event parameter for further analysis:
    • Name: session_number
    • Value: {{Resolved Session Number}} (and register it as an event-scoped custom dimension in the GA4 UI).
  3. If you run several GA4 tags (for example, separate tags for page_view and purchase), add the same parameters to each, or use an "Augment event" transformation so the values are added for every matching tag.

This ensures every event sent to GA4 includes the user's consistently managed user_id and ga_session_id, allowing for accurate user journey and session analysis.
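If you also send events to GA4 directly through the Measurement Protocol (for example from a backend job), the same identifiers map onto the request body: client_id and user_id at the top level, and the session ID as a session_id parameter on each event. Google's Measurement Protocol documentation notes that session_id and engagement_time_msec need to be included in an event's params for the activity to appear in reports such as Realtime. This is a minimal sketch: build_mp_body is a hypothetical helper, and the engagement_time_msec value of 100 is a placeholder.

```python
import json
from typing import Optional


def build_mp_body(client_id: str, event_name: str, ga_session_id: int,
                  user_id: Optional[str] = None,
                  params: Optional[dict] = None) -> dict:
    """Build a GA4 Measurement Protocol body carrying the resolved identifiers."""
    # engagement_time_msec=100 is a placeholder value, not a recommendation.
    event_params = {"session_id": ga_session_id, "engagement_time_msec": 100}
    event_params.update(params or {})
    body = {"client_id": client_id,
            "events": [{"name": event_name, "params": event_params}]}
    if user_id:  # omit for anonymous traffic instead of sending client_id as user_id
        body["user_id"] = user_id
    return body


body = build_mp_body("1234567890.1733184000", "page_view", 1733227200, user_id="user-42")
print(json.dumps(body))
```

The resulting JSON is POSTed to https://www.google-analytics.com/mp/collect with your measurement_id and api_secret as query parameters.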

5. Leveraging in GA4 Reports and Explorations

Once the data flows into GA4 with these resolved identifiers, you can:

  • View User Journeys: Use the user_id for cross-device analysis in User Explorer and Path Explorations.
  • Accurate Session Metrics: Trust your session counts, engagement rates, and conversion rates, knowing they are based on a robust server-side session definition.
  • Custom Reporting: Create custom reports or Explorations using the session_number or other derived session metrics to analyze user behavior over multiple sessions.
  • Audiences: Build audiences based on combined anonymous and authenticated behavior, leveraging the user_id and consistent session data.

Benefits of This Server-Side Approach

  • Holistic User Journeys: Unify anonymous and authenticated data for a complete view of customer interactions across devices and time.
  • Accurate GA4 Reporting: Overcome client-side limitations to provide reliable session and user metrics, leading to more trustworthy insights.
  • Enhanced Data Quality: Consistent, server-side managed identifiers improve the overall quality and integrity of your GA4 data.
  • Resilience & Future-Proofing: Your core identity and session management logic runs on infrastructure you control, which makes it less exposed to browser changes and client-side failures.
  • Centralized Control: Manage all user identity and session rules from a single, server-controlled environment.
  • Improved Personalization & Activation: A more accurate understanding of user identity fuels more effective personalization and targeted marketing campaigns.

Important Considerations

  • Latency: The extra HTTP round trip to the Identity & Session Service, plus the Firestore reads and writes behind it, adds latency to every event that references the resolver variables. Keep the service and the Firestore database in the same region as your server container and monitor the added time closely. For most analytics use cases the benefits outweigh this added latency.
  • Cost: Firestore reads/writes and Cloud Run invocations incur costs. Monitor usage, especially for high-volume sites. Implementing basic caching (e.g., in the Cloud Run service, for client_id-user_id mappings that don't change often) can help manage costs.
  • Identity Resolution Complexity: This solution focuses on client_id (anonymous) and user_id (authenticated). True enterprise-level identity resolution can be far more complex, involving multiple identifiers and probabilistic matching. This solution provides a strong foundation.
  • PII: While user_id itself should be a non-PII identifier, ensure no raw PII is stored in Firestore without appropriate hashing or encryption.
  • GA4 Identity Space: GA4 supports User-ID as a primary identifier. With the Blended or Observed reporting identity, GA4 uses the user_id when one is sent and falls back to the device ID (client_id) when it is not; the Device-based reporting identity ignores user_id. This server-side solution ensures user_id is sent whenever it is known.
  • Monitoring: Use Cloud Monitoring to track request latency and error rates for your Identity & Session Service and your Firestore usage, and review the resolver template's log entries for events that fell back to unresolved identifiers.
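The caching idea from the Cost bullet could start as a small per-instance TTL cache for client_id-to-user_id lookups. This is a sketch under assumptions: the TTLCache class is hypothetical, each Cloud Run instance holds its own copy, entries may be stale for up to ttl_seconds after a mapping changes in Firestore, and the cache is unbounded (cap it, for example with an LRU policy, for real traffic). Session state changes on every event, so only the user map reads are good candidates for caching.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny in-process cache for client_id -> user_id lookups.

    Each Cloud Run instance keeps its own copy, so entries can be stale for
    up to ttl_seconds after a mapping changes in Firestore.
    """

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Optional[str]]] = {}

    def get(self, key: str) -> Tuple[bool, Optional[str]]:
        """Return (hit, value); expired entries are dropped and count as misses."""
        entry = self._entries.get(key)
        if entry is None or entry[0] <= self._clock():
            self._entries.pop(key, None)
            return False, None
        return True, entry[1]

    def put(self, key: str, value: Optional[str]) -> None:
        """Store value (None is allowed, to cache 'known anonymous')."""
        self._entries[key] = (self._clock() + self._ttl, value)


# Usage with a fake clock, so expiry can be shown without waiting
now = [0.0]
cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])
cache.put("1234567890.1733184000", "user-42")
print(cache.get("1234567890.1733184000"))  # (True, 'user-42')
now[0] = 61.0
print(cache.get("1234567890.1733184000"))  # (False, None)
```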

Conclusion

Achieving a complete, accurate, and reliable view of your customer journeys is paramount for modern analytics. By implementing server-side session management and user stitching with your GTM Server Container, a dedicated Cloud Run service, and Firestore, you transform fragmented client-side data into a unified, resilient, and actionable dataset. This advanced server-side capability empowers you to overcome browser limitations, understand the full anonymous-to-authenticated user lifecycle, and drive more informed business decisions with confidence. Embrace server-side identity and session management to unlock the full potential of your GA4 analytics.