Edge Capture

The browser SDK measures what humans do after JavaScript runs. Edge capture measures requests that often do not run JavaScript: search crawlers, answer agents, link previews, uptime checks, security review, and browser automation.

Use both layers for complete Signals coverage:

| Layer | What it measures | Best for |
| --- | --- | --- |
| Browser SDK | Pageviews, clicks, forms, identify calls, behavior scores. | Human interaction and product analytics. |
| Edge capture | HTML requests before JavaScript runs. | AI/search coverage, preview metadata, monitoring, automation, and QA proof. |

Edge capture does not replace the SDK. It labels non-human lanes and sends a first-party pageview to ClickStream from your site edge. Bot-classed Edge capture hits count toward Signals Coverage, not the human pageview allowance.

Where It Shows Up

After installation, open the dashboard and go to Intelligence -> Agent Traffic.

You should see:

Human-only actions remain separate. A crawler pageview should not create an application record, trigger human-only UI, or pollute identity matching.

Copy-Paste Install

Keep the browser SDK installed first:

<script
  src="https://t.example.com/sdk.js"
  data-key="cs_live_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
  async
></script>

Then add the Worker from Onboarding -> Add Code -> Cloudflare Edge Capture. The dashboard-generated snippet is the production source because it includes the current recognition rules. The compact version below shows the same contract and is safe to adapt when you want to understand the moving parts.

Replace t.example.com and the API key with your own values:

const CLICKSTREAM_ENDPOINT = 'https://t.example.com';
const CLICKSTREAM_API_KEY = 'cs_live_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx';

function classifyRequest(request) {
  const ua = request.headers.get('user-agent') || '';
  const lower = ua.toLowerCase();
  const cf = request.cf || {};
  const botScore = cf.botManagement?.score;
  const verifiedBot = cf.botManagement?.verifiedBot === true;

  // Answer-engine agents. "ai" is a plain substring match, so tighten it if it misclassifies your traffic.
  if (/answer|ai|llm/i.test(ua)) {
    return { trafficClass: 'bot', purpose: 'answer_engine' };
  }
  if (/search|spider|crawler/i.test(ua)) {
    return { trafficClass: 'bot', purpose: 'search_crawler' };
  }
  if (/preview|unfurl|embed/i.test(ua)) {
    return { trafficClass: 'bot', purpose: 'link_preview' };
  }
  if (/uptime|monitor|synthetic|healthcheck/i.test(ua)) {
    return { trafficClass: 'bot', purpose: 'site_monitoring' };
  }
  if (/scan|probe|audit|curl|wget|python-requests|go-http-client|node-fetch/i.test(ua)) {
    return { trafficClass: 'bot', purpose: 'security_research' };
  }
  if (/headless|webdriver|browser-control/i.test(ua)) {
    return { trafficClass: 'bot', purpose: 'automation' };
  }
  if (
    verifiedBot ||
    (typeof botScore === 'number' && botScore <= 30) ||
    lower.includes('bot') ||
    lower.includes('crawler') ||
    lower.includes('spider')
  ) {
    return { trafficClass: 'bot', purpose: 'crawler' };
  }

  return { trafficClass: 'human', purpose: 'human_experience' };
}

function shouldCaptureAtEdge(request, segment) {
  if (request.method !== 'GET' && request.method !== 'HEAD') return false;
  const accept = request.headers.get('accept') || '';
  if (!accept.includes('text/html') && accept !== '*/*') return false;
  return segment.trafficClass === 'bot';
}

function edgePayload(request) {
  const cf = request.cf || {};
  return {
    url: request.url,
    referrer: request.headers.get('referer') || undefined,
    userAgent: request.headers.get('user-agent') || '',
    ip: request.headers.get('cf-connecting-ip') || undefined,
    mode: 'bots_only',
    cf: {
      country: cf.country,
      city: cf.city,
      region: cf.region,
      regionCode: cf.regionCode,
      postalCode: cf.postalCode,
      continent: cf.continent,
      timezone: cf.timezone,
      metroCode: cf.metroCode,
      latitude: cf.latitude,
      longitude: cf.longitude,
      asn: cf.asn,
      asOrganization: cf.asOrganization,
      botManagement: cf.botManagement ? {
        score: cf.botManagement.score,
        verifiedBot: cf.botManagement.verifiedBot,
        ja3Hash: cf.botManagement.ja3Hash,
        ja4: cf.botManagement.ja4,
      } : undefined,
    },
  };
}

function addMachineReadableHints(html, segment, request) {
  if (segment.purpose !== 'answer_engine' && segment.purpose !== 'search_crawler') return html;
  const cf = request.cf || {};
  const geo = [cf.country, cf.region, cf.city].filter(Boolean).join('-');
  const hints = [
    `<meta name="clickstream-traffic-purpose" content="${segment.purpose}">`,
    geo ? `<meta name="clickstream-geo" content="${geo.replace(/"/g, '')}">` : '',
    '<meta name="robots" content="index,follow,max-snippet:-1,max-image-preview:large,max-video-preview:-1">',
  ].filter(Boolean).join('\n');

  return html.includes('</head>') ? html.replace('</head>', `${hints}\n</head>`) : html;
}

async function fetchOrigin(request, env) {
  if (env?.ASSETS?.fetch) return env.ASSETS.fetch(request);
  return fetch(request);
}

async function sendEdgeCapture(request) {
  const origin = new URL(request.url).origin;
  return fetch(`${CLICKSTREAM_ENDPOINT}/v1/edge/pageview?key=${encodeURIComponent(CLICKSTREAM_API_KEY)}`, {
    method: 'POST',
    headers: {
      'content-type': 'application/json',
      'user-agent': 'ClickStream-Edge-Capture/1.0',
      origin,
      referer: request.url,
    },
    body: JSON.stringify(edgePayload(request)),
  });
}

export default {
  async fetch(request, env, ctx) {
    const segment = classifyRequest(request);
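    // Attach the catch up front so a failed capture never rejects unhandled (fail open).
    const capturePromise = shouldCaptureAtEdge(request, segment)
      ? sendEdgeCapture(request).catch(() => undefined)
      : undefined;

    // Let the capture finish in the background without delaying the response.
    if (capturePromise && ctx?.waitUntil) {
      ctx.waitUntil(capturePromise);
    }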

    const response = await fetchOrigin(request, env);

    const headers = new Headers(response.headers);
    headers.set('x-clickstream-traffic-purpose', segment.purpose);
    if (capturePromise) {
      headers.set('x-clickstream-edge-capture', 'queued');
    }
    headers.append('vary', 'User-Agent');

    const contentType = headers.get('content-type') || '';
    if (segment.trafficClass === 'bot' && contentType.includes('text/html')) {
      const html = await response.text();
      return new Response(addMachineReadableHints(html, segment, request), {
        status: response.status,
        statusText: response.statusText,
        headers,
      });
    }

    return new Response(response.body, {
      status: response.status,
      statusText: response.statusText,
      headers,
    });
  },
};
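
The first-match-wins ordering in classifyRequest can be easier to audit as a rule table. The sketch below restates only the user-agent rules for illustration; it ignores the request.cf bot-management signals, and the names PURPOSE_RULES and classifyUserAgent are hypothetical rather than part of the dashboard-generated snippet.

```javascript
// Hypothetical rule table mirroring the user-agent checks in classifyRequest.
// Order matters: the first matching pattern decides the purpose.
const PURPOSE_RULES = [
  [/answer|ai|llm/i, 'answer_engine'],
  [/search|spider|crawler/i, 'search_crawler'],
  [/preview|unfurl|embed/i, 'link_preview'],
  [/uptime|monitor|synthetic|healthcheck/i, 'site_monitoring'],
  [/scan|probe|audit|curl|wget|python-requests|go-http-client|node-fetch/i, 'security_research'],
  [/headless|webdriver|browser-control/i, 'automation'],
  [/bot/i, 'crawler'],
];

function classifyUserAgent(ua) {
  const value = ua || '';
  for (const [pattern, purpose] of PURPOSE_RULES) {
    if (pattern.test(value)) return { trafficClass: 'bot', purpose };
  }
  // Nothing matched: treat the request as human traffic.
  return { trafficClass: 'human', purpose: 'human_experience' };
}

console.log(classifyUserAgent('curl/8.0; security-review').purpose); // security_research
```

Because the table is plain data, reordering or adding a rule is a one-line change, which makes it a convenient place to experiment before adapting the Worker itself.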

What The Worker Must Do

Verify

Replace the hostname and key with your own.

curl -sS -D - -o /tmp/cs-edge-test.html \
  -A 'curl/8.0; security-review' \
  -H 'Accept: text/html' \
  'https://example.com/?qa=edge-review'

Expected response headers:

x-clickstream-traffic-purpose: security_research
x-clickstream-edge-capture: queued
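
For scripted checks, the two expected headers can be asserted in code rather than read by eye. This is a minimal sketch, assuming a headers object that exposes get(name), such as the headers of a fetch Response; the helper name checkEdgeHeaders is hypothetical.

```javascript
// Hypothetical helper: lists mismatches between the Worker's response headers
// and the values expected for a bot-classed request.
function checkEdgeHeaders(headers, expectedPurpose) {
  const problems = [];
  const purpose = headers.get('x-clickstream-traffic-purpose');
  const capture = headers.get('x-clickstream-edge-capture');
  if (purpose !== expectedPurpose) {
    problems.push(`traffic purpose was ${purpose}, expected ${expectedPurpose}`);
  }
  if (capture !== 'queued') {
    problems.push(`edge capture was ${capture}, expected queued`);
  }
  return problems; // an empty array means both headers look right
}

// Usage with a Map standing in for real response headers.
const sample = new Map([
  ['x-clickstream-traffic-purpose', 'security_research'],
  ['x-clickstream-edge-capture', 'queued'],
]);
console.log(checkEdgeHeaders(sample, 'security_research').length); // 0
```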

Verify the collector directly:

curl -sS -X POST 'https://t.example.com/v1/edge/pageview?key=cs_live_xxx' \
  -H 'content-type: application/json' \
  -H 'origin: https://example.com' \
  -H 'referer: https://example.com/' \
  --data '{
    "url": "https://example.com/manual-edge-test",
    "userAgent": "curl/8.0; security-review",
    "cf": { "country": "US" }
  }'

Expected JSON:

{
  "success": true,
  "accepted": 1,
  "classification": {
    "trafficClass": "bot",
    "purpose": "security_research"
  }
}

A normal browser request returns accepted: 0 with reason: "browser_sdk_preferred" unless you explicitly send mode: "all". That is intentional: the browser SDK owns human interaction detail.
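
A test script can turn the two response shapes above into a single verdict. The sketch below uses only the fields shown in this section (accepted, reason, classification.purpose); the function name and the returned labels are hypothetical, and any other response shape is treated as unknown rather than guessed at.

```javascript
// Hypothetical helper: summarizes a collector response body for test output.
function describeCollectorResult(body) {
  if (!body || typeof body.accepted !== 'number') return 'unknown';
  if (body.accepted > 0) {
    const purpose = body.classification && body.classification.purpose;
    return `captured:${purpose || 'unclassified'}`;
  }
  // accepted: 0 with this reason is the intentional browser case.
  if (body.reason === 'browser_sdk_preferred') return 'skipped:browser_sdk_preferred';
  return 'skipped:other';
}

console.log(describeCollectorResult({
  success: true,
  accepted: 1,
  classification: { trafficClass: 'bot', purpose: 'security_research' },
})); // captured:security_research
```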

Example Test Context

For Signals realtime tests that need a concrete production-like site, use example.com as the customer site and https://t.example.com as the first-party tracking endpoint:

curl -sS -X POST 'https://t.example.com/v1/edge/pageview?key=cs_live_xxx' \
  -H 'content-type: application/json' \
  -H 'origin: https://example.com' \
  -H 'referer: https://example.com/' \
  --data '{
    "url": "https://example.com/signals-realtime-test",
    "userAgent": "ClickStream-Docs-Test/1.0; automation",
    "mode": "bots_only"
  }'

This pattern is for docs/examples and implementation tests. The per-visitor stream, immediate SDK flush helpers, and effects helpers are production APIs; tests should still assert fail-open behavior when credentials, billing coverage, or WebSockets are unavailable.
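
One way to assert fail-open behavior is to wrap the capture call so that it can only resolve, never reject or hang. This is a generic sketch, not a ClickStream API: captureFailOpen, its send argument, and the default timeout are all hypothetical.

```javascript
// Hypothetical wrapper: resolves to true when send() succeeds, and to false
// when it throws, rejects, or takes longer than timeoutMs. It never rejects.
async function captureFailOpen(send, timeoutMs = 1000) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve(false), timeoutMs);
  });
  try {
    // Starting from a resolved promise turns synchronous throws into rejections.
    const attempt = Promise.resolve().then(send).then(() => true);
    return await Promise.race([attempt, timeout]);
  } catch {
    return false; // fail open: the page must not depend on capture succeeding
  } finally {
    clearTimeout(timer);
  }
}

// Usage: a capture that fails because credentials are missing.
captureFailOpen(() => { throw new Error('missing credentials'); })
  .then((ok) => console.log(ok)); // false
```

A test can then substitute a send function that simulates each unavailable dependency and assert that the result is false while the surrounding request still completes.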

Purpose Lanes

| Purpose | What it means | Suggested lane |
| --- | --- | --- |
| answer_engine | Answer agents reading pages for summaries, recommendations, or retrieval. | answer_engine_accessibility |
| search_crawler | Search indexing crawlers. | search_accessibility |
| seo_analysis | SEO analysis crawlers and audit tools. | search_accessibility |
| link_preview | Link-card and social preview renderers. | preview_accessibility |
| site_monitoring | Uptime, synthetic monitoring, and health checks. | site_health |
| security_research | Scanner-like, scraper-like, or probing traffic. | security_review |
| automation | Browser automation, QA, and scripted review runs. | automation_review |
| crawler | Generic bot traffic that does not fit a more specific purpose yet. | automation_review |
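
If you route events by lane in your own code, the purpose-to-lane mapping can be kept as a plain lookup. The sketch below copies the suggested lanes listed above; the names SUGGESTED_LANES and laneForPurpose are hypothetical, and purposes outside the list (including human_experience) return null.

```javascript
// Hypothetical lookup built from the suggested lanes listed above.
const SUGGESTED_LANES = {
  answer_engine: 'answer_engine_accessibility',
  search_crawler: 'search_accessibility',
  seo_analysis: 'search_accessibility',
  link_preview: 'preview_accessibility',
  site_monitoring: 'site_health',
  security_research: 'security_review',
  automation: 'automation_review',
  crawler: 'automation_review',
};

function laneForPurpose(purpose) {
  // Own-property check avoids matching inherited keys such as "toString".
  return Object.prototype.hasOwnProperty.call(SUGGESTED_LANES, purpose)
    ? SUGGESTED_LANES[purpose]
    : null;
}

console.log(laneForPurpose('link_preview')); // preview_accessibility
```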

Collector Payload

Edge capture posts to your first-party tracking domain:

POST https://t.example.com/v1/edge/pageview?key=cs_live_xxx

Required fields:

| Field | Description |
| --- | --- |
| url | Full URL requested on the customer site. |
| userAgent | Original inbound user agent from the crawler, preview agent, monitor, automation, or scanner. |

Recommended fields:

| Field | Description |
| --- | --- |
| referrer | Original Referer header when present. |
| ip | Value of the cf-connecting-ip header when available. The collector stores privacy-safe hashes and geo fields. |
| mode | bots_only by default. Use all only for controlled diagnostics; the browser SDK should own human behavior. |
| cf.country, cf.region, cf.city | Geography used for coverage and accessibility decisions. |
| cf.asn, cf.asOrganization | Network context for agents and crawlers. |
| cf.botManagement.* | Optional context fields. ClickStream only trusts bot-management verdicts attached by Cloudflare to the collector request itself; caller-supplied bot scores are never treated as authoritative. |
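
When posting from something other than the Worker, it can help to validate the payload before sending it. This is a minimal sketch based only on the fields listed above: buildCollectorPayload is a hypothetical helper, and the checks are a client-side convenience, not a description of the collector's own validation.

```javascript
// Hypothetical helper: builds a collector payload from the documented fields.
function buildCollectorPayload(fields) {
  const { url, userAgent, referrer, ip, mode = 'bots_only', cf } = fields;

  // Required fields: url and userAgent.
  if (typeof url !== 'string' || typeof userAgent !== 'string') {
    throw new Error('url and userAgent are required');
  }
  new URL(url); // throws a TypeError unless url is a full, absolute URL

  const payload = { url, userAgent, mode };

  // Recommended fields are included only when present.
  if (referrer) payload.referrer = referrer;
  if (ip) payload.ip = ip;
  if (cf && typeof cf === 'object') payload.cf = cf;
  return payload;
}

const body = JSON.stringify(buildCollectorPayload({
  url: 'https://example.com/manual-edge-test',
  userAgent: 'curl/8.0; security-review',
  cf: { country: 'US' },
}));
console.log(body);
```

Defaulting mode to bots_only in the helper keeps ad hoc scripts aligned with the recommendation that the browser SDK owns human behavior.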

See Also