Edge Capture

The browser SDK measures what humans do after JavaScript runs. Edge capture measures requests that often do not run JavaScript: search crawlers, answer agents, link previews, uptime checks, security review, and browser automation.

Use both layers for complete Signals coverage:

Layer	What it measures	Best for
Browser SDK	Pageviews, clicks, forms, identify calls, behavior scores.	Human interaction and product analytics.
Edge capture	HTML requests before JavaScript runs.	AI/search coverage, preview metadata, monitoring, automation, and QA proof.

Edge capture does not replace the SDK. It labels non-human lanes and sends a first-party pageview to ClickStream from your site edge. Bot-classed Edge capture hits count toward Signals Coverage, not the human pageview allowance.

Where It Shows Up

After installation, open the dashboard and go to Intelligence -> Agent Traffic.

You should see:

AI Search Coverage: answer-agent and search-crawler events compared with the pages humans use.
Most Crawled Pages: URLs requested by bot-classed traffic.
Traffic by Purpose: rollups for answer engines, search crawlers, SEO tools, previews, monitoring, automation, and review traffic.
Named AI, Bots, and Tools: named non-human agents observed on your site.

Human-only actions remain separate. A crawler pageview should not create an application record, trigger human-only UI, or pollute identity matching.

Copy-Paste Install

Keep the browser SDK installed first:

<script
  src="https://t.example.com/sdk.js"
  data-key="cs_live_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
  async
></script>

Then add the Worker from Onboarding -> Add Code -> Cloudflare Edge Capture. The dashboard-generated snippet is the production source because it includes the current recognition rules. The compact version below shows the same contract and is safe to adapt when you want to understand the moving parts.

Replace t.example.com and the API key with your own values:

const CLICKSTREAM_ENDPOINT = 'https://t.example.com';
const CLICKSTREAM_API_KEY = 'cs_live_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx';

function classifyRequest(request) {
  const ua = request.headers.get('user-agent') || '';
  const lower = ua.toLowerCase();
  const cf = request.cf || {};
  const botScore = cf.botManagement?.score;
  const verifiedBot = cf.botManagement?.verifiedBot === true;

  if (/answer|ai|llm|crawler|bot/i.test(ua) && /answer|ai|llm/i.test(ua)) {
    return { trafficClass: 'bot', purpose: 'answer_engine' };
  }
  if (/search|spider|crawler/i.test(ua)) {
    return { trafficClass: 'bot', purpose: 'search_crawler' };
  }
  if (/preview|unfurl|embed/i.test(ua)) {
    return { trafficClass: 'bot', purpose: 'link_preview' };
  }
  if (/uptime|monitor|synthetic|healthcheck/i.test(ua)) {
    return { trafficClass: 'bot', purpose: 'site_monitoring' };
  }
  if (/scan|probe|audit|curl|wget|python-requests|go-http-client|node-fetch/i.test(ua)) {
    return { trafficClass: 'bot', purpose: 'security_research' };
  }
  if (/headless|webdriver|browser-control/i.test(ua)) {
    return { trafficClass: 'bot', purpose: 'automation' };
  }
  if (
    verifiedBot ||
    (typeof botScore === 'number' && botScore <= 30) ||
    lower.includes('bot') ||
    lower.includes('crawler') ||
    lower.includes('spider')
  ) {
    return { trafficClass: 'bot', purpose: 'crawler' };
  }

  return { trafficClass: 'human', purpose: 'human_experience' };
}

function shouldCaptureAtEdge(request, segment) {
  if (request.method !== 'GET' && request.method !== 'HEAD') return false;
  const accept = request.headers.get('accept') || '';
  if (!accept.includes('text/html') && accept !== '*/*') return false;
  return segment.trafficClass === 'bot';
}

function edgePayload(request) {
  const cf = request.cf || {};
  return {
    url: request.url,
    referrer: request.headers.get('referer') || undefined,
    userAgent: request.headers.get('user-agent') || '',
    ip: request.headers.get('cf-connecting-ip') || undefined,
    mode: 'bots_only',
    cf: {
      country: cf.country,
      city: cf.city,
      region: cf.region,
      regionCode: cf.regionCode,
      postalCode: cf.postalCode,
      continent: cf.continent,
      timezone: cf.timezone,
      metroCode: cf.metroCode,
      latitude: cf.latitude,
      longitude: cf.longitude,
      asn: cf.asn,
      asOrganization: cf.asOrganization,
      botManagement: cf.botManagement ? {
        score: cf.botManagement.score,
        verifiedBot: cf.botManagement.verifiedBot,
        ja3Hash: cf.botManagement.ja3Hash,
        ja4: cf.botManagement.ja4,
      } : undefined,
    },
  };
}

function addMachineReadableHints(html, segment, request) {
  if (segment.purpose !== 'answer_engine' && segment.purpose !== 'search_crawler') return html;
  const cf = request.cf || {};
  const geo = [cf.country, cf.region, cf.city].filter(Boolean).join('-');
  const hints = [
    `<meta name="clickstream-traffic-purpose" content="${segment.purpose}">`,
    geo ? `<meta name="clickstream-geo" content="${geo.replace(/"/g, '')}">` : '',
    '<meta name="robots" content="index,follow,max-snippet:-1,max-image-preview:large,max-video-preview:-1">',
  ].filter(Boolean).join('\n');

  return html.includes('</head>') ? html.replace('</head>', `${hints}\n</head>`) : html;
}

async function fetchOrigin(request, env) {
  if (env?.ASSETS?.fetch) return env.ASSETS.fetch(request);
  return fetch(request);
}

async function sendEdgeCapture(request) {
  const origin = new URL(request.url).origin;
  return fetch(`${CLICKSTREAM_ENDPOINT}/v1/edge/pageview?key=${encodeURIComponent(CLICKSTREAM_API_KEY)}`, {
    method: 'POST',
    headers: {
      'content-type': 'application/json',
      'user-agent': 'ClickStream-Edge-Capture/1.0',
      origin,
      referer: request.url,
    },
    body: JSON.stringify(edgePayload(request)),
  });
}

export default {
  async fetch(request, env, ctx) {
    const segment = classifyRequest(request);
    const capturePromise = shouldCaptureAtEdge(request, segment)
      ? sendEdgeCapture(request)
      : undefined;

    if (capturePromise && ctx?.waitUntil) {
      ctx.waitUntil(capturePromise.catch(() => undefined));
    }

    const response = await fetchOrigin(request, env);

    const headers = new Headers(response.headers);
    headers.set('x-clickstream-traffic-purpose', segment.purpose);
    if (capturePromise) {
      headers.set('x-clickstream-edge-capture', 'queued');
    }
    headers.append('vary', 'User-Agent');

    const contentType = headers.get('content-type') || '';
    if (segment.trafficClass === 'bot' && contentType.includes('text/html')) {
      const html = await response.text();
      return new Response(addMachineReadableHints(html, segment, request), {
        status: response.status,
        statusText: response.statusText,
        headers,
      });
    }

    return new Response(response.body, {
      status: response.status,
      statusText: response.statusText,
      headers,
    });
  },
};

What The Worker Must Do

Capture only HTML requests that are bot-classed or crawler-classed.
Preserve the original user agent in the JSON payload.
Send the capture subrequest as ClickStream-Edge-Capture/1.0.
Include Origin and Referer from your site so the collector can verify the request belongs to an allowed domain.
Use env.ASSETS.fetch(request) on Cloudflare Pages, or fetch(request) in a Worker in front of another origin.

Verify

Replace the hostname and key with your own.

curl -sS -D - -o /tmp/cs-edge-test.html \
  -A 'curl/8.0; security-review' \
  -H 'Accept: text/html' \
  'https://example.com/?qa=edge-review'

Expected response headers:

x-clickstream-traffic-purpose: security_research
x-clickstream-edge-capture: queued

Verify the collector directly:

curl -sS -X POST 'https://t.example.com/v1/edge/pageview?key=cs_live_xxx' \
  -H 'content-type: application/json' \
  -H 'origin: https://example.com' \
  -H 'referer: https://example.com/' \
  --data '{
    "url": "https://example.com/manual-edge-test",
    "userAgent": "curl/8.0; security-review",
    "cf": { "country": "US" }
  }'

Expected JSON:

{
  "success": true,
  "accepted": 1,
  "classification": {
    "trafficClass": "bot",
    "purpose": "security_research"
  }
}

A normal browser request returns accepted: 0 with reason: "browser_sdk_preferred" unless you explicitly send mode: "all". That is intentional: the browser SDK owns human interaction detail.

Example Test Context

For Signals realtime tests that need a concrete production-like site, use example.com as the customer site and https://t.example.com as the first-party tracking endpoint:

curl -sS -X POST 'https://t.example.com/v1/edge/pageview?key=cs_live_xxx' \
  -H 'content-type: application/json' \
  -H 'origin: https://example.com' \
  -H 'referer: https://example.com/' \
  --data '{
    "url": "https://example.com/signals-realtime-test",
    "userAgent": "ClickStream-Docs-Test/1.0; automation",
    "mode": "bots_only"
  }'

This pattern is for docs/examples and implementation tests. The per-visitor stream, immediate SDK flush helpers, and effects helpers are production APIs; tests should still assert fail-open behavior when credentials, billing coverage, or WebSockets are unavailable.

Purpose Lanes

Purpose	What it means	Suggested lane
`answer_engine`	Answer agents reading pages for summaries, recommendations, or retrieval.	`answer_engine_accessibility`
`search_crawler`	Search indexing crawlers.	`search_accessibility`
`seo_analysis`	SEO analysis crawlers and audit tools.	`search_accessibility`
`link_preview`	Link-card and social preview renderers.	`preview_accessibility`
`site_monitoring`	Uptime, synthetic monitoring, and health checks.	`site_health`
`security_research`	Scanner-like, scraper-like, or probing traffic.	`security_review`
`automation`	Browser automation, QA, and scripted review runs.	`automation_review`
`crawler`	Generic bot traffic that does not fit a more specific purpose yet.	`automation_review`

Collector Payload

Edge capture posts to your first-party tracking domain:

POST https://t.example.com/v1/edge/pageview?key=cs_live_xxx

Required fields:

Field	Description
`url`	Full URL requested on the customer site.
`userAgent`	Original inbound user agent from the crawler, preview agent, monitor, automation, or scanner.

Recommended fields:

Field	Description
`referrer`	Original `Referer` header when present.
`ip`	`cf-connecting-ip` when available. The collector stores privacy-safe hashes and geo fields.
`mode`	`bots_only` by default. Use `all` only for controlled diagnostics; the browser SDK should own human behavior.
`cf.country`, `cf.region`, `cf.city`	Geography used for coverage and accessibility decisions.
`cf.asn`, `cf.asOrganization`	Network context for agents and crawlers.
`cf.botManagement.*`	Optional context fields. ClickStream only trusts bot-management verdicts attached by Cloudflare to the collector request itself; caller-supplied bot scores are never treated as authoritative.