Bot detection

Every event ClickStream ingests gets a bot verdict within milliseconds. The verdict is a composite of five independent signals, each with a distinct failure mode, so a bot that hides from one signal usually still trips the others.

Bot events are never dropped. They're marked with bot.isBot, bot.score, and bot.category, flow through the same ingestion path as human traffic, and show up in the dashboard's Traffic Quality view where operators can filter, export, or ignore them. You keep the data; we just label it.
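As a rough illustration of what "label, don't drop" means for a consumer of the data, the sketch below models an event carrying the three bot fields and splits a batch into human and bot traffic without discarding anything. The `IngestedEvent` shape, its `id` field, and `partitionByVerdict` are assumptions made for this example; only `bot.isBot`, `bot.score`, and `bot.category` come from the description above.

```typescript
// Hypothetical event shape: only the three bot.* fields are documented.
interface BotVerdict {
  isBot: boolean;
  score: number; // higher = more bot-like
  category: string | null; // e.g. "ai_agent"; null for human traffic (assumed)
}

interface IngestedEvent {
  id: string; // illustrative field, not part of the documented schema
  bot: BotVerdict;
}

// Split a batch by verdict. Every event ends up in exactly one list,
// so nothing is dropped along the way.
function partitionByVerdict(events: IngestedEvent[]): {
  human: IngestedEvent[];
  bots: IngestedEvent[];
} {
  const human: IngestedEvent[] = [];
  const bots: IngestedEvent[] = [];
  for (const ev of events) {
    (ev.bot.isBot ? bots : human).push(ev);
  }
  return { human, bots };
}

const sampleBatch: IngestedEvent[] = [
  { id: "e1", bot: { isBot: false, score: 4, category: null } },
  { id: "e2", bot: { isBot: true, score: 92, category: "ai_agent" } },
];
const split = partitionByVerdict(sampleBatch);
console.log(split.human.length, split.bots.length);
```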

Signals we use

Signal | Source | What it catches
CF Bot Management score | Cloudflare on the incoming request | Generic ML score from Cloudflare's model. Weighted 80/100 when available.
Named-bot UA registry | In-memory pattern match | Googlebot, Bingbot, ChatGPT, Claude, Perplexity, SEO tools, monitoring, etc.
ASN + connection type | Cloudflare cf.asOrganization | Hosting ASN + no JS execution = scraper.
Behavioral human confidence | Session signal accumulator | Pageview rate, click diversity, form interactions, scroll depth, time on page. Low confidence = bot-like.
Stealth-bot score | Mouse entropy + cross-signal inconsistency + TLS JA4 mismatch | Camoufox, stealth-puppeteer, undetected-chromedriver, antidetect browsers.
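One way to picture how signals like these could fold into a single score is a weighted blend. The sketch below is not ClickStream's scoring code: it reads "weighted 80/100" as an 80% share for the Cloudflare signal when it is present, averages whatever other signals are available, and clamps the result to 0–100. The `SignalInputs` shape, the even split among the other signals, and the fallback behavior are all assumptions.

```typescript
// Each input is a bot-likeness value in 0-100 (higher = more bot-like).
// Note: Cloudflare's own bot score runs the other way (low means automated),
// so it is assumed to have been inverted before reaching this function.
interface SignalInputs {
  cfBotLikeness?: number; // undefined when the Cloudflare score is unavailable
  others: number[]; // remaining signals, already normalized (assumed)
}

const CF_WEIGHT = 0.8; // "weighted 80/100", read as an 80% share (assumption)

function compositeScore(inputs: SignalInputs): number {
  const { cfBotLikeness, others } = inputs;
  const othersMean =
    others.length > 0
      ? others.reduce((a, b) => a + b, 0) / others.length
      : undefined;

  let raw: number;
  if (cfBotLikeness !== undefined && othersMean !== undefined) {
    raw = CF_WEIGHT * cfBotLikeness + (1 - CF_WEIGHT) * othersMean;
  } else {
    // Only one side is available (or neither): use what exists.
    raw = cfBotLikeness ?? othersMean ?? 0;
  }
  // Clamp to the 0-100 range and round to a whole number.
  return Math.min(100, Math.max(0, Math.round(raw)));
}

console.log(compositeScore({ cfBotLikeness: 90, others: [50, 70] }));
console.log(compositeScore({ others: [20, 40] }));
```

The point of the fallback branch is that a missing Cloudflare score should not silently pull the composite toward zero.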

The composite bot.score runs from 0 to 100; higher means more bot-like. The dashboard's behavioralClass rollup places each visitor into one of four buckets, two of which are bot and likely_bot.

Bot categories

Named bots get a category matched against a curated registry:

Category | Examples
search_crawler | Googlebot, Bingbot, DuckDuckBot, YandexBot
ai_agent | ChatGPT, Claude, Perplexity, CCBot, Google-Extended, FirecrawlAgent
social_preview | facebookexternalhit, Twitterbot, LinkedInBot, Slackbot
seo_tool | AhrefsBot, SemrushBot, MJ12bot, DataForSeoBot
monitoring | UptimeRobot, Pingdom, StatusCake
scraper | Scrapy, python-requests, curl, wget (generic)
scanner | Nmap, ZAP, Nikto, security probes
automation | HeadlessChrome, Playwright, Puppeteer (naive)
stealth_bot | Stealth detector: Camoufox, stealth-puppeteer, undetected-chromedriver, antidetect browsers
unknown_bot | Bot-shaped behavior with no registry match
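To make the registry lookup concrete, here is a minimal sketch of an in-memory pattern match over the User-Agent string. The patterns are a small excerpt built from the examples in the table; the real registry, its patterns, and its matching order are not shown in this page, so `UA_REGISTRY` and `matchNamedBot` are illustrative names only. stealth_bot and unknown_bot are left out because they are assigned from behavior rather than from a User-Agent match.

```typescript
type NamedBotCategory =
  | "search_crawler"
  | "ai_agent"
  | "social_preview"
  | "seo_tool"
  | "monitoring"
  | "scraper"
  | "scanner"
  | "automation";

// Illustrative excerpt; first matching pattern wins.
const UA_REGISTRY: Array<[RegExp, NamedBotCategory]> = [
  [/googlebot|bingbot|duckduckbot|yandexbot/i, "search_crawler"],
  [/chatgpt|claude|perplexity|ccbot|firecrawlagent/i, "ai_agent"],
  [/facebookexternalhit|twitterbot|linkedinbot|slackbot/i, "social_preview"],
  [/ahrefsbot|semrushbot|mj12bot|dataforseobot/i, "seo_tool"],
  [/uptimerobot|pingdom|statuscake/i, "monitoring"],
  [/scrapy|python-requests|curl|wget/i, "scraper"],
  [/nmap|nikto/i, "scanner"],
  [/headlesschrome|playwright|puppeteer/i, "automation"],
];

// Returns the category of the first pattern that matches, or null when the
// User-Agent does not look like any named bot in this excerpt.
function matchNamedBot(userAgent: string): NamedBotCategory | null {
  for (const [pattern, category] of UA_REGISTRY) {
    if (pattern.test(userAgent)) return category;
  }
  return null;
}

console.log(matchNamedBot("python-requests/2.31.0"));
```

A null result here does not mean "human"; it only means the User-Agent alone gave no answer and the other signals have to decide.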

A detailed threat-model atlas — the 10 stealth tools we track, the signals they mask, and the counter-signals we prioritize — is available to Scale+ customers on request. Contact support@clickstream.com.

stealth_bot — detecting fingerprint-spoofers

Stealth tools deliberately aim for the 30–50 bot-score band, where no detector can confidently say "bot". Landing in that band is itself a signal, so our stealth scorer combines three extra inputs: mouse entropy, cross-signal inconsistency, and TLS JA4 mismatch.

The composite stealthScore caps at 100. A stealthScore of 60 or higher promotes the visitor to the stealth_bot category in the registry.
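The cap and threshold can be sketched as follows, taking the three inputs to be the ones the signals table lists for the stealth-bot score (mouse entropy, cross-signal inconsistency, TLS JA4 mismatch). How many points each input contributes is not documented here, so the `StealthInputs` fields and the simple sum are assumptions; only the cap of 100 and the >= 60 promotion come from the text.

```typescript
// Per-input point values are hypothetical.
interface StealthInputs {
  mouseEntropyPoints: number;
  crossSignalPoints: number;
  ja4MismatchPoints: number;
}

const STEALTH_CAP = 100; // composite stealthScore caps at 100
const STEALTH_THRESHOLD = 60; // >= 60 promotes to stealth_bot

function stealthScore(i: StealthInputs): number {
  const sum = i.mouseEntropyPoints + i.crossSignalPoints + i.ja4MismatchPoints;
  return Math.min(STEALTH_CAP, Math.max(0, sum));
}

// Keeps the existing category unless the score crosses the threshold.
function promoteCategory(current: string | null, score: number): string | null {
  return score >= STEALTH_THRESHOLD ? "stealth_bot" : current;
}

const exampleScore = stealthScore({
  mouseEntropyPoints: 30,
  crossSignalPoints: 25,
  ja4MismatchPoints: 20,
});
console.log(exampleScore, promoteCategory(null, exampleScore));
```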

A cross-session aggregator also runs on a 5-minute sliding window.

Aggregated anomalies land in KV as session-anomalies:{clientId} and surface on the dashboard's Session Integrity tile.
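For readers who want a mental model of the aggregator, the sketch below keeps per-client anomaly timestamps and counts how many fall inside the last five minutes. It uses an in-memory `Map` in place of KV, and it leaves "anomaly" abstract; both are assumptions, as is the `SlidingAnomalyWindow` class itself. The window length and the key format are taken from the text.

```typescript
const WINDOW_MS = 5 * 60 * 1000; // 5-minute sliding window

// Key format from the text: session-anomalies:{clientId}
function anomalyKey(clientId: string): string {
  return `session-anomalies:${clientId}`;
}

// In-memory stand-in for the KV-backed aggregator (assumption).
class SlidingAnomalyWindow {
  private readonly hits = new Map<string, number[]>();

  // Records one anomaly at `atMs` and returns how many anomalies this
  // client has inside the window ending at `atMs`.
  record(clientId: string, atMs: number): number {
    const key = anomalyKey(clientId);
    const kept = (this.hits.get(key) ?? []).filter((t) => atMs - t < WINDOW_MS);
    kept.push(atMs);
    this.hits.set(key, kept);
    return kept.length;
  }
}

const anomalyWindow = new SlidingAnomalyWindow();
anomalyWindow.record("client-42", 0);
anomalyWindow.record("client-42", 4 * 60 * 1000);
// The first hit is now older than five minutes and falls out of the window.
console.log(anomalyWindow.record("client-42", 6 * 60 * 1000));
```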

How to use it

In the dashboard

Intelligence → Traffic Quality groups every visitor into one of the four behavioralClass buckets, with counts and per-category rollups. Click any bucket to drill into specific visitors, then filter by country, ASN, referrer, or landing page.

Via the Signals API

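import { getVisitor, isBot } from '@clickstream/signals';

// Sketch of an async server component; Page and DefaultView are placeholder names.
export default async function Page() {
  const visitor = await getVisitor();
  if (isBot(visitor)) {
    // AI crawler or scraper: serve structured JSON-LD, skip personalization
    return <CrawlerView />;
  }
  return <DefaultView />;
}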

visitor.bot.category carries the specific label when known, and 'unknown_bot' when generic bot signals fire without a registry match.
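If you want to branch on the category rather than on the yes/no verdict, something like the following could work. The response modes and the mapping from category to mode are a hypothetical policy chosen for illustration, not a ClickStream recommendation; the category strings are the ones listed under Bot categories.

```typescript
// Hypothetical response policy keyed on visitor.bot.category.
type ResponseMode = "structured" | "minimal" | "full";

function responseModeFor(category: string | null): ResponseMode {
  switch (category) {
    case "search_crawler":
    case "ai_agent":
    case "social_preview":
      return "structured"; // serve JSON-LD / metadata-first markup
    case "seo_tool":
    case "monitoring":
    case "scraper":
    case "scanner":
    case "automation":
    case "stealth_bot":
    case "unknown_bot":
      return "minimal"; // skip personalization and expensive widgets
    default:
      return "full"; // no bot category: treat as a regular visitor
  }
}

console.log(responseModeFor("ai_agent"), responseModeFor(null));
```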

Via the Signals Feed (Scale+)

Subscribe to the WebSocket and filter by behavioralClass:

// `ws` is an already-open WebSocket to the Signals Feed;
// forwardToSecurityTeam is your own handler.
ws.addEventListener('message', (event) => {
  const msg = JSON.parse(event.data);
  if (msg.type !== 'event') return;
  if (msg.behavioralClass === 'bot' || msg.behavioralClass === 'likely_bot') {
    forwardToSecurityTeam(msg);
  }
});

See Signals Feed for the full subscriber pattern.
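The filtering step can also be pulled out into a pure function, which makes it easy to unit-test and keeps a malformed frame from throwing inside the message handler. This is a sketch under assumptions: the `FeedMessage` shape only includes the two fields used in the handler above, and `parseBotEvent` is a name invented for this example.

```typescript
// Hypothetical message shape: only `type` and `behavioralClass` are known.
interface FeedMessage {
  type: string;
  behavioralClass?: string;
}

const BOT_CLASSES = new Set(["bot", "likely_bot"]);

// Returns the parsed message if it is a bot-class event, otherwise null.
// Malformed JSON is treated as "no match" instead of being thrown.
function parseBotEvent(raw: string): FeedMessage | null {
  let msg: unknown;
  try {
    msg = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof msg !== "object" || msg === null) return null;
  const m = msg as Partial<FeedMessage>;
  if (m.type !== "event" || typeof m.behavioralClass !== "string") return null;
  return BOT_CLASSES.has(m.behavioralClass) ? (m as FeedMessage) : null;
}

console.log(parseBotEvent('{"type":"event","behavioralClass":"likely_bot"}') !== null);
```

Inside the handler this would be used as `const msg = parseBotEvent(event.data); if (msg) forwardToSecurityTeam(msg);`.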

Via raw events (batch export)

Bot fields appear in the Parquet export as blob7 (device), double18 (bot_score), double19 (is_bot), plus the composite bot.category on scored events in clickstream_scores. See Event schema for the blob layout.
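When reading the export, it can help to map the positional columns to named fields once, at the edge. The sketch below assumes each row has already been read into a plain object by whatever Parquet reader you use, and that is_bot is encoded as 1 for true; both are assumptions, as are the `ExportRow` and `BotFields` types. The column names and their meanings are from the paragraph above.

```typescript
// Row shape after a Parquet reader has decoded it (assumed).
interface ExportRow {
  blob7: string; // device
  double18: number; // bot_score
  double19: number; // is_bot
}

interface BotFields {
  device: string;
  botScore: number;
  isBot: boolean;
}

// Translate positional columns into named fields.
function readBotFields(row: ExportRow): BotFields {
  return {
    device: row.blob7,
    botScore: row.double18,
    isBot: row.double19 === 1, // assumes 1 = bot, anything else = not bot
  };
}

console.log(readBotFields({ blob7: "desktop", double18: 87, double19: 1 }));
```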

Accuracy posture

Never blocked, always labeled

ClickStream does not drop bot traffic at the ingestion layer. Every event reaches Analytics Engine. Bot labels let operators decide how to treat the traffic downstream.

See also