Identity resolution

ClickStream's identity graph stitches together every touchpoint for a single person across sessions and devices, using the signals you already collect plus a few hashed identifiers the SDK captures automatically.

Privacy model up front. Raw PII never hits our servers. The SDK hashes email (SHA-256 and MD5) and phone (E.164, then SHA-256) client-side before transmission. The identity graph stores only HMAC-isolated hashes under per-tenant keys. Where raw values are held in encrypted form (form fills, IP addresses under the new /decrypt gate), they are encrypted with AES-256-GCM under a per-site key and are accessible only via a password-re-authenticated, rate-limited, audit-logged reveal endpoint in the dashboard.
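
To picture what HMAC isolation under per-tenant keys buys you, here is a minimal sketch. The helper name, the key strings, and the choice of HMAC-SHA-256 are assumptions for illustration; this page does not specify the HMAC variant or the key format.

```typescript
import { createHmac } from "node:crypto";

// Hypothetical helper: derive the value stored in the graph from a
// client-side hash and a per-tenant secret. Not the actual server code.
function isolateForTenant(clientHash: string, tenantKey: string): string {
  return createHmac("sha256", tenantKey).update(clientHash, "utf8").digest("hex");
}

// Placeholder standing in for a SHA-256 hex digest sent by the SDK.
const clientHem = "a".repeat(64);

// The same client-side hash produces unrelated stored values per tenant,
// so stored hashes cannot be joined across tenants by simple comparison.
const storedForTenantA = isolateForTenant(clientHem, "tenant-a-secret");
const storedForTenantB = isolateForTenant(clientHem, "tenant-b-secret");
console.log(storedForTenantA === storedForTenantB); // false
```

Within one tenant the derivation is deterministic, so exact-match lookups still work.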

Identity signals we collect

| Signal | Source | Hashing | Primary use |
|---|---|---|---|
| hem | tracker.identify(email) | SHA-256 of lowercased, trimmed email | Internal primary key |
| hemMd5 | same | MD5 of lowercased, trimmed email | Third-party identity-graph interop (LiveRamp / TTD) |
| hashedPhone | tracker.identify({ phone }) | SHA-256 of E.164 normalized phone | Secondary primary key |
| customerId | tracker.identify({ customerId }) | Raw (not PII; your internal CRM id) | CRM / backend join |
| accountId | tracker.identify({ accountId }) | Raw (B2B account identifier) | Multi-seat B2B attribution |
| gclid / fbclid / msclkid / ttclid / twclid / sccid / epik / irclickid / dclid / li_fat_id | URL query params on landing | Raw | Click-to-conversion attribution |
| googleId / facebookId / linkedinId / appleId | Social login callback | Raw (OAuth subject) | Cross-device bridge via social login |
| maid + maidType | Mobile SDK bridge | Raw UUID (IDFA / GAID) | Mobile-web unification |
| clickstreamId / referringClickstreamId | Cross-site journey tracking | Raw | Network-tier cross-site resolution |

The first-party _cs_uid cookie is the anchor. Every identifier above hangs off that cookie until tracker.identify() is called, at which point the graph promotes the anonymous _cs_uid to a known-person record.
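
The click identifiers in the signals table arrive as URL query parameters on the landing page. The sketch below shows one way such parameters could be picked out of a URL. The parameter names come from the table; the function and type names are hypothetical and this is not the SDK's own code.

```typescript
// Parameter names are taken from the signals table above.
const CLICK_ID_PARAMS = [
  "gclid", "fbclid", "msclkid", "ttclid", "twclid",
  "sccid", "epik", "irclickid", "dclid", "li_fat_id",
] as const;

type ClickIds = Partial<Record<(typeof CLICK_ID_PARAMS)[number], string>>;

// Hypothetical helper: collect the known click IDs present on a landing URL.
function extractClickIds(landingUrl: string): ClickIds {
  const params = new URL(landingUrl).searchParams;
  const found: ClickIds = {};
  for (const name of CLICK_ID_PARAMS) {
    const value = params.get(name);
    if (value) found[name] = value; // skip absent and empty parameters
  }
  return found;
}

// Only gclid is kept: utm_source is not a click ID and fbclid is empty.
const clickIds = extractClickIds(
  "https://shop.example.com/?gclid=abc123&utm_source=x&fbclid=",
);
console.log(clickIds);
```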

The API you actually write

Identify

await tracker.identify('user@example.com');

Accepts a raw email. The SDK:

  1. Normalizes (lowercase, trim).
  2. Hashes (SHA-256 for internal use, MD5 for third-party interop).
  3. Sends both hashes as an identify event on your first-party tracking domain.
  4. Persists in _cs_hem / _cs_hem_md5 local-storage slots so subsequent sessions on the same browser re-emit without asking.

Raw email never leaves the browser.
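
Steps 1 and 2 can be sketched as follows. This is an illustration, not the SDK source: the function names are hypothetical, and it uses Node's crypto module to stay runnable, whereas the SDK does this work in the browser.

```typescript
import { createHash } from "node:crypto";

// Step 1: normalize (lowercase, trim).
function normalizeEmail(raw: string): string {
  return raw.trim().toLowerCase();
}

// Step 2: hash the normalized value twice, once per consumer.
function hashEmail(raw: string): { hem: string; hemMd5: string } {
  const normalized = normalizeEmail(raw);
  return {
    hem: createHash("sha256").update(normalized, "utf8").digest("hex"),
    hemMd5: createHash("md5").update(normalized, "utf8").digest("hex"),
  };
}

// Differently formatted inputs for the same address hash identically,
// which is what lets the graph match them across devices.
const fromSignupForm = hashEmail("  User@Example.com ");
const fromLoginForm = hashEmail("user@example.com");
console.log(fromSignupForm.hem === fromLoginForm.hem); // true
```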

Identify with additional signals

await tracker.identify({
  email: 'user@example.com',
  phone: '+14155551234',
  customerId: 'cust_abc123',
  accountId: 'acc_xyz789',
});

Phone is normalized to E.164 (+CCCC...) before hashing. If the input isn't parseable as E.164, the SDK emits a warning and skips the phone hash without blocking the rest of the identify call.
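
A rough sketch of the validate-then-hash behavior described above. It is deliberately simplified: it strips common formatting characters and checks the E.164 shape, but does not infer a missing country code the way a full phone-parsing library would. The function name and warning text are hypothetical.

```typescript
import { createHash } from "node:crypto";

// E.164 shape: "+" followed by at most 15 digits, the first one non-zero.
const E164_PATTERN = /^\+[1-9]\d{1,14}$/;

// Hypothetical helper: returns the SHA-256 hex digest of the normalized
// phone, or null (with a warning) when the input is not valid E.164.
function hashPhone(raw: string): string | null {
  const candidate = raw.replace(/[\s().-]/g, "");
  if (!E164_PATTERN.test(candidate)) {
    console.warn("identify: phone is not valid E.164, skipping phone hash");
    return null;
  }
  return createHash("sha256").update(candidate, "utf8").digest("hex");
}

const formatted = hashPhone("+1 (415) 555-1234");
const canonical = hashPhone("+14155551234");
const missingCountryCode = hashPhone("4155551234"); // warns and returns null
console.log(formatted === canonical, missingCountryCode);
```

Returning null rather than throwing mirrors the documented behavior: the rest of the identify call can proceed without the phone hash.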

Revoke identity

await tracker.revokeIdentity();

Clears every identity slot from local storage and cookies, emits a _consent_transition event, and tells the server to purge the person record. Pair it with the consent banner's "forget me" button.
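
One possible way to wire the call to a "forget me" button, as a sketch. Only revokeIdentity() comes from this page; the interfaces, the wireForgetMe helper, and the onDone callback are hypothetical and exist so the example stands alone.

```typescript
// Minimal structural types so the sketch does not depend on the DOM or the SDK.
interface RevocableTracker {
  revokeIdentity(): Promise<void>;
}
interface ClickTarget {
  addEventListener(type: "click", listener: () => void): void;
}

// Hypothetical wiring: revoke on click, then let the caller update the UI.
function wireForgetMe(
  button: ClickTarget,
  tracker: RevocableTracker,
  onDone: () => void = () => {},
): void {
  button.addEventListener("click", () => {
    tracker
      .revokeIdentity()
      .then(onDone)
      .catch((err: unknown) => console.error("revokeIdentity failed", err));
  });
}
```

In a page you would pass the banner's button element and the tracker instance, and use onDone to hide the banner or show a confirmation.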

What the graph actually stitches

As these signals arrive from different sessions, the graph merges records using a rolling set of deterministic matchers (hem, hashedPhone, customerId, social IDs) and probabilistic matchers (fingerprint, IP household, referring clickstreamId). Every merge is auditable: the graph writes a merge_audit row with the matcher type and confidence for every link.

Merge rules, in priority order:

  1. Exact hem match — same SHA-256 email hash across devices → same person record.
  2. Exact hashedPhone match — same SHA-256 phone hash → same person.
  3. Exact customerId match — same CRM id → same person.
  4. Social login bridge — same googleId / facebookId / linkedinId / appleId → same person.
  5. Fingerprint + IP household — same fingerprint_hash + same /24 IP subnet within a 48-hour window → same person, probabilistic confidence label.
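
The priority order above can be sketched as a first-match-wins function. The record shape, the matcher labels, and the confidence values are assumptions for illustration; the sketch handles IPv4 only and is not the graph's actual implementation.

```typescript
// Field names mirror the signals in this document; the shape is hypothetical.
interface PersonSignals {
  hem?: string;
  hashedPhone?: string;
  customerId?: string;
  googleId?: string;
  facebookId?: string;
  linkedinId?: string;
  appleId?: string;
  fingerprintHash?: string;
  ipv4?: string;
  seenAtMs?: number; // when this record was last observed
}

type Match = { matcher: string; confidence: "deterministic" | "probabilistic" };

const SOCIAL_KEYS = ["googleId", "facebookId", "linkedinId", "appleId"] as const;
const WINDOW_MS = 48 * 60 * 60 * 1000; // 48-hour window from rule 5

const same = (a?: string, b?: string): boolean => a !== undefined && a === b;
const subnet24 = (ip: string): string => ip.split(".").slice(0, 3).join(".");

// Returns the highest-priority rule that links the two records, or null.
function firstMatch(a: PersonSignals, b: PersonSignals): Match | null {
  if (same(a.hem, b.hem)) return { matcher: "hem", confidence: "deterministic" };
  if (same(a.hashedPhone, b.hashedPhone)) return { matcher: "hashedPhone", confidence: "deterministic" };
  if (same(a.customerId, b.customerId)) return { matcher: "customerId", confidence: "deterministic" };
  if (SOCIAL_KEYS.some((key) => same(a[key], b[key]))) return { matcher: "social", confidence: "deterministic" };
  if (
    same(a.fingerprintHash, b.fingerprintHash) &&
    a.ipv4 !== undefined && b.ipv4 !== undefined &&
    subnet24(a.ipv4) === subnet24(b.ipv4) &&
    a.seenAtMs !== undefined && b.seenAtMs !== undefined &&
    Math.abs(a.seenAtMs - b.seenAtMs) <= WINDOW_MS
  ) {
    return { matcher: "fingerprint+ip", confidence: "probabilistic" };
  }
  return null;
}
```

Checking rules in a fixed order means the audit row always records the strongest available evidence for a link.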

Conflicts (same _cs_uid claiming two different hem values) generate a collision audit row and are NOT auto-merged. The dashboard surfaces collisions on the Identity Quality tab; operators resolve them manually.
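
The collision rule can be illustrated with a small sketch: a link is applied only when the _cs_uid has no hem yet or already carries the same one, and is otherwise held for manual review. The audit row fields and the in-memory Map are hypothetical; this page names the collision audit row but does not list its columns.

```typescript
// Hypothetical audit row shape.
interface CollisionAudit {
  kind: "collision";
  csUid: string;
  existingHem: string;
  incomingHem: string;
}

type LinkResult =
  | { action: "linked"; hem: string }
  | { action: "held"; audit: CollisionAudit };

// Links a hem to a _cs_uid, or holds the link when that _cs_uid already
// claims a different hem. Held links are never merged automatically.
function linkHem(
  hemByUid: Map<string, string>,
  csUid: string,
  incomingHem: string,
): LinkResult {
  const existingHem = hemByUid.get(csUid);
  if (existingHem !== undefined && existingHem !== incomingHem) {
    return {
      action: "held",
      audit: { kind: "collision", csUid, existingHem, incomingHem },
    };
  }
  hemByUid.set(csUid, incomingHem);
  return { action: "linked", hem: incomingHem };
}

const hemByUid = new Map<string, string>();
const first = linkHem(hemByUid, "uid-1", "hem-aaa");  // linked
const second = linkHem(hemByUid, "uid-1", "hem-bbb"); // held for review
console.log(first.action, second.action);
```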

Using the resolved identity

Via the Signals API

import { getVisitor } from '@clickstream/signals';

const visitor = await getVisitor();
if (visitor?.identity.hasIdentifiedThisSession) {
  showLoggedInView();
}

The VisitorContext.identity block carries session-level identity fields, such as the hasIdentifiedThisSession flag used above.

Full cross-device resolution (pulling the person record from the identity graph with every linked hem / phone / CRM ID) is a Scale+ feature accessed via the dashboard Visitors view or the graph-query API.

Via the dashboard

Visitors → Your Site shows every identified visitor with their merged signals, linked sessions, and journey map. Click a visitor to see the full timeline, stored signals, and any conflict audit rows.

Third-party identity graphs

Several external identity graphs expect MD5-hashed email, including LiveRamp and TTD.

We send the MD5 hash (hemMd5) alongside the SHA-256 hash so operators with ID-graph partnerships don't need a separate pipeline. If you don't need third-party graph integration, the MD5 hash is still emitted but never leaves your tenant's Analytics Engine.

Retention

See also