Identity resolution
ClickStream's identity graph stitches together every touchpoint for a single person across sessions and devices, using the signals you already collect plus a few hashed identifiers the SDK captures automatically.
Privacy model up front. Raw PII never hits our servers. The SDK hashes email (SHA-256 + MD5) and phone (E.164 → SHA-256) client-side before transmission. The identity graph stores only HMAC-isolated hashes under per-tenant keys. Raw values that are stored encrypted (form fills, IP addresses under the new /decrypt gate) use AES-256-GCM under a per-site key and are accessible only via a password-re-authenticated, rate-limited, audit-logged reveal endpoint in the dashboard.
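One way to picture the per-tenant isolation is to key the client-side hash with an HMAC before storing it. The following is a minimal Node.js sketch, assuming HMAC-SHA-256 and hex output; `isolateHash` and the key strings are hypothetical names for illustration, not the documented server implementation.

```javascript
const { createHmac } = require('node:crypto');

// Hypothetical helper: re-key a client-side hash with a per-tenant secret so
// the same email produces unrelated stored values for different tenants.
function isolateHash(tenantKey, clientHash) {
  return createHmac('sha256', tenantKey).update(clientHash).digest('hex');
}

// Placeholder standing in for a SHA-256 hex digest sent by the SDK.
const hem = 'a'.repeat(64);

const storedForTenantA = isolateHash('tenant-a-secret', hem);
const storedForTenantB = isolateHash('tenant-b-secret', hem);

console.log(storedForTenantA !== storedForTenantB); // true
```

Because the stored value depends on the tenant key, a hash captured from one tenant's data cannot be joined against another tenant's records without that tenant's key.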
Identity signals we collect
| Signal | Source | Hashing | Primary use |
|---|---|---|---|
| hem | tracker.identify(email) | SHA-256 of lowercased, trimmed email | Internal primary key |
| hemMd5 | same | MD5 of lowercased, trimmed email | Third-party identity-graph interop (LiveRamp / TTD) |
| hashedPhone | tracker.identify({ phone }) | SHA-256 of E.164 normalized phone | Secondary primary key |
| customerId | tracker.identify({ customerId }) | Raw (not PII; your internal CRM id) | CRM / backend join |
| accountId | tracker.identify({ accountId }) | Raw (B2B account identifier) | Multi-seat B2B attribution |
| gclid / fbclid / msclkid / ttclid / twclid / sccid / epik / irclickid / dclid / li_fat_id | URL query params on landing | Raw | Click-to-conversion attribution |
| googleId / facebookId / linkedinId / appleId | Social login callback | Raw (OAuth subject) | Cross-device bridge via social login |
| maid + maidType | Mobile SDK bridge | Raw UUID (IDFA / GAID) | Mobile-web unification |
| clickstreamId / referringClickstreamId | Cross-site journey tracking | Raw | Network-tier cross-site resolution |
The first-party _cs_uid cookie is the anchor. Every identifier above hangs off that cookie until tracker.identify() is called, at which point the graph promotes the anonymous _cs_uid to a known-person record.
The API you actually write
Identify
await tracker.identify('user@example.com');
Accepts a raw email. The SDK:
- Normalizes (lowercase, trim).
- Hashes (SHA-256 for internal use, MD5 for third-party interop).
- Sends both hashes as an identify event on your first-party tracking domain.
- Persists in _cs_hem / _cs_hem_md5 local-storage slots so subsequent sessions on the same browser re-emit without asking.
Raw email never leaves the browser.
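The normalize-then-hash steps above can be sketched in a few lines. This version runs under Node.js with the built-in crypto module rather than in the browser, and it assumes lowercase hex output; `normalizeEmail` and `hashEmail` are hypothetical helper names, not SDK exports.

```javascript
const { createHash } = require('node:crypto');

// Normalize as described above: trim whitespace, then lowercase.
function normalizeEmail(raw) {
  return raw.trim().toLowerCase();
}

// Hypothetical helper returning both digests as lowercase hex strings.
function hashEmail(raw) {
  const normalized = normalizeEmail(raw);
  return {
    hem: createHash('sha256').update(normalized, 'utf8').digest('hex'),
    hemMd5: createHash('md5').update(normalized, 'utf8').digest('hex'),
  };
}

const a = hashEmail('  User@Example.com ');
const b = hashEmail('user@example.com');

// Differently formatted inputs collapse to the same identifiers.
console.log(a.hem === b.hem, a.hemMd5 === b.hemMd5); // true true
```

Normalizing before hashing is what makes the hash usable as a join key: without it, a capitalized or padded address would produce an unrelated digest.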
Identify with additional signals
await tracker.identify({
  email: 'user@example.com',
  phone: '+14155551234',
  customerId: 'cust_abc123',
  accountId: 'acc_xyz789',
});
Phone is normalized to E.164 (+CCCC...) before hashing. If the input isn't parseable as E.164, the SDK emits a warning and skips the phone hash without blocking the rest of the identify call.
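The skip-on-failure behavior can be illustrated with a simplified check. This sketch assumes a regex-based E.164 test (a plus sign followed by 2 to 15 digits) and hex output; the SDK's real parser is not documented here, and `hashPhone` is a hypothetical helper.

```javascript
const { createHash } = require('node:crypto');

// Simplified E.164 shape: "+" then 2 to 15 digits, first digit non-zero.
// This regex is an assumption, not the SDK's actual parser.
const E164 = /^\+[1-9]\d{1,14}$/;

function hashPhone(raw) {
  // Drop common formatting characters (spaces, parentheses, dots, dashes).
  const candidate = String(raw).replace(/[\s().-]/g, '');
  if (!E164.test(candidate)) {
    console.warn('identify: phone is not valid E.164, skipping phone hash');
    return null; // the caller carries on with the other identifiers
  }
  return createHash('sha256').update(candidate, 'utf8').digest('hex');
}

console.log(hashPhone('+1 (415) 555-1234') === hashPhone('+14155551234')); // true
console.log(hashPhone('555-1234')); // null
```

Returning null instead of throwing mirrors the documented behavior: a bad phone value costs you the phone hash, not the whole identify call.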
Revoke identity
await tracker.revokeIdentity();
Clears every identity slot from local storage + cookies, emits a _consent_transition event, and tells the server to purge the person record. Paired with the consent-banner "forget me" button.
What the graph actually stitches
Given these signals arriving from different sessions, the graph merges records using a rolling set of deterministic matchers (hem, hashedPhone, customerId, social IDs) and probabilistic matchers (fingerprint, IP household, referring clickstreamId). Every merge is auditable — the graph writes a merge_audit row with the matcher type + confidence on every link.
Merge rules, in priority order:
- Exact hem match: same SHA-256 email hash across devices → same person record.
- Exact hashedPhone match: same SHA-256 phone hash → same person.
- Exact customerId match: same CRM id → same person.
- Social login bridge: same googleId / facebookId / linkedinId / appleId → same person.
- Fingerprint + IP household: same fingerprint_hash + same /24 IP subnet within a 48-hour window → same person, probabilistic confidence label.
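The priority order can be read as a chain of checks where the first rule that fires wins. The sketch below is illustrative only: the record shape (`socialIds`, `fingerprintHash`, `ip`, `seenAt`), the `matchRecords` function, and the `'deterministic'` label are assumptions, and it handles IPv4 addresses only.

```javascript
const WINDOW_MS = 48 * 60 * 60 * 1000; // 48-hour window in milliseconds

// The first three octets of an IPv4 address identify its /24 subnet.
function slash24(ip) {
  return ip.split('.').slice(0, 3).join('.');
}

function sharesSocialId(a, b) {
  return ['googleId', 'facebookId', 'linkedinId', 'appleId'].some(
    (k) => a.socialIds?.[k] && a.socialIds[k] === b.socialIds?.[k],
  );
}

// Returns the first rule (in priority order) that links two records, or null.
function matchRecords(a, b) {
  const same = (k) => Boolean(a[k]) && a[k] === b[k];
  if (same('hem')) return { matcher: 'hem', confidence: 'deterministic' };
  if (same('hashedPhone')) return { matcher: 'hashedPhone', confidence: 'deterministic' };
  if (same('customerId')) return { matcher: 'customerId', confidence: 'deterministic' };
  if (sharesSocialId(a, b)) return { matcher: 'social', confidence: 'deterministic' };
  if (
    same('fingerprintHash') &&
    a.ip && b.ip && slash24(a.ip) === slash24(b.ip) &&
    Math.abs(a.seenAt - b.seenAt) <= WINDOW_MS
  ) {
    return { matcher: 'fingerprint_ip', confidence: 'probabilistic' };
  }
  return null;
}

// Usage: two devices that share an email hash link on the highest-priority rule.
const laptop = { hem: 'h1', fingerprintHash: 'f1', ip: '203.0.113.7', seenAt: 0 };
const phone = { hem: 'h1', fingerprintHash: 'f2', ip: '198.51.100.9', seenAt: 1000 };
console.log(matchRecords(laptop, phone).matcher); // hem
```

Checking deterministic rules first means a probabilistic link is only ever recorded when no stronger evidence exists for the same pair.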
Conflicts (same _cs_uid claiming two different hem values) generate a collision audit row and are NOT auto-merged. The dashboard surfaces collisions on the Identity Quality tab; operators resolve them manually.
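The collision rule amounts to "log, don't merge". Here is a small in-memory sketch of that behavior; the `Map`, the audit array, and `recordHem` are hypothetical stand-ins for the graph's storage, not its real schema.

```javascript
// Hypothetical in-memory stand-ins for the person table and the audit log.
const hemByUid = new Map();
const collisionAudit = [];

function recordHem(csUid, hem) {
  const existing = hemByUid.get(csUid);
  if (existing && existing !== hem) {
    // One cookie claiming two different emails: log it, leave the link untouched.
    collisionAudit.push({ csUid, existing, incoming: hem, resolved: false });
    return 'collision';
  }
  hemByUid.set(csUid, hem);
  return 'linked';
}

console.log(recordHem('uid-1', 'hash-a')); // linked
console.log(recordHem('uid-1', 'hash-b')); // collision
```

The original link survives the second call, so nothing is overwritten until an operator resolves the audit row.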
Using the resolved identity
Via the Signals API
import { getVisitor } from '@clickstream/signals';

const visitor = await getVisitor();
if (visitor?.identity.hasIdentifiedThisSession) {
  showLoggedInView();
}
The VisitorContext.identity block carries:
- status: 'anonymous' or 'signal_identified'.
- clickstreamId: the first-party _cs_uid value.
- isReturning: whether this person has visited before.
- hasIdentifiedThisSession: whether tracker.identify() fired this session.
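To show how the fields listed above might drive UI decisions, here is a small pure function that takes the identity block (for example `visitor?.identity`) and returns a label. `describeVisitor` and its return values are hypothetical, and the sketch assumes a visitor can be 'signal_identified' from an earlier session without having called identify in the current one.

```javascript
// Pure helper so it can be exercised without the SDK loaded.
function describeVisitor(identity) {
  if (!identity) return 'unknown';
  if (identity.status === 'signal_identified') {
    return identity.hasIdentifiedThisSession ? 'identified-now' : 'identified-earlier';
  }
  return identity.isReturning ? 'anonymous-returning' : 'anonymous-new';
}

console.log(
  describeVisitor({
    status: 'anonymous',
    clickstreamId: 'cs_123',
    isReturning: true,
    hasIdentifiedThisSession: false,
  }),
); // anonymous-returning
```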
Full cross-device resolution (pulling the person record from the identity graph with every linked hem / phone / CRM ID) is a Scale+ feature accessed via the dashboard Visitors view or the graph-query API.
Via the dashboard
Visitors → Your Site shows every identified visitor with their merged signals, linked sessions, and journey map. Click a visitor to see the full timeline, stored signals, and any collision audit rows.
Third-party identity graphs
External identity graphs differ in the hashed-email format they ingest:
- LiveRamp RampID — MD5 is the primary ingestion format.
- The Trade Desk UID2 — SHA-256 (raw, client-side). UID2 currently routes through the public operator; a private-operator mode (AWS Nitro Enclave) is on the enterprise roadmap for tenants with strict data-sovereignty requirements — contact sales@clickstream.com.
- ID5 — MD5 or SHA-256.
We send the MD5 hash (hemMd5) alongside the SHA-256 hash so operators with ID-graph partnerships don't need a separate pipeline. If you don't need third-party graph integration, the MD5 hash is still emitted but never leaves your tenant's Analytics Engine.
Retention
- Identified person records (identity graph D1): retained for 90 days after last signal update, then auto-scrubbed (email, phone hashes nulled; session data preserved anonymized).
- Hashed identifiers in Analytics Engine: retained per your tenant's event retention (default: indefinite; configurable per plan).
- Encrypted raw values (form fills, IP addresses): retained per your tenant's retentionDays setting. Auto-purge is opt-in.
- Reveal audit log: every /decrypt call is logged with the operator identity + IP + timestamp. Audit rows are retained 7 years by default (SOC2-style compliance window).
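The 90-day rule for person records can be expressed as a simple age check followed by nulling the hashed identifiers. This is a sketch under assumed field names (`lastSignalAt`, `sessions`, `scrubbed`); `scrubIfExpired` is hypothetical and does not reflect the actual D1 schema.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;
const PERSON_RETENTION_DAYS = 90;

// Returns the record unchanged while it is within retention; otherwise returns
// a copy with the email and phone hashes nulled and session data kept.
function scrubIfExpired(person, now) {
  const expired = now - person.lastSignalAt > PERSON_RETENTION_DAYS * DAY_MS;
  if (!expired) return person;
  return { ...person, hem: null, hemMd5: null, hashedPhone: null, scrubbed: true };
}

const person = { hem: 'h1', hemMd5: 'm1', hashedPhone: 'p1', sessions: ['s1'], lastSignalAt: 0 };
const after91Days = scrubIfExpired(person, 91 * DAY_MS);
console.log(after91Days.hem, after91Days.sessions.length); // null 1
```

Measuring from the last signal update rather than from record creation means an active visitor's record is never scrubbed mid-relationship.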
See also
- Privacy & compliance: consent model + GDPR / CCPA / HIPAA mode
- Security: encryption at rest, per-tenant key isolation, /decrypt gate
- Event schema (identify): full event payload