SanitAI — Scan LLM Conversation History for Leaked API Keys & Credentials

Name: SanitAI
Author: Pixelabs

See it run

One command. Every secret. Nothing sent.

Meet Nix — SanitAI's built-in scanner. Auto-discovers your local Claude Code, Claude Desktop, and Cursor session files, flags every finding, touches no network.

$ sanitai scan
SanitAI v0.1.0 — local scan, no network
Auto-discovered: 4 files · 1,842 turns · ~/.claude/projects/
FINDINGS (4)
─────────────────────────────────────────────────────────────
[HIGH ] ~/.claude/projects/myproject/session.jsonl turn=47 aws_access_key_id bytes=312..332
[HIGH ] ~/.claude/projects/apiwork/chat.jsonl turn=12 generic_api_key bytes=88..110
[MEDIUM] ~/.claude/projects/work/session.jsonl turn=91 email_address bytes=201..220
[LOW ] ~/.claude/projects/infra/debug.jsonl turn=3 internal_hostname bytes=44..60
─────────────────────────────────────────────────────────────
4 findings · 1,842 turns · 0.8s · zero network calls
Run `sanitai redact` to write a clean copy.

AWS · GCP · Azure · Stripe · Twilio · Anthropic · OpenAI · GitHub · Database URLs · PEM private keys · Bearer tokens · Email addresses · Phone numbers · Custom YAML rules

Workflow

Scan in under a minute.

Install

Install via Homebrew, cargo, or the install script. SanitAI runs entirely on your machine — no account, no sign-up, no network calls at runtime.

Scan

`sanitai scan` auto-discovers Claude Code, Claude Desktop, and Cursor session files on your machine, runs every detector, and surfaces findings with file, turn, and severity. Nothing is modified.
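A minimal Python sketch of what a per-file scan pass can look like. It assumes each line of a Claude Code `.jsonl` file is one turn and that byte offsets are counted from the start of that line; SanitAI's real turn numbering, offset base, and detector set are not documented here, so `DETECTORS`, `Finding`, and `scan_jsonl` are hypothetical names. Only the AWS access key ID shape (`AKIA` plus 16 uppercase letters or digits) is a known public format.

```python
import re
from dataclasses import dataclass
from pathlib import Path

# Hypothetical detector table: name -> (severity, pattern compiled over bytes,
# so match offsets are byte offsets rather than character offsets).
DETECTORS = {
    "aws_access_key_id": ("HIGH", re.compile(rb"\bAKIA[0-9A-Z]{16}\b")),
    "email_address": ("MEDIUM", re.compile(rb"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
}

@dataclass
class Finding:
    path: str
    turn: int       # 1-based line number (assumed: one turn per JSONL line)
    detector: str
    severity: str
    start: int      # byte offset of the match within that line
    end: int        # exclusive end offset

def scan_jsonl(path: Path) -> list[Finding]:
    """Run every detector over every line; the file is only ever read."""
    findings = []
    with path.open("rb") as fh:
        for turn, line in enumerate(fh, start=1):
            for name, (severity, pattern) in DETECTORS.items():
                for m in pattern.finditer(line):
                    findings.append(
                        Finding(str(path), turn, name, severity, m.start(), m.end())
                    )
    return findings
```

Each `Finding` carries roughly the same columns as the `turn=` and `bytes=` fields in the demo output, under the offset assumption stated above.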

Redact

`sanitai redact` writes a clean copy alongside the original. Rotate the credentials that surfaced. Done.

The pattern

Files no one thought of as credential stores.

LLM chat exports and session files are ordinary JSON, JSONL, or SQLite files. They happen to contain everything you typed into the AI assistant: API keys, database strings, internal hostnames, authentication tokens, proprietary code. You do not think of them as credential stores when you create them. Attackers do not need you to.

Found 3. Again.


CircleCI

January 2023

Malware on a CI engineer's laptop stole browser session cookies stored as plaintext on disk. The attacker bypassed 2FA entirely and reached CircleCI's production databases, exfiltrating every customer API key, environment variable, and token in the system. Every CircleCI customer was told to assume their secrets were compromised and to rotate them immediately. The laptop was the breach surface.

endpoint compromise

Okta Support

October 2023

Customers uploaded HAR files — browser diagnostic exports — to Okta's support system. HAR files capture every HTTP header and cookie, including live session tokens. An attacker accessed the support archive and immediately used extracted tokens to impersonate users at 1Password, Cloudflare, and BeyondTrust. 134 enterprise customers affected. The files were routine diagnostic exports. Nobody thought of them as credential stores.

export file as attack vector

Samsung / AI Chat

March 2023

Within three weeks of lifting its internal AI tool ban, three Samsung semiconductor engineers submitted proprietary source code, equipment control scripts, and a full internal meeting recording to an AI chat assistant. Samsung could not retrieve the data — it had left the corporate perimeter. The company subsequently banned all generative AI tools company-wide. No external attacker was required.

AI chat as data exfiltration surface

The LastPass Pattern

2022

Attackers compromised a DevOps engineer's personal machine through a vulnerability in Plex Media Server running on his home computer. The machine was the entire attack surface. The payload was credentials stored locally — an encrypted vault whose keys were also on the same endpoint. This is the dominant modern breach chain: not a server exploit, but a local file on an endpoint that no one thought to protect. AI chat exports are the newest member of that file category.

local file, full breach

Okta / LAPSUS$

2022

LAPSUS$ breached Sitel, Okta's support contractor, through a support engineer's workstation. On that local machine they found a spreadsheet of domain administrator credentials. Five days of dwell time on the endpoint. Up to 366 Okta enterprise customers potentially affected. The file that enabled it wasn't on a server or in a database. It was in a Downloads folder on someone's work laptop.

credentials in Downloads

Toyota

2017 – 2022

A Toyota subcontractor committed source code containing a live API key to a public GitHub repository in December 2017. The key granted access to a customer data server. It remained unrotated and active for five years. Toyota cannot confirm whether the 296,019 affected customers' data was accessed — because the access logging wasn't there to tell them. Five years of silent exposure, zero alerts.

5 years, zero alerts

39 million secrets leaked on GitHub in 2024 alone (GitHub Security Blog, 2025). AI chat exports are the new unguarded surface.

Compatibility

Supported Sources

Source	Format	Status
Claude Code	JSONL session files (`~/.claude/projects/`)	Supported
Claude Desktop	JSON conversation files (auto-discovered)	Supported
Cursor	SQLite workspace storage (`.vscdb`)	Supported
GitHub Copilot	Chat history	Planned
Google Gemini	Conversation history	Planned
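Cursor is a VS Code fork, and VS Code-style `.vscdb` files are SQLite databases that commonly keep state in a key/value table named `ItemTable`. The sketch below assumes that layout, opens the file read-only through a SQLite URI, and scans every stored value for one example pattern. Which Cursor keys actually hold chat history is deliberately not assumed, and `scan_vscdb` is a hypothetical helper, not SanitAI code.

```python
import re
import sqlite3
from pathlib import Path

# Example pattern only: AWS access key IDs ("AKIA" + 16 uppercase letters/digits).
AWS_KEY = re.compile(r"\bAKIA[0-9A-Z]{16}\b")

def scan_vscdb(db_path: Path) -> list[tuple[str, str]]:
    """Return (ItemTable key, matched text) for every suspected AWS key ID.

    Assumes a VS Code-style state database with an ItemTable(key, value) table.
    """
    # mode=ro opens the database read-only, so the scan cannot modify it.
    uri = db_path.resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        hits = []
        for key, value in conn.execute("SELECT key, value FROM ItemTable"):
            if value is None:
                continue
            if isinstance(value, bytes):
                # Values may be stored as BLOBs; decode leniently for scanning.
                value = value.decode("utf-8", errors="replace")
            elif not isinstance(value, str):
                value = str(value)
            for match in AWS_KEY.finditer(value):
                hits.append((key, match.group()))
        return hits
    finally:
        conn.close()
```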

Verification

Trust Signals

✓ Zero network calls at runtime — verify with strace or fs_usage
✓ Signed binaries via cosign/Sigstore
✓ MIT licensed — read every line on GitHub
✓ No telemetry — not opt-out, architecturally absent
✓ Original file never modified by default
✓ seccomp-bpf sandbox blocks network syscalls on Linux

Get started

Install SanitAI

# Homebrew
brew install thepixelabs/tap/sanitai

# curl
curl -fsSL https://releases.sanitai.dev/install.sh | sh

# cargo
cargo install sanitai

Verify binary signature with cosign

cosign verify-blob \
  --certificate-identity https://github.com/thepixelabs/sanitai/.github/workflows/release.yml@refs/heads/main \
  --certificate-oidc-issuer https://token.actions.githubusercontent.com \
  --bundle sanitai.bundle \
  sanitai


Scan before you share.
Sleep after you do.

What it does

Built for the command line.

Local, offline, and deliberately narrow in scope — SanitAI does one job and does it without asking for an account.

Secret Detection

Scans Claude Code, Claude Desktop, and Cursor conversation histories for 8+ categories of leaked credentials (API keys, tokens, passwords, and PII) using pattern matching and entropy analysis. Nothing leaves your machine.
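Entropy analysis usually means Shannon entropy: how many bits of randomness each character carries, estimated from the string's own character frequencies. Random API keys score high; words and repeated characters score low. The sketch below is a common form of the technique, not SanitAI's detector; the candidate pattern and the 4.0 bits-per-character threshold are illustrative assumptions.

```python
import math
import re
from collections import Counter

def shannon_entropy(s: str) -> float:
    """H = -sum(p_i * log2(p_i)) over the distinct characters of s, in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())

# Illustrative values: runs of 20+ token-like characters are candidates,
# and a candidate is flagged when it exceeds 4.0 bits per character.
CANDIDATE = re.compile(r"[A-Za-z0-9+/=_-]{20,}")
THRESHOLD = 4.0

def high_entropy_tokens(text: str, threshold: float = THRESHOLD) -> list[str]:
    """Return candidate substrings whose entropy exceeds the threshold."""
    return [
        m.group()
        for m in CANDIDATE.finditer(text)
        if shannon_entropy(m.group()) > threshold
    ]
```

A 32-character string with no repeated characters scores exactly 5 bits per character and is flagged, while `"x" * 40` scores 0 and is ignored.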

Redact on Export

Produces a clean copy of your chat export with findings masked or removed. The original file is never modified.
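One way to produce a masked copy is to splice a placeholder over each finding's byte span and write the result to a new file next to the source, leaving the original untouched. This sketch assumes spans are `(start, end, label)` byte offsets into the whole file; `redact_bytes`, `write_clean_copy`, the `[REDACTED:label]` marker, and the `.redacted` file naming are hypothetical choices, not SanitAI's output format.

```python
from pathlib import Path

def redact_bytes(data: bytes, spans: list[tuple[int, int, str]]) -> bytes:
    """Replace each (start, end, label) byte span with a [REDACTED:label] marker."""
    out = bytearray()
    cursor = 0
    for start, end, label in sorted(spans):
        start = max(start, cursor)  # clamp overlaps so no secret byte is emitted
        if start >= end:
            continue                # span already covered by an earlier marker
        out += data[cursor:start]
        out += f"[REDACTED:{label}]".encode()
        cursor = end
    out += data[cursor:]
    return bytes(out)

def write_clean_copy(src: Path, spans: list[tuple[int, int, str]]) -> Path:
    """Write the redacted bytes to e.g. session.redacted.jsonl; src is only read."""
    dst = src.with_name(f"{src.stem}.redacted{src.suffix}")
    dst.write_bytes(redact_bytes(src.read_bytes(), spans))
    return dst
```

Because the marker contains no quotes or backslashes, replacing a span that lies inside a JSON string keeps the line valid JSON, provided the span does not cut through an escape sequence.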

Custom Rule Engine

Write your own detection rules in YAML — define patterns, keywords, entropy thresholds, and severity levels. Ship a .sanitai/rules.yaml with your project to enforce team-wide hygiene.
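The rule schema lives in SanitAI's own docs. As a stand-in, the Python dict below plays the role of one rule after `.sanitai/rules.yaml` has been parsed, with hypothetical field names for the four knobs listed above (pattern, keywords, entropy threshold, severity). It shows one reasonable evaluation order: a cheap keyword pre-filter, then the regex, then an entropy check that drops low-randomness matches.

```python
import math
import re
from collections import Counter

def entropy(s: str) -> float:
    """Shannon entropy in bits per character."""
    n = len(s)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())

# Stand-in for one parsed rule from .sanitai/rules.yaml.
# Field names are hypothetical, not SanitAI's documented schema.
RULE = {
    "id": "acme_internal_token",
    "pattern": r"acme_[A-Za-z0-9]{24}",
    "keywords": ["acme"],   # pre-filter: skip text that mentions none of these
    "min_entropy": 3.5,     # drop low-randomness matches such as acme_aaaa...
    "severity": "HIGH",
}

def apply_rule(rule: dict, text: str) -> list[tuple[str, str, str]]:
    """Return (rule id, severity, matched text) for each match passing every check."""
    lowered = text.lower()
    if not any(k.lower() in lowered for k in rule["keywords"]):
        return []
    hits = []
    for m in re.finditer(rule["pattern"], text):
        if entropy(m.group()) >= rule["min_entropy"]:
            hits.append((rule["id"], rule["severity"], m.group()))
    return hits
```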

Pipe Mode

Reads from stdin, writes to stdout. Drop it into any CI/CD pipeline with no config required: `cat export.json | sanitai scan` is a complete workflow.
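A Unix filter only needs two streams. This toy stand-in for pipe mode reads lines from any text stream and writes one line per finding to another; the single detector (classic GitHub personal access tokens, `ghp_` plus 36 alphanumerics) and the output format are illustrative, not SanitAI's.

```python
import re
from typing import TextIO

# Single illustrative detector: classic GitHub personal access tokens.
GITHUB_PAT = re.compile(r"\bghp_[A-Za-z0-9]{36}\b")

def scan_stream(src: TextIO, dst: TextIO) -> int:
    """Read lines from src, write one report line per finding to dst, return the count."""
    count = 0
    for lineno, line in enumerate(src, start=1):
        for m in GITHUB_PAT.finditer(line):
            # Offsets are character positions within the line (0-based, end exclusive).
            dst.write(f"line={lineno} github_token chars={m.start()}..{m.end()}\n")
            count += 1
    return count
```

Passed `sys.stdin` and `sys.stdout`, `scan_stream` behaves like a miniature `cat export.json | sanitai scan`, and its return value is the finding count a caller could turn into an exit code.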

Multi-format Support

Claude Code, Claude Desktop, and Cursor histories supported today. GitHub Copilot and Google Gemini on the roadmap. One command works the same regardless of which AI tool generated the file.

More coming

Altergo Integration

Works natively with Altergo — the multi-account Claude Code manager. Scan across all your workspaces in one pass. No extra configuration.

Coming soon

For the audit file

Compliance Notes

GDPR Art. 32

Local processing — no Art. 28 agreement needed

SanitAI processes data exclusively on the local filesystem. No personal data crosses a network boundary. No data processing agreement under GDPR Art. 28 is required because no processor or sub-processor ever receives the data.

SOC 2 CC6.1 / CC6.6

Auditable scan output as detective control evidence

Scan reports can be retained as evidence of detective controls for logical access restrictions. JSON output (`--format json`) can be ingested by SIEM and GRC platforms for automated evidence collection.

Questions

The questions you'd ask Nix.

Does SanitAI send my conversations to any server?

No. SanitAI runs entirely on your local machine. Your conversation exports never leave your filesystem. There is no telemetry, no analytics, no network calls of any kind at runtime. Verify with strace on Linux or fs_usage on macOS.

What types of secrets can SanitAI detect?

SanitAI detects API keys for AWS, GCP, Azure, Stripe, Twilio, OpenAI, Anthropic, and GitHub; database connection strings (PostgreSQL, MySQL, MongoDB); PEM private keys and SSH keys; bearer tokens; and PII including email addresses and phone numbers. Custom YAML rules let you extend detection for your own patterns.
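Several of these credential types have publicly known shapes that plain regular expressions can catch. The table below is a minimal sketch using commonly cited formats (AWS access key IDs, classic GitHub tokens, Stripe live secret keys, PEM private-key headers, connection strings with an embedded password, and the RFC 6750 bearer-token syntax). It is not SanitAI's detector set; real detectors add context and entropy checks to cut false positives.

```python
import re

# Illustrative patterns only; SanitAI's actual detectors may differ.
PATTERNS = {
    # "AKIA" + 16 uppercase letters/digits (long-term AWS access key ID).
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Classic GitHub personal access token: "ghp_" + 36 alphanumerics.
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    # Stripe live secret key; length varies, so require at least 24 characters.
    "stripe_secret_key": re.compile(r"\bsk_live_[A-Za-z0-9]{24,}"),
    # PEM header for RSA, EC, OPENSSH, ENCRYPTED, or bare private keys.
    "pem_private_key": re.compile(r"-----BEGIN (?:[A-Z]+ )*PRIVATE KEY-----"),
    # Connection string with user:password@host for common databases.
    "database_url": re.compile(
        r"\b(?:postgres(?:ql)?|mysql|mongodb(?:\+srv)?)://[^\s:/@]+:[^\s@]+@[^\s/]+"
    ),
    # RFC 6750 bearer token: b64token characters, optional trailing "=" padding.
    "bearer_token": re.compile(r"\bBearer\s+[A-Za-z0-9\-._~+/]+=*"),
}

def detect(text: str) -> list[tuple[str, str]]:
    """Return (detector name, matched text) for every pattern hit."""
    hits = []
    for name, pattern in PATTERNS.items():
        hits.extend((name, m.group()) for m in pattern.finditer(text))
    return hits
```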

Which LLM conversation sources are supported?

SanitAI auto-discovers Claude Code session files (~/.claude/projects/**/*.jsonl), Claude Desktop conversation JSON files, and Cursor workspace SQLite databases. You can also pass any file or directory explicitly (sanitai scan <path>) or pipe content in via stdin. No export or download step is required. Support for GitHub Copilot and Google Gemini is planned.
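Discovering Claude Code sessions amounts to a recursive glob. This sketch covers only the `~/.claude/projects/**/*.jsonl` location given above; Claude Desktop and Cursor storage paths vary by OS and are not assumed. `discover_claude_code` is a hypothetical name, and the optional `home` parameter exists only so the function can be pointed at a test directory.

```python
from pathlib import Path
from typing import Optional

def discover_claude_code(home: Optional[Path] = None) -> list[Path]:
    """Return every *.jsonl file under <home>/.claude/projects/, sorted by path."""
    root = (home or Path.home()) / ".claude" / "projects"
    if not root.is_dir():
        return []  # Claude Code not installed or never run: nothing to scan
    # rglob("*.jsonl") is the pathlib equivalent of the **/*.jsonl glob.
    return sorted(p for p in root.rglob("*.jsonl") if p.is_file())
```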

Can I run SanitAI in a CI/CD pipeline?

Yes. SanitAI exits with code 1 when findings are present, making it compatible with any CI/CD system. Use --format json for machine-readable output and --exit-zero for non-blocking canary scans.
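A CI step only has to branch on the exit status. The wrapper below relies on the documented contract that exit code 1 means findings; treating 0 as clean and any other code as a tool error is an assumption, and `ci_verdict` is a hypothetical helper. With `--exit-zero` the status presumably stays 0 even when findings exist, so a canary job would read the `--format json` output instead.

```python
import subprocess

def ci_verdict(cmd: list[str]) -> str:
    """Run a scan command and translate its exit status into a CI verdict."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode == 0:
        return "clean"          # assumed: no findings
    if proc.returncode == 1:
        return "findings"       # documented: findings are present
    return f"error ({proc.returncode})"  # assumed: anything else is a tool failure
```

In a pipeline you would call `ci_verdict(["sanitai", "scan", "--format", "json"])` and fail the job on anything other than `"clean"`.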

How do I suppress false positives?

Disable specific detectors in ~/.config/sanitai/config.toml using disable_detectors = ['email_address']. Custom rules support context_keywords to narrow pattern matching and reduce false positives for your environment.

Your AI chat history is a credential store.