12,000 Secrets Unleashed: AI Models Training on Hidden API Keys!
Nearly 12,000 valid API keys and passwords have been found in the Common Crawl dataset, including AWS and MailChimp keys. LLMs trained on this data may learn from insecure code, despite efforts to filter out sensitive information. Truffle Security warns developers against hardcoding secrets, citing the risks of data leaks and phishing.

Hot Take:
Whoever said secrets are meant to be kept clearly didn’t inform the Common Crawl dataset. It’s like an open treasure chest of garbled passwords and keys, just waiting for pirates of the digital seas! Forget about hacking into mainframes; the real action is in the HTML forms and JavaScript snippets. Arr, matey, hardcoded treasures await!
Key Points:
- Close to 12,000 valid secrets found in the Common Crawl dataset.
- Truffle Security identified 219 distinct secret types, with MailChimp API keys being the most common.
- Secrets were hardcoded into front-end HTML and JavaScript rather than being loaded from server-side environment variables.
- 63% of secrets appeared across multiple pages, with one WalkScore API key found 57,029 times.
- Truffle Security contacted vendors to revoke compromised keys to prevent misuse.
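
To make the hardcoding point concrete, here is a minimal Python sketch of both sides of the problem: spotting strings in page source that look like keys, and reading a key from a server-side environment variable instead. It is an illustration only, not Truffle Security's method. The two regexes are simplified stand-ins for the widely documented AWS access key ID and MailChimp API key formats, the function names and the environment variable name are hypothetical, and the sample key is AWS's published placeholder. A pattern match is only a candidate; unlike the research described above, nothing here checks whether a key is actually valid.

```python
import os
import re

# Simplified, illustrative patterns (assumptions); real scanners ship far more
# detectors and then verify each hit against the vendor's API.
CANDIDATE_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "mailchimp_api_key": re.compile(r"\b[0-9a-f]{32}-us[0-9]{1,2}\b"),
}


def find_candidate_secrets(text):
    """Return sorted, de-duplicated (kind, matched_string) pairs found in text."""
    hits = set()
    for kind, pattern in CANDIDATE_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.add((kind, match.group(0)))
    return sorted(hits)


def load_api_key(env_name):
    """Read a secret from a server-side environment variable, failing loudly if unset."""
    value = os.environ.get(env_name)
    if not value:
        raise RuntimeError(f"environment variable {env_name} is not set")
    return value


# Made-up page source; the key is AWS's documented placeholder, not a live secret.
SAMPLE_PAGE = '<script>var awsKey = "AKIAIOSFODNN7EXAMPLE";</script>'

if __name__ == "__main__":
    for kind, value in find_candidate_secrets(SAMPLE_PAGE):
        print(f"possible {kind}: {value}")
```

Running a check like this over HTML and JavaScript before publishing would flag the sample page, while a key fetched with `load_api_key("MAILCHIMP_API_KEY")` on the server never appears in the markup a crawler sees.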