Rare Not Random: Using Token Efficiency for Secrets Scanning
In Regex is (almost) All You Need, we learned that using a combination of regular expression patterns, entropy, and rule-based filters are an effective way to detect candidate secrets. Regex is used for casting a wide net to identify candidates. Entropy is used as a primary filter on the captured candidates and additional filters like presence of commonly used english words, or filtering on known “safe” files like go.sum are applied last.