Security researcher Sharon Brizinov earned $64,000 in bug bounties after finding hundreds of secrets leaking in dozens of public GitHub repositories.
What sets Brizinov’s findings apart is that the leaked secrets were in files that had already been deleted from the scanned repositories, which highlights the risk of not responding properly to such leaks.
The issue his research brings to the spotlight is that developers may not be aware that Git retains copies of all files within a repository, even if they are no longer available in the working directory.
A distributed revision control system, Git tracks content using a commit, tree, and blob structure. Each commit captures a snapshot of the repository’s state, while branches and tags serve as references to specific commits and are used for navigating the commit history.
Because it stores each version of a file as a distinct object and keeps a complete history of the changes made to a repository, Git makes it easy to revert to previous states and to restore files even after their deletion from the working directory has been committed.
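To illustrate why a deleted file lingers, the following minimal Python sketch computes the identifier Git assigns to a file's contents and the location where that object would sit as a loose object. It assumes a repository using Git's default SHA-1 object format and an object that has not yet been packed; the function names are illustrative and not part of any Git API.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git computes for a blob with this content.

    Git hashes a small header ("blob <size>" followed by a NUL byte) plus the
    raw file bytes. Assumes the default SHA-1 object format.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Return where a loose (not yet packed) object with this ID is stored."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


# The ID depends only on the content, not on the file name or on whether the
# file still exists in the working directory.
empty_id = git_blob_id(b"")
print(empty_id)                     # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(loose_object_path(empty_id))  # .git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the object is addressed by its content, committing a deletion only adds a new snapshot without the file. The blob created by the earlier commit stays in the object store.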
“Once a commit is created, its data is stored in .git/objects and remains there even if it’s no longer referenced by any branch or tag. Unreferenced (dangling) objects aren’t removed immediately — they’re typically retained for around two weeks before being eligible for garbage collection,” Brizinov explains.
Removing files from Git history may prove difficult, as the system maintains references to them in heads and tags, and older commits still contain files removed in newer ones.
“To completely remove a file from history, one must rewrite history using tools like git filter-branch, git-filter-repo or by manually rebasing and running garbage collector (with prune) to clear unreachable objects,” Brizinov says.
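The cleanup Brizinov describes could be scripted roughly as follows. This is a sketch under stated assumptions, not a vetted procedure: it assumes the separately installed git-filter-repo tool is available and that it is run in a fresh clone (the tool refuses to run elsewhere unless forced). The helper names `purge_commands` and `purge_file` and the example path are hypothetical.

```python
import subprocess


def purge_commands(path: str) -> list[list[str]]:
    """Build the Git commands that drop `path` from every commit, then prune.

    Assumes git-filter-repo is installed. The reflog and gc steps matter
    mostly when history was rewritten by hand (for example, via rebase).
    """
    return [
        # Rewrite all commits, keeping everything except the given path.
        ["git", "filter-repo", "--invert-paths", "--path", path],
        # Drop reflog entries that would keep old commits reachable.
        ["git", "reflog", "expire", "--expire=now", "--all"],
        # Delete objects that are no longer reachable right away.
        ["git", "gc", "--prune=now"],
    ]


def purge_file(repo_dir: str, path: str) -> None:
    """Run the purge commands in `repo_dir`, stopping at the first failure."""
    for cmd in purge_commands(path):
        subprocess.run(cmd, cwd=repo_dir, check=True)
```

A call such as `purge_file("/path/to/fresh-clone", "config/secrets.env")` rewrites only the local copy. Forks, existing clones, and the hosting service may still hold the old objects, so the exposed secret should be treated as compromised regardless.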
When it comes to public repositories, he notes, completely removing committed files is virtually impossible, as they may have been copied or cloned elsewhere.
To shed light on these risks, Brizinov built an automated tool to clone public repositories, traverse all commits to find deleted files, restore them, and scan them for secrets such as API keys, tokens, and credentials.
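The general approach can be approximated with a short Python sketch. This is not Brizinov's tool: the function names, the commit marker, and the two regular expressions are assumptions chosen for illustration, and a real scanner would use a far larger rule set. The sketch also assumes the `git` command is on the PATH and that file paths contain no characters Git would quote in its output.

```python
import re
import subprocess

COMMIT_MARK = "\x01"  # control byte placed before each commit hash in the log

# Illustrative patterns only; real scanners use many more rules.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(rb"AKIA[0-9A-Z]{16}"),
    "github_pat": re.compile(rb"ghp_[A-Za-z0-9]{36}"),
}


def parse_deletions(log_output: str) -> list[tuple[str, str]]:
    """Turn marked `git log --name-only` output into (commit, path) pairs."""
    deletions, commit = [], None
    for line in log_output.split("\n"):
        if line.startswith(COMMIT_MARK):
            commit = line[1:].strip()
        elif line.strip() and commit:
            deletions.append((commit, line))
    return deletions


def list_deleted_files(repo_dir: str) -> list[tuple[str, str]]:
    """List every file deletion recorded in any branch or tag."""
    result = subprocess.run(
        ["git", "log", "--all", "--diff-filter=D", "--name-only",
         "--pretty=format:%x01%H"],
        cwd=repo_dir, check=True, capture_output=True,
        encoding="utf-8", errors="replace",
    )
    return parse_deletions(result.stdout)


def restore_deleted_file(repo_dir: str, commit: str, path: str) -> bytes:
    """Read the file as it existed in the parent of the deleting commit."""
    result = subprocess.run(
        ["git", "show", f"{commit}^:{path}"],
        cwd=repo_dir, check=True, capture_output=True,
    )
    return result.stdout


def find_secrets(data: bytes) -> list[tuple[str, str]]:
    """Return (rule name, matched text) for each pattern hit in raw bytes."""
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(data):
            hits.append((name, match.group().decode("ascii")))
    return hits


def scan_repo(repo_dir: str):
    """Yield (commit, path, rule, value) for secrets found in deleted files."""
    for commit, path in list_deleted_files(repo_dir):
        try:
            data = restore_deleted_file(repo_dir, commit, path)
        except subprocess.CalledProcessError:
            continue  # e.g. a path Git quoted or could not resolve
        for name, value in find_secrets(data):
            yield commit, path, name, value
```

Scanning raw bytes rather than decoded text means the same patterns also apply to binary files, where the researcher reports finding most of the secrets. Iterating over `scan_repo("/path/to/clone")` on a local clone would report each hit along with the commit that deleted the file.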
He focused on companies with active bug bounty programs and those with GitHub repositories that have over 5,000 stars, and discovered hundreds of active secrets, mainly in binary files that had been deleted after being committed to the repository.
In addition to platform-specific developer tokens and sessions, and email SMTP credentials, Brizinov discovered tokens for GCP projects, AWS, Slack, GitHub, OpenAPI, HuggingFace, and Algolia.
“Why did the secrets get leaked in the first place? After analyzing dozens of real-impact cases, I can summarize this question into three explanations— lack of knowledge of how Git works, not fully realizing what was committed due to binary files or hidden files, and blindly trusting Git rewrite-history tools,” the researcher notes.
He also points out that, if a file containing a secret has been committed, developers should not simply delete it but should also rotate any potentially impacted secrets to eliminate the risk of compromise.