a8n: Campaign type to remove NPM credentials
Created by: mrnugget
This is the first campaign described in RFC 73: Campaigns to find and remove credentials
The JSON arguments for this new campaign type would look like this:
`{"matchers": [{"type": "npm"}], "scopeQuery": "repo:foo"}`
Default value: `{"matchers": [{"type": "npm"}]}`
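To make the argument handling concrete, here is a minimal sketch of how the default value could be merged with user-supplied arguments. The function name `parse_campaign_args` and the validation rules are assumptions for illustration, not existing code; only the JSON shapes come from this issue.

```python
import json
from typing import Optional

# Default value as stated in this issue.
DEFAULT_ARGS = {"matchers": [{"type": "npm"}]}


def parse_campaign_args(raw: Optional[str]) -> dict:
    """Hypothetical helper: merge user-supplied JSON arguments over the default."""
    args = json.loads(raw) if raw else {}
    if not isinstance(args, dict):
        raise ValueError("campaign arguments must be a JSON object")

    # Keys given by the user win; missing keys fall back to the default.
    merged = {**DEFAULT_ARGS, **args}

    matchers = merged["matchers"]
    if not isinstance(matchers, list):
        raise ValueError('"matchers" must be a list')
    for matcher in matchers:
        # Assumption: "npm" is the only matcher type for now.
        if not isinstance(matcher, dict) or matcher.get("type") != "npm":
            raise ValueError(f"unsupported matcher: {matcher!r}")
    return merged
```

With this sketch, omitting `matchers` and passing only `{"scopeQuery": "repo:foo"}` would still yield the npm matcher.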
A campaign of this type should then find tokens that match this regex:
It then creates a diff where the token is removed (e.g., `foo:_auth=mytoken` on a line becomes `foo:_auth=`).
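A rough sketch of the token removal and diff generation, to show the intended transformation. Note that `TOKEN_RE` is a placeholder pattern I made up for illustration and is not the regex this issue refers to; the helper names are hypothetical as well.

```python
import difflib
import re

# ASSUMPTION: placeholder pattern, NOT the regex from this issue.
# Captures "_auth=" or "_authToken=" and matches the non-empty value after it.
TOKEN_RE = re.compile(r"(_auth(?:Token)?[ \t]*=[ \t]*)\S+")


def strip_tokens(content: str) -> str:
    """Keep the key and the '=', drop the token value."""
    return TOKEN_RE.sub(r"\1", content)


def token_removal_diff(path: str, content: str) -> str:
    """Build a unified diff in memory; returns '' if nothing changed."""
    cleaned = strip_tokens(content)
    diff = difflib.unified_diff(
        content.splitlines(keepends=True),
        cleaned.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)


if __name__ == "__main__":
    print(token_removal_diff(".npmrc", "registry=x\nfoo:_auth=mytoken\n"))
```

The diff is computed entirely in memory from the file contents, so no checkout of the repository would be needed for this step.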
Thoughts on implementation (copying my comments from the Google Doc here):
We could "just" search for the regex, find the matching files and create the diffs in memory.
Update on this: it's probably a tad harder than that, because our current architecture doesn't allow passing line-specific search results on to the jobs. Instead, CampaignJobs run on whole repositories. So, to allow what I proposed above (search for the regex, group results by repo, and construct a diff in memory for each result in a repo), we'd need to change the architecture.
But what we could do, without changing architecture, is to just search twice: once to get a list of repos and then each job searches per repo again to get the specific location of the token.
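The "search twice" idea could look roughly like the sketch below. Everything here is hypothetical: `Match`, `SearchFn`, and the function names are stand-ins for illustration, and the search backend is passed in as a plain callable rather than modelling any real search API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Match:
    """Hypothetical line-level search result."""
    repo: str
    path: str
    line: int  # 1-based line number


# Hypothetical search function: query string in, line matches out.
SearchFn = Callable[[str], Iterable[Match]]


def list_repos(search: SearchFn, scope_query: str, token_query: str) -> List[str]:
    """First search: keep only repository names, dropping line information."""
    query = f"{scope_query} {token_query}".strip()
    return sorted({m.repo for m in search(query)})


def run_job(search: SearchFn, repo: str, token_query: str) -> List[Match]:
    """Second search, inside one job: same query restricted to a single repo."""
    results = search(f"repo:{repo} {token_query}")
    # Guard in case the repo filter matches more than the exact name.
    return [m for m in results if m.repo == repo]


def plan_campaign(
    search: SearchFn, scope_query: str, token_query: str
) -> Dict[str, List[Match]]:
    """One job per repository; each job recovers its own token locations."""
    repos = list_repos(search, scope_query, token_query)
    return {repo: run_job(search, repo, token_query) for repo in repos}
```

The trade-off is one extra search per repository in exchange for not having to persist line-level results between campaign creation and job execution.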
Why doesn't "our current architecture [...] allow passing on line-specific search results"?
Look at this snippet of code:
That is how we go from "scopeQuery + campaign-specific search query" to a list of repositories. For each repository, we create a campaign job, with repo_id, rev, base ref.
Later, when the jobs run, there's currently no way to get back to the line-specific search results, because here we convert search results into "repositories":
In order to pass specific results to campaign jobs, we need to persist those somewhere and I think that's non-trivial.
Again, for now we could just search twice: search for a list of repositories once and then for each repository, each job searches again for the specific tokens (with the `repo:<name of repo>` filter in the query).