Index lockfiles and possibly persist as tree (!37543) · Warren Gifford / sourcegraph

Warren Gifford requested to merge mrn/persist-lockfile-trees into main Jun 22, 2022

Created by: mrnugget

This is an extraction from #36481 and includes the backend/API changes necessary to switch to a model for dependency search where we actively index lockfiles in the background using code intel's policy mechanism.

It's the first of multiple PRs to implement what I demoed here.

(Edit: here are the draft PRs that add the remaining functionality: https://github.com/sourcegraph/sourcegraph/pull/37544)

In short, what this PR does:

IMPORTANT: Change dependencies.Service to NOT parse lockfiles on demand anymore.
Show a search alert if a repository/commit hasn't been lockfile-indexed.
Change dependencies.Service to only query the previously persisted dependencies.
Migrate the database to change codeintel_lockfiles and codeintel_lockfile_references so we can persist dependency trees (!), one per lockfile per repo/commit.
Change dependencies/internal/store to work with the new database schema, adding the ability to query dependencies per lockfile, only transitive dependencies, etc.
Add lockfile_indexing_enabled to lsif_configuration_policies. @efritz: this is different from the tags-based approach we talked about, because I found it much easier to add a boolean now than to make sure I get all of the stored procedures right when migrating the existing data. I think we can still easily change this and migrate it later.
Change the lockfile indexer to check for lockfile_indexing_enabled in the policies (instead of the indexing flag).
Change the code intel frontend to allow users to create lockfile-indexing policies.
Hide everything behind the codeIntelLockfileIndexingEnabled setting flag.
Introduce a DependencyGraph type to the lockfiles and shared packages. This will be returned by different parsers, for example the yarn.lock parser I've built (but that's not included in this PR, see below).
Change the service/store layers to persist the graph.
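The description introduces a DependencyGraph type but doesn't show its shape. The sketch below assumes a graph made of a set of direct dependencies plus package-to-package edges, which is enough to answer the "transitive only" query mentioned above. `Package`, `AddDirect`, `AddEdge`, `Direct` and `TransitiveOnly` are hypothetical names for illustration, not the PR's actual API.

```go
package main

import "sort"

// Package identifies one resolved dependency from a lockfile.
// Hypothetical shape for illustration; not the PR's actual type.
type Package struct {
	Name    string
	Version string
}

// DependencyGraph is a hypothetical sketch: the project's direct
// dependencies plus edges from each package to the packages it depends on.
type DependencyGraph struct {
	direct map[Package]bool
	edges  map[Package][]Package
}

func NewDependencyGraph() *DependencyGraph {
	return &DependencyGraph{
		direct: map[Package]bool{},
		edges:  map[Package][]Package{},
	}
}

// AddDirect records a package the project depends on directly.
func (g *DependencyGraph) AddDirect(p Package) { g.direct[p] = true }

// AddEdge records that `from` depends on `to`.
func (g *DependencyGraph) AddEdge(from, to Package) {
	g.edges[from] = append(g.edges[from], to)
}

// Direct returns the direct dependencies, sorted for stable output.
func (g *DependencyGraph) Direct() []Package {
	out := make([]Package, 0, len(g.direct))
	for p := range g.direct {
		out = append(out, p)
	}
	sortPackages(out)
	return out
}

// TransitiveOnly walks the graph from the direct dependencies (iterative DFS,
// cycle-safe via `seen`) and returns reachable packages that are not direct.
func (g *DependencyGraph) TransitiveOnly() []Package {
	seen := map[Package]bool{}
	stack := g.Direct()
	for len(stack) > 0 {
		p := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		for _, dep := range g.edges[p] {
			if !seen[dep] {
				seen[dep] = true
				stack = append(stack, dep)
			}
		}
	}
	out := []Package{}
	for p := range seen {
		if !g.direct[p] {
			out = append(out, p)
		}
	}
	sortPackages(out)
	return out
}

// sortPackages orders packages by name, then version.
func sortPackages(ps []Package) {
	sort.Slice(ps, func(i, j int) bool {
		if ps[i].Name != ps[j].Name {
			return ps[i].Name < ps[j].Name
		}
		return ps[i].Version < ps[j].Version
	})
}
```

For example, with `react` as the only direct dependency, `react → loose-envify → js-tokens` yields `loose-envify` and `js-tokens` as transitive-only results. Storing one such graph per lockfile per repo/commit is what lets the store answer per-lockfile and transitive queries without re-parsing.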

What's NOT included:

The transitive:yes predicate for search. I still need to clean this up and ask Search Product how to implement this properly.
The actual yarn.lock parser that builds a full dependency tree. I found a bug in my current implementation and it's non-trivial to fix (but fixable), so I want to get a review on this PR before diving further into the parser.

That means only two things change from the user's perspective with this PR:

IMPORTANT: Repositories need to be lockfile-indexed before repo:deps() returns anything.
They can create a separate lockfile-indexing policy to enable that.
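To illustrate the split between the two policy flags, here is a minimal sketch of how an indexer might select repositories. It assumes policies match repositories by glob pattern; `Policy`, `RepoPattern` and `ReposToLockfileIndex` are hypothetical names, and only the `lockfile_indexing_enabled` column comes from this PR.

```go
package main

import (
	"path"
	"sort"
)

// Policy is a hypothetical, trimmed-down view of a row in
// lsif_configuration_policies; only the two flags matter here.
type Policy struct {
	RepoPattern             string // glob such as "github.com/acme/*" (assumed matching scheme)
	IndexingEnabled         bool   // existing auto-indexing flag
	LockfileIndexingEnabled bool   // lockfile_indexing_enabled, added by this PR
}

// ReposToLockfileIndex returns the repositories matched by at least one
// policy with LockfileIndexingEnabled set. IndexingEnabled alone is ignored.
func ReposToLockfileIndex(policies []Policy, repos []string) []string {
	selected := map[string]bool{}
	for _, p := range policies {
		if !p.LockfileIndexingEnabled {
			continue
		}
		for _, r := range repos {
			// path.Match's "*" does not cross "/" separators.
			if ok, err := path.Match(p.RepoPattern, r); err == nil && ok {
				selected[r] = true
			}
		}
	}
	out := make([]string, 0, len(selected))
	for r := range selected {
		out = append(out, r)
	}
	sort.Strings(out)
	return out
}
```

Keeping a separate boolean means an existing auto-indexing policy never implicitly turns on lockfile indexing; users opt in with a dedicated policy.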

Since the yarn.lock parser that produces a tree is not part of this PR, no trees will be persisted yet; that only happens with parsers that support it. Until then, all dependencies are persisted as direct dependencies, just like before.
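A minimal sketch of that fallback, assuming a parser either reports which packages are direct (when it can build a tree) or reports nothing. `ParseResult`, `DirectNames`, `Reference` and `ReferencesToPersist` are hypothetical stand-ins, not the real parser output or the codeintel_lockfile_references schema.

```go
package main

// Package is a hypothetical dependency record parsed from a lockfile.
type Package struct {
	Name    string
	Version string
}

// Reference is a hypothetical row to persist: a package plus whether
// it is a direct dependency of the project.
type Reference struct {
	Package Package
	Direct  bool
}

// ParseResult is an assumed parser output. Parsers that cannot build a
// dependency tree leave DirectNames nil.
type ParseResult struct {
	Packages    []Package
	DirectNames map[string]bool // nil when no tree was produced
}

// ReferencesToPersist marks packages as direct or transitive when a tree is
// available; otherwise it falls back to treating every package as direct.
func ReferencesToPersist(res ParseResult) []Reference {
	refs := make([]Reference, 0, len(res.Packages))
	for _, p := range res.Packages {
		direct := res.DirectNames == nil || res.DirectNames[p.Name]
		refs = append(refs, Reference{Package: p, Direct: direct})
	}
	return refs
}
```

With a flat parser nothing changes for repo:deps(); the distinction only starts to matter once a tree-producing parser fills in the direct set.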

TODOs/trade-offs:

This is just the first iteration. I haven't done any performance optimizations. I haven't cleaned up every TODO. I plan to do that in follow-up PRs.

But since dependency search is still behind a feature flag, I think that's fine in order to make continuous progress and have reviewable PRs.

There's one big issue we should tackle (it already existed before this PR): it takes a while (1m?) before the lockfile indexer picks up a new policy.

Test plan

New and existing tests, manual testing, CI
