Why does Sourcegraph have both search-based and precise code navigation?

Created by: olafurpg

Proposal

The difference between "search-based" and "precise" code navigation is a recurring source of confusion, both internally and externally. While working on the SCIP announcement, I wrote a detailed section explaining the difference, but I think it would work better as a dedicated blog post.

Draft

The text below is copy-pasted from the SCIP announcement and needs heavy editing before it can stand alone as a blog post.

Background: why we use LSIF

Sourcegraph code navigation, such as “Go to definition”, comes in two flavors: search-based and precise. Search-based code navigation is fast and always available, but it can occasionally return false positives and false negatives. Precise code navigation, on the other hand, requires custom configuration to set up, but the results are compiler-accurate and work across repositories. Both search-based and precise code navigation are useful, they offer different

One useful analogy is to think of search-based code navigation as cooking with a rack of spices that have no labels. Most of the time, you can tell spices apart simply by looking at them. For example, black pepper is visually distinct from oregano and paprika. However, some spices, such as paprika and cayenne, look similar, and you need to smell or taste them to tell them apart. Similarly, search-based code navigation gives fairly accurate results for symbols with unique names but struggles when multiple different symbols share the same name. Precise code navigation removes that guesswork, much as labeling the spices removes the need to smell or taste similar-looking ones.

To take the analogy further, a unique attribute of precise code navigation is that it’s the equivalent of cooking in a fully equipped kitchen that has all the spices in the world. Since precise code navigation integrates with the build pipeline and compilers, it understands the full dependency tree of your codebase and can navigate to all symbols, regardless of whether they’re defined within your codebase or come from an external source. Meanwhile, search-based code navigation doesn’t understand your full dependency tree, so it more frequently misses results for externally defined symbols.

To support precise code navigation, Sourcegraph has used LSIF (Language Server Index Format) as a persistence format between language indexers, which are written in a variety of programming languages, and our Sourcegraph backend, which is written in Go and is backed by PostgreSQL. Language indexers write LSIF files to disk, which then get uploaded to Sourcegraph for further processing before the resulting data is written into our database.

Background links

SCIP announcement https://docs.google.com/document/d/11ENvw9axbxX649xWTIMXn4Ybj_OwIfJVcpCXVA6KdDA/edit#heading=h.kmrgn1ywj5vd
Loom video explaining difference https://www.loom.com/share/fcaddfd333da487cb526a4fc99ead803?t=0

Time sensitive

Not really. I think we're planning to publish a series of blog posts about code navigation (previously, code intel) sometime around June/July.

cc/ @macraig @varungandhi-src @tjdevries @scalabilitysolved

Checklist

Include links to any relevant background information, issues, RFCs, Slack threads, etc:
Indicate if your proposal is time sensitive (tied to a release, product launch, or other announcement)
(optional) Invite your teammates or other peers to give feedback on your idea
cc/ @iskyOS I can't find Andy's GH handle