
Keyword Search

Created by: lguychard

Background

Keyword search is an experimental search mode that combines query interpretation heuristics with result ranking and filtering to provide a smarter, semi-natural-language interpretation of search queries.

The original POC is in this PR: https://github.com/sourcegraph/sourcegraph/pull/38923#issue-1307515135

Problem statement

If you are exploring unfamiliar code and do not know the exact keywords, identifiers, or structure, literal search is hard to use. The original POC was built to address this problem: it is hard to find code on Sourcegraph when you do not know how that code is structured.

At this stage, this problem statement is not validated.

Assumptions

The assumptions behind the original POC are:

  • Keyword search reduces the number of steps needed to find an unknown piece of information in a repository.
  • By extension, keyword search is helpful when onboarding onto a new codebase.

Engineering constraints

The performance implications of keyword search are significant: every search runs several exhaustive count:all jobs and accumulates all of their results in memory before ranking and filtering them. Stabilizing keyword search at customer scale would be a significant undertaking, which we should tackle only if we validate the opportunity.
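
To make the cost concrete, here is a minimal sketch of the general shape described above: several exhaustive jobs per search, with every result held in memory until ranking and filtering can run. It is not the POC's implementation. The names keyword_search, run_job, min_hits, and the toy corpus are all hypothetical, and the "one job per keyword, rank by number of distinct keywords matched" heuristic is an assumption chosen only for illustration.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable, Iterable

# Hypothetical type: a backend that returns every matching file path for one
# keyword. It stands in for a single exhaustive `count:all` job.
SearchJob = Callable[[str], Iterable[str]]


def keyword_search(
    query: str, run_job: SearchJob, min_hits: int = 2, limit: int = 10
) -> list[str]:
    """Rank files by how many distinct query keywords they match (assumed heuristic)."""
    keywords = list(dict.fromkeys(query.lower().split()))  # dedupe, keep order

    # One exhaustive job per keyword. Every result stays in memory until all
    # jobs have finished, which is the expensive part at customer scale.
    hits: dict[str, set[str]] = defaultdict(set)
    for kw in keywords:
        for path in run_job(kw):
            hits[path].add(kw)

    # Filtering and ranking can only start once everything is accumulated.
    threshold = min(min_hits, len(keywords))
    kept = [path for path, kws in hits.items() if len(kws) >= threshold]
    kept.sort(key=lambda path: (-len(hits[path]), path))
    return kept[:limit]


# Toy in-memory corpus, used only for illustration.
CORPUS = {
    "auth/login.go": "func handleLogin validates the user password",
    "auth/token.go": "func issueToken for a logged in user",
    "docs/readme.md": "how to reset a password",
}


def toy_job(keyword: str) -> list[str]:
    """Return every path whose text contains the keyword (no result limit)."""
    return [path for path, text in CORPUS.items() if keyword in text.lower()]


if __name__ == "__main__":
    print(keyword_search("user password", toy_job))
```

In this sketch the size of the hits map grows with the total number of matches across all jobs, not with the number of results finally returned, which is why memory use is the main concern.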

/cc @benvenker @lguychard