Keyword Search
Created by: lguychard
Background
Keyword search is an experimental search mode that combines query-interpretation heuristics with result ranking and filtering to provide a smarter, semi-natural-language interpretation of search queries.
See the PR for the original POC here: https://github.com/sourcegraph/sourcegraph/pull/38923#issue-1307515135
Problem statement
If you’re exploring unfamiliar code and don’t know the exact keywords, identifiers, or structure, literal search is hard to use effectively. The original POC was built to address this: with Sourcegraph, it is hard to find code whose structure you don't know.
At this stage, this problem statement has not been validated.
Assumptions
The assumptions behind the original POC are:
- Keyword search reduces the number of steps needed to find an unknown piece of information in a repository.
- By extension, keyword search is helpful when onboarding onto a new codebase.
Engineering constraints
The performance implications of keyword search are significant: it runs several count:all jobs for every search, and accumulates the results in memory prior to ranking and filtering them. Stabilizing keyword search at customer scale would be a significant undertaking, which we should only tackle if we validate the opportunity.
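To make the cost concrete, here is a minimal Go sketch of the pattern described above: several exhaustive jobs run per query, every result is held in memory, and only then are results ranked and filtered. It is an illustration, not the POC's code. The names (Match, Ranked, searchAll, keywordSearch), the in-memory corpus, the choice of one job per keyword, and ranking by the number of distinct keywords matched are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// Match is a hypothetical search result: a file path and the matching line.
type Match struct {
	Path string
	Line string
}

// Ranked is a file together with the number of distinct keywords it matched.
type Ranked struct {
	Path     string
	Keywords int
}

// searchAll stands in for one exhaustive (count:all-style) job: it returns
// every line in the corpus that contains the keyword, with no result limit.
func searchAll(corpus map[string][]string, keyword string) []Match {
	var out []Match
	needle := strings.ToLower(keyword)
	for path, lines := range corpus {
		for _, line := range lines {
			if strings.Contains(strings.ToLower(line), needle) {
				out = append(out, Match{Path: path, Line: line})
			}
		}
	}
	return out
}

// keywordSearch runs one exhaustive job per keyword (an assumption), holds all
// results in memory, then ranks files by distinct keywords matched and keeps
// the top `limit` entries.
func keywordSearch(corpus map[string][]string, query string, limit int) []Ranked {
	var (
		mu   sync.Mutex
		wg   sync.WaitGroup
		hits = map[string]map[string]bool{} // path -> set of matched keywords
	)
	for _, kw := range strings.Fields(query) {
		wg.Add(1)
		go func(kw string) {
			defer wg.Done()
			matches := searchAll(corpus, kw) // full result set, held in memory
			mu.Lock()
			defer mu.Unlock()
			for _, m := range matches {
				if hits[m.Path] == nil {
					hits[m.Path] = map[string]bool{}
				}
				hits[m.Path][strings.ToLower(kw)] = true
			}
		}(kw)
	}
	wg.Wait() // ranking cannot start until every job has finished

	ranked := make([]Ranked, 0, len(hits))
	for path, kws := range hits {
		ranked = append(ranked, Ranked{Path: path, Keywords: len(kws)})
	}
	// Most keywords first; ties broken by path for a stable order.
	sort.Slice(ranked, func(i, j int) bool {
		if ranked[i].Keywords != ranked[j].Keywords {
			return ranked[i].Keywords > ranked[j].Keywords
		}
		return ranked[i].Path < ranked[j].Path
	})
	if limit >= 0 && len(ranked) > limit {
		ranked = ranked[:limit]
	}
	return ranked
}

func main() {
	corpus := map[string][]string{
		"auth/login.go":   {"func handleLogin(user string) error", "// validate session token"},
		"auth/session.go": {"func newSession(user string) Session"},
		"docs/readme.md":  {"How to log in as a user"},
	}
	for _, r := range keywordSearch(corpus, "user session token", 2) {
		fmt.Printf("%s matched %d keyword(s)\n", r.Path, r.Keywords)
	}
}
```

Even in this toy version, memory use grows with the total number of matches across all jobs, and no result can be returned until the slowest job completes, which is the scaling concern raised above.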
/cc @benvenker @lguychard