search: Implement `repo:description(...)` predicate (!38337)

Administrator requested to merge tl/search-by-repo-description into main Jul 06, 2022

Created by: tbliu98

Implements https://github.com/sourcegraph/sourcegraph/issues/30733

This PR introduces the `repo:description(...)` predicate. The predicate takes a valid regular expression as its argument and filters the list of repos returned from the database down to those whose description matches that regular expression.
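As a rough illustration of the "valid regular expression" requirement, here is a minimal Go sketch that pulls the argument out of a `repo:description(...)` field and rejects it if it does not compile. `parseDescriptionPredicate` is a hypothetical helper, not the PR's parser. It also assumes Go's RE2 syntax is an acceptable stand-in for validation, even though the database ultimately evaluates the pattern with PostgreSQL's own regex dialect.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// parseDescriptionPredicate is a hypothetical helper: it extracts the argument
// of a repo:description(...) field and checks that it compiles as a regex.
func parseDescriptionPredicate(field string) (string, error) {
	const prefix, suffix = "repo:description(", ")"
	if !strings.HasPrefix(field, prefix) || !strings.HasSuffix(field, suffix) {
		return "", fmt.Errorf("not a repo:description predicate: %q", field)
	}
	arg := strings.TrimSuffix(strings.TrimPrefix(field, prefix), suffix)
	// Reject arguments that are not valid regular expressions (RE2 syntax here).
	if _, err := regexp.Compile(arg); err != nil {
		return "", fmt.Errorf("invalid regex in repo:description: %w", err)
	}
	return arg, nil
}

func main() {
	arg, err := parseDescriptionPredicate("repo:description(go package)")
	fmt.Println(arg, err) // go package <nil>

	_, err = parseDescriptionPredicate("repo:description(go(package)")
	fmt.Println(err != nil) // true: "go(package" has an unclosed group
}
```

Because the suffix is stripped only once, an argument containing balanced parentheses such as `(a|b)` survives intact.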

The regex passed to the predicate is transformed into a "fuzzy" regex during job creation. For example, in `repo:description(go package)`, `go package` is transformed to `(?:go).*?(?:package)` before being added to the database query. I expect that in practice, a user searching for `repo:description(go package)` wants a broader result set than only the repos whose description contains the exact string `go package`. For example, the repo `github.com/hashicorp/go-multierror` has the description "A Go (golang) package for representing a list of errors as a single error." The regex `go package` will not match this description, but I imagine a user would expect `repo:description(go package)` to match that repo.

The added filter on the database query uses either the case-insensitive regex operator `~*` or the `LIKE` operator on the `repo.description` column. Both can make use of the trigram index on that column. The index also supports the similarity operator `%`, which only returns rows whose similarity score is greater than a threshold (the default `pg_trgm.similarity_threshold` is 0.3). However, when running test queries against the cloud database I found that `%` was far slower than `~*` or `LIKE`, and (subjectively) its results were further from what I think real-life users would expect. I'm happy to share the outputs of those test queries if people are curious.
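The paragraph does not say how the query chooses between `~*` and `LIKE`, so the Go sketch below invents one plausible rule purely for illustration. Patterns with no regex metacharacters become an escaped substring match, and everything else goes to `~*` as a bind parameter. `descriptionCondition` and `likeEscaper` are hypothetical names. The sketch uses `ILIKE` rather than `LIKE` so that both branches stay case-insensitive, which is an assumption on our part. pg_trgm GIN/GiST indexes can serve `LIKE`, `ILIKE`, `~` and `~*` alike.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// likeEscaper escapes LIKE's wildcards (% and _) and its default escape char (\).
var likeEscaper = strings.NewReplacer(`\`, `\\`, `%`, `\%`, `_`, `\_`)

// descriptionCondition (hypothetical) returns a SQL condition on
// repo.description plus its bind argument. A pattern that QuoteMeta leaves
// unchanged contains no regex metacharacters, so it is treated as a literal
// substring; anything else is passed to the case-insensitive ~* operator.
func descriptionCondition(pattern string) (cond string, arg string) {
	if regexp.QuoteMeta(pattern) == pattern {
		return "repo.description ILIKE $1", "%" + likeEscaper.Replace(pattern) + "%"
	}
	return "repo.description ~* $1", pattern
}

func main() {
	for _, p := range []string{"go package", "(?:go).*?(?:package)", "100%"} {
		cond, arg := descriptionCondition(p)
		fmt.Printf("%s  [%s]\n", cond, arg)
	}
}
```

Passing the pattern as a bind argument rather than splicing it into the SQL string avoids quoting bugs and injection. Note that a fuzzy pattern such as `(?:go).*?(?:package)` always contains metacharacters, so under this rule it would always take the `~*` branch.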

I mainly used https://github.com/sourcegraph/sourcegraph/pull/31577 and https://github.com/sourcegraph/sourcegraph/pull/35374 as references for implementing the frontend client changes. I'd appreciate a close look at those files!

Test plan

Unit tests, integration tests, manual end-to-end tests.

Example screenshot: (image not included)