POC: keyword search
Created by: novoselrok
Proof-of-concept keyword search implementation via a new "smart" job type. The PR is intended to gather feedback and evaluate keyword search as a potential future avenue of research.
How it works
Query transformation:
- I added a new smart pattern type (yet another pattern type, but it makes it easier to distinguish from other search implementations without breaking stuff)
- It accepts simple queries in the form of "pattern pattern2 pattern3" or "repo:r -f:test pattern pattern2"
- It extracts the patterns, stems them, removes stop words, and replaces patterns with a lang: filter if applicable
- The patterns then get converted to an OR query "pattern OR pattern2 OR pattern3" (in an effort to get a big result pool) + any existing filters
- We create a new smart job with the transformed query
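To make the transformation steps above concrete, here is a minimal sketch in Python. It is not the code in this PR: the stop-word list, the language table, the toy suffix stripper, and the names transform_query and naive_stem are all hypothetical, and wrapping the OR group in parentheses is an assumption about how the patterns are kept separate from the filters.

```python
import re

# Hypothetical stop-word list and language table, for illustration only.
STOP_WORDS = {"a", "an", "the", "in", "of", "to", "for", "and", "or", "how"}
LANGUAGES = {"python", "go", "java", "rust", "typescript"}

# Terms such as repo:r or -f:test are treated as filters and passed through.
FILTER_RE = re.compile(r"^-?[a-z]+:\S+$")


def naive_stem(word: str) -> str:
    """Toy suffix stripper standing in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def transform_query(query: str) -> str:
    """Turn 'repo:r -f:test p1 p2' into 'repo:r -f:test (p1 OR p2)'."""
    filters, patterns = [], []
    for term in query.split():
        if FILTER_RE.match(term):
            filters.append(term)  # existing filter, kept unchanged
            continue
        word = term.lower()
        if word in STOP_WORDS:
            continue  # drop stop words
        if word in LANGUAGES:
            filters.append(f"lang:{word}")  # pattern becomes a lang: filter
            continue
        stem = naive_stem(word)
        if stem not in patterns:
            patterns.append(stem)
    parts = list(filters)
    if patterns:
        parts.append("(" + " OR ".join(patterns) + ")")
    return " ".join(parts)
```

For example, under these assumptions transform_query("repo:r -f:test parsing the python errors") yields "repo:r -f:test lang:python (pars OR error)".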
Result filtering:
- We group the resulting line matches according to their line numbers
- We keep only the valid groups: a group is valid if it contains a sufficient ratio of the original patterns (the threshold ratio can be tweaked)
- Each group is also assigned a score that combines the file score, the number of keywords, the number of distinct matches, and the number of distinct matches per line
- Finally, groups are sorted by the score, flattened, and sent to the client one-by-one
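The filtering and ranking steps above could look roughly like the sketch below. It is illustrative only and makes several assumptions that are not taken from this PR: "grouping by line numbers" is read as grouping matches on adjacent lines of the same file, the score is a plain unweighted sum, and LineMatch, group_adjacent, rank, max_gap, and min_ratio are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class LineMatch:
    path: str
    line: int
    keywords: frozenset  # stemmed keywords found on this line


def group_adjacent(matches, max_gap=1):
    """Group matches in one file whose line numbers are at most max_gap apart."""
    groups = []
    for m in sorted(matches, key=lambda m: (m.path, m.line)):
        last = groups[-1][-1] if groups else None
        if last and last.path == m.path and m.line - last.line <= max_gap:
            groups[-1].append(m)
        else:
            groups.append([m])
    return groups


def rank(matches, query_keywords, file_scores, min_ratio=0.5):
    """Drop groups with too few query keywords, then sort the rest by score."""
    if not query_keywords:
        return []
    wanted = set(query_keywords)
    scored = []
    for group in group_adjacent(matches):
        found = set().union(*(m.keywords for m in group)) & wanted
        if len(found) / len(wanted) < min_ratio:
            continue  # group does not contain enough of the original patterns
        # Assumed scoring: file score + distinct keywords + distinct per line.
        score = (
            file_scores.get(group[0].path, 0.0)
            + len(found)
            + len(found) / len(group)
        )
        scored.append((score, group))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Flatten the sorted groups back into a single list of line matches.
    return [m for _, group in scored for m in group]
```

The returned list is already in send order, so a caller could stream it to the client one match at a time.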
Demo
https://user-images.githubusercontent.com/6417322/179460124-ab958760-55cf-40ed-bad1-4a19418d3535.mp4
Test plan
- Noop