
POC: keyword search

Administrator requested to merge rn/keyword-search-poc into main

Created by: novoselrok

A proof-of-concept keyword search, implemented as a new "smart" job type. This PR is meant to gather feedback and to evaluate keyword search as a possible avenue for future research.

How it works

Query transformation

  • I added a new smart pattern type (yet another pattern type, but it keeps this implementation clearly separate from the other search implementations without breaking them)
  • It accepts simple queries of the form "pattern pattern2 pattern3" or "repo:r -f:test pattern pattern2"
  • It extracts the patterns, stems them, removes stop words, and replaces patterns that name a language with a lang: filter where applicable
  • The patterns are then combined into an OR query, "pattern OR pattern2 OR pattern3" (to get a large pool of results), alongside any existing filters
  • We create a new smart job with the transformed query
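A minimal Go sketch of the query transformation steps above, for illustration only. The stop-word list, the language-name table, and the suffix-stripping `stem` function are hypothetical stand-ins (a real implementation would likely use a proper stemmer such as Porter); `transformQuery` is not the PR's actual code. Tokens containing `:` are assumed to be filters and are kept as-is, and the OR-ed patterns are wrapped in parentheses so they group cleanly next to the filters.

```go
package main

import (
	"fmt"
	"strings"
)

// stopWords is a tiny, hypothetical stop-word list.
var stopWords = map[string]bool{
	"a": true, "an": true, "the": true, "to": true, "of": true,
	"in": true, "for": true, "and": true, "or": true, "how": true,
}

// langNames maps a pattern to a lang: filter value (hypothetical subset).
var langNames = map[string]string{
	"go": "go", "golang": "go", "python": "python",
	"rust": "rust", "typescript": "typescript",
}

// stem is a deliberately naive suffix stripper standing in for a real
// stemmer; it is NOT the stemmer used in the PR.
func stem(word string) string {
	for _, suffix := range []string{"ing", "ed", "es", "s"} {
		if len(word) > len(suffix)+2 && strings.HasSuffix(word, suffix) {
			return strings.TrimSuffix(word, suffix)
		}
	}
	return word
}

// transformQuery splits a simple query into filters and patterns, drops stop
// words, turns language names into lang: filters, stems the remaining
// patterns, and joins them into an OR query next to the filters.
func transformQuery(query string) string {
	var filters, patterns []string
	for _, tok := range strings.Fields(query) {
		// Tokens such as "repo:r" or "-f:test" are treated as filters.
		if strings.Contains(tok, ":") {
			filters = append(filters, tok)
			continue
		}
		word := strings.ToLower(tok)
		if stopWords[word] {
			continue
		}
		if lang, ok := langNames[word]; ok {
			filters = append(filters, "lang:"+lang)
			continue
		}
		patterns = append(patterns, stem(word))
	}
	parts := filters
	if len(patterns) > 0 {
		parts = append(parts, "("+strings.Join(patterns, " OR ")+")")
	}
	return strings.Join(parts, " ")
}

func main() {
	// Prints: repo:r -f:test lang:go (pars OR token)
	fmt.Println(transformQuery("repo:r -f:test parsing the go tokens"))
}
```

Casting a wide OR net trades precision for recall; the precision is recovered afterwards by the result filtering step.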

Result filtering

  • We group the resulting line matches according to their line numbers
  • We keep only the valid groups, where a group is valid if it contains a sufficient ratio of the original patterns (the ratio threshold can be tweaked)
  • Each valid group is also assigned a score that combines the file score, the number of keywords, the number of distinct matches, and the number of distinct matches per line
  • Finally, the groups are sorted by score, flattened, and sent to the client one by one
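A minimal Go sketch of the filtering and ranking steps above, under several assumptions. The `LineMatch` type, the threshold values, and the idea that "grouping by line numbers" means merging matches on nearby lines of the same file are all hypothetical. Pattern occurrences are found with plain substring counting, and the score is an unweighted sum of the signals listed in the PR; the real weighting is not described.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// LineMatch is a simplified, hypothetical line match.
type LineMatch struct {
	Path      string
	Line      int
	Text      string
	FileScore float64
}

// Hypothetical tuning knobs; the PR only says the ratio "can be tweaked".
const (
	minPatternRatio = 0.5 // fraction of original patterns a group must contain
	maxLineGap      = 1   // matches at most this many lines apart share a group
)

type group struct {
	matches []LineMatch
	score   float64
}

// groupByLines sorts matches by file and line, then merges nearby lines.
func groupByLines(matches []LineMatch) [][]LineMatch {
	sorted := append([]LineMatch(nil), matches...)
	sort.Slice(sorted, func(i, j int) bool {
		if sorted[i].Path != sorted[j].Path {
			return sorted[i].Path < sorted[j].Path
		}
		return sorted[i].Line < sorted[j].Line
	})
	var groups [][]LineMatch
	for i, m := range sorted {
		if i > 0 && m.Path == sorted[i-1].Path && m.Line-sorted[i-1].Line <= maxLineGap {
			groups[len(groups)-1] = append(groups[len(groups)-1], m)
		} else {
			groups = append(groups, []LineMatch{m})
		}
	}
	return groups
}

// filterAndRank keeps groups containing enough of the original patterns,
// scores them, sorts by descending score, and flattens them.
func filterAndRank(matches []LineMatch, patterns []string) []LineMatch {
	if len(patterns) == 0 {
		return nil
	}
	var valid []group
	for _, lines := range groupByLines(matches) {
		keywords, distinct := 0, 0
		for _, p := range patterns {
			n := 0
			for _, m := range lines {
				n += strings.Count(strings.ToLower(m.Text), strings.ToLower(p))
			}
			keywords += n
			if n > 0 {
				distinct++
			}
		}
		if float64(distinct)/float64(len(patterns)) < minPatternRatio {
			continue // not enough of the original patterns: invalid group
		}
		// Unweighted sum: file score, keyword count, distinct patterns,
		// and distinct patterns per line.
		score := lines[0].FileScore + float64(keywords) + float64(distinct) +
			float64(distinct)/float64(len(lines))
		valid = append(valid, group{lines, score})
	}
	sort.SliceStable(valid, func(i, j int) bool { return valid[i].score > valid[j].score })
	var out []LineMatch
	for _, g := range valid {
		out = append(out, g.matches...)
	}
	return out
}

func main() {
	matches := []LineMatch{
		{"a.go", 10, "func parseToken()", 1},
		{"a.go", 11, "return token", 1},
		{"b.go", 3, "// token from lexer", 2},
		{"b.go", 4, "lexer.Next()", 2},
		{"b.go", 40, "parse config", 2},
	}
	// Prints b.go 3, b.go 4, a.go 10, a.go 11; b.go:40 holds only 1 of 3 patterns.
	for _, m := range filterAndRank(matches, []string{"parse", "token", "lexer"}) {
		fmt.Println(m.Path, m.Line)
	}
}
```

Sorting whole groups before flattening keeps related lines adjacent in the stream sent to the client.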

Demo

https://user-images.githubusercontent.com/6417322/179460124-ab958760-55cf-40ed-bad1-4a19418d3535.mp4

Test plan

  • Noop
