Skip to content
Snippets Groups Projects

Search backend: introduce `query.Flat`

Created by: camdencheek

Stacked on https://github.com/sourcegraph/sourcegraph/pull/34771

Currently, we have two query types in the backend:

  • query.Q represents the full tree structure of a query
  • query.Basic restricts tree structure to only the pattern nodes

We split a query.Q into a set of query.Basic so that we can query backends that don't support full tree-shaped queries.

However, we also have an implicit third type, which I'm calling query.Flat. A query.Flat has a flat list of parameters and zero or one pattern nodes. Stated differently, query.Flat is the subset of query.Basic that has no and or or expressions in the pattern node.

There is a certain point in our job planning where the meaning of query.Basic shifts from "the only and/or expressions are in the pattern node" to "there are no and/or expressions anywhere in the query." In particular, ToTextPatternInfo requires that the basic query passed to it is a query.Flat. If this is not the case, it will just silently fail to generate a valid PatternInfo. However, during the optimize step of our job generation, we violate this expectation by passing in a tree-shaped query.Basic in ToSearchJob. This turns out okay because we end up throwing away any jobs that depend on the broken parts of TextPatternInfo, but it leads to a very confusing flow of information.

This PR introduces query.Flat to enforce the flat query shape with our type system. It does this in four steps (represented by the four commits in this PR).

  • Reorganize methods on query.Basic so that all the methods that depend on just the parameters are grouped together
  • Create a new Parameters type that is embedded in query.Basic
  • Move relevant methods from query.Basic to query.Parameters
  • Mint query.Flat, which embeds query.Parameters, allowing it to expose all those methods without duplication.

This allows us to continue using query.Basic as we were before (all the methods moved to Parameters are available through query.Basic because Parameters is embedded).

Proposed next steps: The first thing I'd like to do with this is to make the contract of `toTextParameters` and `ToSearchJob` more explicit. This is not currently possible because, even though parts of those methods depend on the query being a `query.Flat`, we use them with tree-structured `query.Basic` to generate optimized queries as well, and just throw away the non-optimized parts.

So, in order to actually make that possible, I think we also need to shift how we're doing query optimization a bit. I think it would probably be more clear if we generate jobs in a two-step process:

  1. For each basic query in Plan, generate jobs for backends that can use a query.Basic directly. This will be commit search and zoekt jobs.
  2. Decompose query.Basic into a set of query.Flat wrapped in AndJob and OrJob, then generate jobs for backends that require a flat query.

I think this is considerably easier to follow than the current paradigm of:

  1. generate all jobs from query.Flat composed with AndJob and OrJob
  2. generate optimized jobs and replace the unoptimized jobs with NoopJob.

Test plan

Should be semantics-preserving. Depending on tests to catch any typos.

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • Created by: rvantonder

    Review: Approved

    Nit: this PR doesn't need/use query.Flat right?, the changes make sense to just change the methods to work on Parameters since that's the real change (nice btw!).

    About query.Flat, makes sense for next steps and

    if we generate jobs in a two-step process: ... I think this is considerably easier to follow than the current paradigm of:

    My take is that, this is actually going to be quite a bit of effort to do correctly. I encourage it, and agree it's going to be easier/more desirable than the current paradigm. But the current paradigm exists for a reason, because the former is... IMO, going to be lots of effort/difficult. I would say go for it, and be OK with the fact that it may not be a straight path to get there (lots of shuffling/concerns). The main thing I see as troublesome is effectively identifying the type of job/backend based on query structure, which is encoded in ToSearchJob, and needs to get lifted.

    Please also consider structuring the code in such a way, that it would be easy to migrate the generic structure (expressions over query.Flat) to something specific (like query.Basic + backend QueryToFooQuery). For example, right now structural search relies on the generic structure, but will ideally rely on its own QueryToStructuralSearchQuery. I'd like the code path to allow doing that easily. Right now it's more "convoluted", in terms of tree replacements, but it's also "generically" supported, so I'd opt that the two-step process have generic support for casing out on optimized/tree type. I think that's what you'd shoot for anyway, but just want to give a concrete need/example to make it explicit.

Please register or sign in to reply
Loading