Track and/or query usage in pings
Created by: rvantonder
We should track latencies for queries containing and/or operators. There are a couple of open questions around the granularity we want to track at. For example:
- It's not as useful to aggregate the times for a simple query like `foo and bar` versus something more complex like `foo and bar or baz and qux or ...`. It'd be nice to know roughly whether a query is 'complex' or not.
- `and` operators are currently much more computationally expensive, so if we don't track `and` versus `or` expressions, our results, and source of slowness, will be muddy.
- Currently operators only apply to search patterns. In time they will apply to files or repos too. Tracking latency here is an additional complexity (or possibly, not as important).
Proposal is to track all of the following:
- Aggregate latency of all queries containing any operators for search patterns
- Latency for queries containing only `and` expressions, or alternatively, queries containing exactly one `and` expression (which is a common use case)
- As above, but for `or` expressions
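As a sketch of the granularity proposed above, a classifier over a parsed query tree might look like the following. The `Node`/`Kind` types and bucket names here are hypothetical simplifications for illustration, not Sourcegraph's actual query types:

```go
package main

import "fmt"

// Hypothetical, simplified query tree: a node is either a leaf
// pattern or an and/or operator applied to child nodes.
type Kind int

const (
	Pattern Kind = iota
	And
	Or
)

type Node struct {
	Kind     Kind
	Children []*Node
}

// count returns how many `and` and `or` operators appear in the tree.
func count(n *Node) (ands, ors int) {
	if n == nil {
		return 0, 0
	}
	switch n.Kind {
	case And:
		ands++
	case Or:
		ors++
	}
	for _, c := range n.Children {
		a, o := count(c)
		ands += a
		ors += o
	}
	return ands, ors
}

// bucket maps a query to one of the proposed tracking buckets.
func bucket(n *Node) string {
	ands, ors := count(n)
	switch {
	case ands == 1 && ors == 0:
		return "single-and" // e.g. `foo and bar`, the common case
	case ands > 0 && ors == 0:
		return "and-only"
	case ors > 0 && ands == 0:
		return "or-only"
	case ands > 0 && ors > 0:
		return "mixed"
	default:
		return "no-operators"
	}
}

func main() {
	// `foo and bar`
	q := &Node{Kind: And, Children: []*Node{{Kind: Pattern}, {Kind: Pattern}}}
	fmt.Println(bucket(q))
}
```

The "single-and" bucket falls out naturally here, which is one way to reconcile the "only `and` expressions" versus "exactly one `and` expression" alternatives above: track both, since the counts are already available.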
It'd be great, @asdine, if you can come up with alternative or improved ways that we might track the performance of these expressions, and at what granularity.
Part of the challenge is that we have 3 search kinds (literal, regexp, structural). I propose we add whatever granularity we settle on for regexp only, initially.
Side note: one thing I'd love to know is the structure of and/or queries, which we could efficiently encode with Huffman encoding; that would solve the complexity of understanding the shape of queries. We'd have to check whether this is an acceptable piece of information to collect, and whether it would truly be useful to us, though.
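By "structure" I mean something like the following sketch, which erases the patterns and keeps only the operator shape (so no query contents would leave the instance). The tree type is a hypothetical simplification, not the parser's real output:

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical simplified query tree.
type Node struct {
	Op       string  // "and", "or", or "" for a leaf pattern
	Children []*Node
}

// shape erases patterns and keeps only operator structure, e.g.
// `foo and (bar or baz)` becomes "(and p (or p p))". These short
// strings could then be counted per shape (and, in principle,
// compressed with something like Huffman coding).
func shape(n *Node) string {
	if n.Op == "" {
		return "p"
	}
	parts := make([]string, 0, len(n.Children))
	for _, c := range n.Children {
		parts = append(parts, shape(c))
	}
	return fmt.Sprintf("(%s %s)", n.Op, strings.Join(parts, " "))
}

func main() {
	// `foo and (bar or baz)`
	q := &Node{Op: "and", Children: []*Node{
		{},
		{Op: "or", Children: []*Node{{}, {}}},
	}}
	fmt.Println(shape(q)) // (and p (or p p))
}
```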
Background:
Take some time to look at the following 3 PRs.
(1) Previous latency logging was primarily added in https://github.com/sourcegraph/sourcegraph/commit/58c87971d955aa3d9f4955004cf1a715625f2256 in the `logSearchLatency` function. To log and/or query details, we should extend this function to check whether a query is an and/or query, and inspect it for operators.
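The extension could be as simple as emitting extra event names alongside the existing aggregate one. A minimal sketch, assuming operator counts are available from the parsed query (the `QueryInfo` type and the event names below are illustrative, not the real ones in `logSearchLatency`):

```go
package main

import "fmt"

// QueryInfo is a hypothetical summary of a parsed query.
type QueryInfo struct {
	Ands, Ors int // operator counts from the parsed query
}

// latencyEventNames returns every latency event a query should be
// logged under: the existing aggregate event, plus and/or-specific
// events when operators are present.
func latencyEventNames(q QueryInfo) []string {
	names := []string{"search.latencies"} // existing aggregate event
	if q.Ands > 0 || q.Ors > 0 {
		names = append(names, "search.latencies.operators")
	}
	if q.Ands > 0 && q.Ors == 0 {
		names = append(names, "search.latencies.and_only")
	}
	if q.Ors > 0 && q.Ands == 0 {
		names = append(names, "search.latencies.or_only")
	}
	return names
}

func main() {
	fmt.Println(latencyEventNames(QueryInfo{Ands: 1}))
}
```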
(2) The above log call puts data in the DB, which is periodically fetched and sent to us. That fetching happens in the changes introduced by this PR, where the top-level function is `getAndMarshalSearchUsageJSON`: https://github.com/sourcegraph/sourcegraph/pull/8432/files
(3) The above fetching refers to the types of the search stats that we query. We'll need to introduce new types for and/or query tracking, in this file for `SearchUsageStatistics`: https://github.com/sourcegraph/sourcegraph/blob/master/cmd/frontend/types/types.go.
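The new types would presumably follow the pattern of the existing usage stats: latency percentiles per event. A sketch with hypothetical field names (the real types in `types.go` will differ):

```go
package main

import "fmt"

// SearchLatencies holds latency percentiles for one event, in the
// spirit of the existing search usage types.
type SearchLatencies struct {
	P50, P90, P99 float64 // milliseconds
}

// SearchOperatorUsage is a hypothetical addition for and/or query
// tracking, mirroring the buckets proposed above.
type SearchOperatorUsage struct {
	AnyOperators *SearchLatencies // queries with any and/or operators
	AndOnly      *SearchLatencies // queries containing only `and`
	OrOnly       *SearchLatencies // queries containing only `or`
}

func main() {
	u := SearchOperatorUsage{AndOnly: &SearchLatencies{P50: 120, P90: 450, P99: 900}}
	fmt.Println(u.AndOnly.P50)
}
```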
The scary comments in the file for the last PR about symmetric changes refer to the fact that we need to populate the data as returned by the top-level function in PR (2), IIRC.
Finally, we'll need to add the new statistic definitions for and/or queries to the BigQuery schema, as in https://github.com/sourcegraph/analytics/pull/25; that is the part that defines how we store data sent to us from a Sourcegraph instance.
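For the schema change, that would mean adding column definitions in BigQuery's standard JSON schema format, along these lines (the column names here are placeholders, not the names in the analytics repo):

```json
[
  {"name": "search_and_only_p50", "type": "FLOAT", "mode": "NULLABLE"},
  {"name": "search_and_only_p99", "type": "FLOAT", "mode": "NULLABLE"},
  {"name": "search_or_only_p50",  "type": "FLOAT", "mode": "NULLABLE"},
  {"name": "search_or_only_p99",  "type": "FLOAT", "mode": "NULLABLE"}
]
```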
Related notes:
There isn't really a way to test the whole pipeline. An initial approach would be to start adding/modifying code where the log call happens in (1), adding logic to trigger logging when detecting and/or queries. Then flesh it out with the appropriate search usage types as in PRs (2) and (3). We can test the part where we actually log on the correct properties, given an and/or query input.
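That testable part lends itself to a table-driven check: given a query, assert which property it would be logged under. A self-contained sketch, where `detect` is a stand-in for real query parsing (it just scans for keywords) and the property names are hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

// detect is a stand-in for the real parser: it reports whether the
// query contains `and` or `or` keywords.
func detect(query string) (hasAnd, hasOr bool) {
	for _, tok := range strings.Fields(query) {
		switch tok {
		case "and":
			hasAnd = true
		case "or":
			hasOr = true
		}
	}
	return hasAnd, hasOr
}

// property names the (hypothetical) usage-stats property a query's
// latency would be logged under.
func property(query string) string {
	hasAnd, hasOr := detect(query)
	switch {
	case hasAnd && hasOr:
		return "mixed"
	case hasAnd:
		return "and_only"
	case hasOr:
		return "or_only"
	}
	return "none"
}

func main() {
	cases := []struct{ query, want string }{
		{"foo and bar", "and_only"},
		{"foo or bar", "or_only"},
		{"foo and bar or baz", "mixed"},
		{"foo bar", "none"},
	}
	for _, c := range cases {
		if got := property(c.query); got != c.want {
			panic(fmt.Sprintf("%q: got %s, want %s", c.query, got, c.want))
		}
	}
	fmt.Println("ok")
}
```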
There's also some trouble with the existing logging, which sometimes reports impossibly large numbers. I suspect an overflow, and filed #10058 (closed) to investigate, but haven't done so yet. So @asdine, if you spot something awry in the logging code, don't be too surprised :-)
For context, this work involves adding logging to the backend, which doesn't include the time the webapp takes to send/receive results. We actually (eventually) want to track the complete end-to-end latency, including the webapp. This means we'd move the Go log calls to the webapp side at some point, but not in this PR. Further, the webapp doesn't completely understand the shape of and/or queries either. When we are in a position to record end-to-end times including the webapp, we'll be able to reuse all the existing type definitions and usage stats event logic added here.