Metrics or indicator for search readiness (for massive-scale instances)

Created by: dadlerj

Requested by https://app.hubspot.com/contacts/2762526/company/557692805

Discussion in Slack: https://sourcegraph.slack.com/archives/CJX299FGE/p1601066252032700

Context/use case

The company's instance has ~400k repositories, and their primary use case is comprehensive searches, i.e., searches that return ALL of the results across all of their code, even if that means thousands or tens of thousands of matches. They primarily use the API/CLI for this, but the UI matters as well.

Problem

If they run the same search several times in a row, the results can fluctuate. The main culprit (https://github.com/sourcegraph/customer/issues/82) has been fixed, but the problem still occurs periodically. Beyond that, the fix alone doesn't address broader questions such as "can I be confident that all of my code is in Sourcegraph and indexed?"

Resolving these issues is essential to making our security use case work (as well as campaigns, insights, and more). For example, if I search for all usages of a vulnerable library and only later discover that some repos weren't cloned yet, or that the search timed out, I'll trust Sourcegraph less.

Desired solution

Overall, what they need is an answer to “Is Sourcegraph ready for us to start searching now, or not?” That question bundles several things, and it may be hard to come up with a single good metric (since we are always cycling repos through the indexing queue). Do we have a good way to answer it? For example, some combination of metrics like:

How many repos haven’t been cloned at all yet?
How many repos don’t have an index at all yet?
How many repos haven’t been synced in the last [24h]?
How many repos were updated in the last [24h] but haven’t been indexed in that time?
What is in the queue for indexing right now?
How many indexes aren’t loaded into memory yet?
Etc.
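To make the list above concrete, here is a minimal Python sketch of how per-repo state could be rolled up into those counts plus a single yes/no answer. Everything in it is hypothetical: the `RepoStatus` fields, the `summarize` helper, and the strict readiness policy are assumptions for illustration, not Sourcegraph's actual data model or API. The 24h window matches the placeholder used in the questions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional


@dataclass
class RepoStatus:
    """Hypothetical per-repo state; field names are illustrative, not Sourcegraph's schema."""
    name: str
    cloned: bool
    last_synced: Optional[datetime]   # None if never synced
    last_updated: Optional[datetime]  # newest change observed on sync
    last_indexed: Optional[datetime]  # None if no index exists yet
    index_loaded: bool = False        # whether the index is loaded into memory
    queued_for_index: bool = False


@dataclass
class Readiness:
    total: int = 0
    not_cloned: int = 0
    no_index: int = 0
    stale_sync: int = 0
    updated_not_reindexed: int = 0
    queued: int = 0
    index_not_loaded: int = 0

    def ready(self) -> bool:
        # Assumed strict policy: every repo cloned, indexed, loaded, and fresh.
        # Queue depth is reported but not gating, since repos cycle through it constantly.
        return not (self.not_cloned or self.no_index or self.stale_sync
                    or self.updated_not_reindexed or self.index_not_loaded)


def summarize(repos: Iterable[RepoStatus], now: datetime,
              window: timedelta = timedelta(hours=24)) -> Readiness:
    cutoff = now - window
    s = Readiness()
    for r in repos:
        s.total += 1
        if not r.cloned:
            s.not_cloned += 1
        if r.last_indexed is None:
            s.no_index += 1
        elif not r.index_loaded:
            # Only repos that have an index can have an unloaded index.
            s.index_not_loaded += 1
        if r.last_synced is None or r.last_synced < cutoff:
            s.stale_sync += 1
        # "Updated in the window but not indexed in the window."
        updated_recently = r.last_updated is not None and r.last_updated >= cutoff
        indexed_recently = r.last_indexed is not None and r.last_indexed >= cutoff
        if updated_recently and not indexed_recently:
            s.updated_not_reindexed += 1
        if r.queued_for_index:
            s.queued += 1
    return s


# Usage with made-up data.
now = datetime(2020, 10, 1, 12, 0, tzinfo=timezone.utc)
h = lambda n: now - timedelta(hours=n)
repos = [
    RepoStatus("a", True, h(1), h(2), h(1), index_loaded=True),  # fully ready
    RepoStatus("b", False, None, None, None, queued_for_index=True),  # never cloned
    RepoStatus("c", True, h(2), h(3), h(40)),  # updated, index stale and unloaded
]
summary = summarize(repos, now)
print(summary)
print("ready:", summary.ready())
```

Treating the queue as informational rather than gating reflects the point that repos are always cycling through indexing; the same counts are also the kind of summary numbers that could sit above the full list on the repository status page.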

Additionally, I think the Site-admin > Repository status page needs some work (this falls on Cloud) for instances with huge numbers of repos. Summary counts for the radio-button filters like “Needs index” are probably more important than the full list.

Tagging both Search (for the bulk of this, i.e., the search behavior itself) and Distribution (for the metrics/observability aspect).