API docs: search integration
Created by: slimsag
Unclear exactly what this would be. Some thoughts:
- Repo-only search: fuzzy matching, similar to what Olaf proposed.
  - Maybe part of the codeintel DB? Or elsewhere?
  - Use Bloom or XOR filters for fuzzy matching?
- Global search:
  - Maybe fuzzy, like what Olaf proposed.
  - Maybe not supporting our regular search syntax.
  - Maybe on a separate page / with a separate input bar.
  - Maybe implemented with Postgres FTS? In which DB?
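To make the Bloom filter idea a bit more concrete, here is a minimal Go sketch of one way it might be used for repo-only search: a per-repo Bloom filter over the lowercase trigrams of API doc symbol names, acting as a cheap prefilter that says whether a repo *might* contain a matching symbol. All names here (bloomFilter, indexSymbols, mayMatch) and the 4096-bit / 4-probe sizing are hypothetical, not existing Sourcegraph code. Note that this is substring-style prefiltering rather than true fuzzy matching; an XOR filter would be a more compact alternative for a static set, but is not shown.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strings"
)

// bloomFilter is a minimal Bloom filter: a bit array probed by k hash
// positions. It can report false positives but never false negatives.
type bloomFilter struct {
	bits []uint64
	m    uint32 // number of bits
	k    uint32 // number of probes per key
}

func newBloomFilter(m, k uint32) *bloomFilter {
	return &bloomFilter{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// positions derives k bit positions via double hashing over two FNV hashes.
func (b *bloomFilter) positions(key string) []uint32 {
	h1 := fnv.New32a()
	h1.Write([]byte(key))
	h2 := fnv.New32()
	h2.Write([]byte(key))
	a, c := h1.Sum32(), h2.Sum32()|1 // odd step so probes differ
	out := make([]uint32, b.k)
	for i := uint32(0); i < b.k; i++ {
		out[i] = (a + i*c) % b.m
	}
	return out
}

func (b *bloomFilter) add(key string) {
	for _, p := range b.positions(key) {
		b.bits[p/64] |= uint64(1) << (p % 64)
	}
}

func (b *bloomFilter) mayContain(key string) bool {
	for _, p := range b.positions(key) {
		if b.bits[p/64]&(uint64(1)<<(p%64)) == 0 {
			return false
		}
	}
	return true
}

// trigrams returns the lowercase 3-byte substrings of s.
func trigrams(s string) []string {
	s = strings.ToLower(s)
	var out []string
	for i := 0; i+3 <= len(s); i++ {
		out = append(out, s[i:i+3])
	}
	return out
}

// indexSymbols builds one filter covering all symbol names of a repo.
func indexSymbols(symbols []string) *bloomFilter {
	bf := newBloomFilter(4096, 4)
	for _, sym := range symbols {
		for _, t := range trigrams(sym) {
			bf.add(t)
		}
	}
	return bf
}

// mayMatch reports whether the repo could contain a symbol with query as a
// case-insensitive substring. Queries shorter than 3 bytes have no trigrams,
// so they cannot be filtered and always return true.
func mayMatch(bf *bloomFilter, query string) bool {
	for _, t := range trigrams(query) {
		if !bf.mayContain(t) {
			return false
		}
	}
	return true
}

func main() {
	repo := indexSymbols([]string{"NewRouter", "Router.Handle", "ServeHTTP"})
	fmt.Println(mayMatch(repo, "router"))
}
```

Repos whose filter says "no" can be skipped entirely; repos that say "maybe" still need a real match step afterwards.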
Thorsten:
If I could hit a keyboard shortcut on the repo page (d for docs vs. t for the fuzzy finder?) to fuzzy-find through the docs, that would be :chef_kiss:
Olaf:
This is a cool idea. I did a small proof of concept extending the fuzzy finder to search symbols, and ran into the problem that the quality of the symbol data is quite bad. It's totally worth trying again with the API docs data.
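A fuzzy finder over API doc entries, as Thorsten and Olaf describe, could be as simple as in-order (subsequence) matching with a score. The Go sketch below is hypothetical: fuzzyScore, fuzzyFind and the bonus values are illustrative assumptions, not how Sourcegraph's existing fuzzy finder works, and a real implementation would likely live in the TypeScript client. Matching is greedy (leftmost), so the score is not guaranteed to be the best possible alignment.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// fuzzyScore reports whether every rune of query appears in candidate in
// order (case-insensitively) and, if so, a score that rewards consecutive
// matches and a match at the very start of the candidate.
func fuzzyScore(query, candidate string) (int, bool) {
	q := []rune(strings.ToLower(query))
	c := []rune(strings.ToLower(candidate))
	score, qi, prev := 0, 0, -2
	for ci := 0; ci < len(c) && qi < len(q); ci++ {
		if c[ci] != q[qi] {
			continue
		}
		score++
		if ci == prev+1 {
			score += 2 // consecutive-run bonus
		}
		if ci == 0 {
			score += 3 // prefix bonus
		}
		prev = ci
		qi++
	}
	return score, qi == len(q)
}

type result struct {
	Entry string
	Score int
}

// fuzzyFind returns the matching entries, best score first; ties keep input order.
func fuzzyFind(query string, entries []string) []result {
	var out []result
	for _, e := range entries {
		if s, ok := fuzzyScore(query, e); ok {
			out = append(out, result{e, s})
		}
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	return out
}

func main() {
	docs := []string{"mux.NewRouter", "mux.Router.HandleFunc", "http.ListenAndServe"}
	for _, r := range fuzzyFind("newrt", docs) {
		fmt.Println(r.Entry, r.Score)
	}
}
```

Because API docs entries have clean, qualified names, this kind of naive scoring may hold up better than it did on the noisier symbol data.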
Me:
Hey team, just wanted to get a general pulse on how hard it would be to add API docs as a new search result type these days? I heard there was recent, ongoing work to integrate Solr into our stack, but I'm otherwise completely unfamiliar with it. API docs is still in its early stages, and search integration isn't a goal (this would be a side hack), so another idea I'm floating is just using Postgres FTS with a separate DB container, since that'd be pretty quick to hack together and wouldn't impact anything. Any thoughts?
Juliana:
On the front-end side, it's fairly straightforward: you can add a new match type to the stream API and then add the logic to render the search result to StreamingSearchResultsList.
Me:
interesting, thank you for that!
Juliana:
Note that I'm planning to simplify this code over the next few days/weeks to remove the GraphQL type conversions, since we have now removed the GraphQL search results UI entirely.
Camden:
The backend side is much more straightforward than it used to be, now that most of the search code lives outside graphqlbackend. It mostly involves adding a new result type that implements the Match interface (https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/internal/search/result/match.go?subtree=true#L8:4). Once you have something that returns these match types, it can be plugged into the aggregator (see the Do* methods: https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/internal/search/run/aggregator.go?subtree=true#L65:10). There are still a few spots where we unwrap the Match interface into its concrete type (https://sourcegraph.com/search?q=context:global+repo:%5Egithub%5C.com/sourcegraph/sourcegraph%24+switch+:%5B1%5D.%28type%29+%7B+...+case+*result.FileMatch:+...+%7D&patternType=structural), so watch out for those, but for the most part we work with that interface directly.
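The shape Camden describes (a new type implementing Match, fed into an aggregator, plus type switches that need a new case) can be sketched as a toy. Everything below is a simplified stand-in: the two-method Match interface, FileMatch, APIDocMatch and aggregator are hypothetical and do not reproduce the real method set in match.go or the real aggregator.go.

```go
package main

import (
	"fmt"
	"sync"
)

// Match is a SIMPLIFIED stand-in for the interface in
// internal/search/result/match.go; the real method set differs.
type Match interface {
	RepoName() string
	Key() string
}

// FileMatch stands in for an existing result type.
type FileMatch struct{ Repo, Path string }

func (m *FileMatch) RepoName() string { return m.Repo }
func (m *FileMatch) Key() string      { return "file:" + m.Repo + "/" + m.Path }

// APIDocMatch is a hypothetical new result type for API docs.
type APIDocMatch struct{ Repo, PathID, Label string }

func (m *APIDocMatch) RepoName() string { return m.Repo }
func (m *APIDocMatch) Key() string      { return "apidoc:" + m.Repo + m.PathID }

// Compile-time check that the new type satisfies the interface.
var _ Match = (*APIDocMatch)(nil)

// aggregator is a toy collector that concurrent searchers send matches to.
type aggregator struct {
	mu      sync.Mutex
	matches []Match
}

func (a *aggregator) Send(ms []Match) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.matches = append(a.matches, ms...)
}

// describe is one of the "unwrap into the concrete type" spots: without an
// explicit case, a new result type silently falls through to default.
func describe(m Match) string {
	switch v := m.(type) {
	case *FileMatch:
		return "file " + v.Path
	case *APIDocMatch:
		return "api doc " + v.Label
	default:
		return "unknown " + m.Key()
	}
}

func main() {
	agg := &aggregator{}
	var wg sync.WaitGroup
	for _, job := range [][]Match{
		{&FileMatch{Repo: "r", Path: "router.go"}},
		{&APIDocMatch{Repo: "r", PathID: "/#NewRouter", Label: "NewRouter"}},
	} {
		wg.Add(1)
		go func(ms []Match) {
			defer wg.Done()
			agg.Send(ms)
		}(job)
	}
	wg.Wait()
	for _, m := range agg.matches {
		fmt.Println(describe(m)) // order depends on goroutine scheduling
	}
}
```

Searching the codebase for type switches on the concrete match types, as Camden's structural search link does, is the practical way to find every spot that needs the extra case.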
Me:
@Camden Cheek thanks for the links! I'm guessing that ignores the indexing/storage aspects, though?
Camden:
Ah, yes it does. Those steps cover you from the point of "I have a way to search for these things and get results" onward.
Me:
makes sense!
Camden:
RE: storage/indexing, it might be possible to spin up another zoekt container for FTS, but that's outside what I can really comment on. @search-core-support may be able to help you out if you want to explore that direction.
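Zoekt is a trigram-based code search index, so "another zoekt container" would mean indexing API docs text into trigram posting lists. The toy Go sketch below illustrates only that general technique (intersect the posting lists of the query's trigrams to get candidates, then verify each candidate); buildIndex, search and intersect are hypothetical names, and this is not zoekt's implementation or API.

```go
package main

import (
	"fmt"
	"strings"
)

// trigramIndex maps each lowercase trigram to the sorted IDs of the docs
// that contain it.
type trigramIndex struct {
	docs     []string
	postings map[string][]int
}

func buildIndex(docs []string) *trigramIndex {
	idx := &trigramIndex{docs: docs, postings: map[string][]int{}}
	for id, d := range docs {
		seen := map[string]bool{}
		s := strings.ToLower(d)
		for i := 0; i+3 <= len(s); i++ {
			t := s[i : i+3]
			if !seen[t] {
				seen[t] = true
				idx.postings[t] = append(idx.postings[t], id) // IDs stay sorted
			}
		}
	}
	return idx
}

// intersect merges two sorted posting lists.
func intersect(a, b []int) []int {
	var out []int
	for i, j := 0, 0; i < len(a) && j < len(b); {
		switch {
		case a[i] == b[j]:
			out = append(out, a[i])
			i++
			j++
		case a[i] < b[j]:
			i++
		default:
			j++
		}
	}
	return out
}

// search returns docs containing query as a case-insensitive substring.
// Sharing all trigrams does not guarantee a substring match, so every
// candidate is verified against the full text.
func (idx *trigramIndex) search(query string) []string {
	q := strings.ToLower(query)
	var cand []int
	if len(q) < 3 {
		for id := range idx.docs { // no trigrams: fall back to a full scan
			cand = append(cand, id)
		}
	} else {
		for i := 0; i+3 <= len(q); i++ {
			p := idx.postings[q[i:i+3]]
			if i == 0 {
				cand = p
			} else {
				cand = intersect(cand, p)
			}
		}
	}
	var out []string
	for _, id := range cand {
		if strings.Contains(strings.ToLower(idx.docs[id]), q) {
			out = append(out, idx.docs[id])
		}
	}
	return out
}

func main() {
	idx := buildIndex([]string{
		"func NewRouter() *Router",
		"func (r *Router) HandleFunc(path string) *Route",
		"func ListenAndServe(addr string, handler Handler) error",
	})
	fmt.Println(idx.search("router"))
}
```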
Keegan:
We don't have something as nice and generic as ES or Solr for indexing documents. We don't have any work on our roadmap to add those sorts of interfaces, since they are not needed for our 5 million repos goal. If API docs are a subset of the code corpus, I could imagine enriching zoekt data with them like we do for symbols. E.g. you search for Router, we find docstrings/symbol defs containing Router, and we hydrate in the fact that it could be an API doc result. Then something else hydrates in that information properly. You'd also get the benefits of ranking etc. as we work on it. Alternatively, I'd consider just using Postgres or some other more minimal IR system and hooking it up. There is then a lot of work around handling updates etc., which you get for free in our infra if the data is tied to a repo@rev.
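Keegan's enrichment idea can be sketched as a hydration step: search produces symbol hits pinned to a repo@rev, and any hit that has an API docs entry at the same repo@rev is upgraded into an API doc result. The Go toy below is hypothetical throughout (RepoRev, SymbolHit, APIDoc, hydrate and the commit IDs are illustrative, not Sourcegraph types). Keying docs by repo@rev is what makes updates "free" here: a new commit simply has its own key, so there is no separate sync logic to keep in step.

```go
package main

import "fmt"

// RepoRev pins data to an exact repository revision.
type RepoRev struct{ Repo, Commit string }

// SymbolHit is a hypothetical symbol result from a symbol-aware code search.
type SymbolHit struct {
	At     RepoRev
	Symbol string
	Path   string
}

// APIDoc is a hypothetical API docs entry.
type APIDoc struct {
	Symbol, Signature, Docstring string
}

// docKey identifies an API docs entry by repo@rev and symbol name.
type docKey struct {
	At     RepoRev
	Symbol string
}

// hydrate upgrades symbol hits that have API docs at the same repo@rev into
// API doc results; hits without docs are returned unchanged.
func hydrate(hits []SymbolHit, docs map[docKey]APIDoc) (apiDocs []APIDoc, plain []SymbolHit) {
	for _, h := range hits {
		if d, ok := docs[docKey{h.At, h.Symbol}]; ok {
			apiDocs = append(apiDocs, d)
		} else {
			plain = append(plain, h)
		}
	}
	return apiDocs, plain
}

func main() {
	at := RepoRev{Repo: "github.com/gorilla/mux", Commit: "abc123"} // hypothetical commit
	docs := map[docKey]APIDoc{
		{At: at, Symbol: "NewRouter"}: {Symbol: "NewRouter", Signature: "func NewRouter() *Router"},
	}
	hits := []SymbolHit{
		{At: at, Symbol: "NewRouter", Path: "mux.go"},
		{At: at, Symbol: "routeRegexp", Path: "regexp.go"},
	}
	apiDocs, plain := hydrate(hits, docs)
	fmt.Println(len(apiDocs), "API doc results,", len(plain), "plain symbol results")
}
```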
https://sourcegraph.slack.com/archives/CHEKCRWKV/p1625605412478100