search: fork/archived UI notifications
Created by: rvantonder
This adds a notification as seen below. This shows up if a fork (resp. archive) is excluded only if the fork: (resp. archive:) option isn't explicitly specified in the query.
I plan to still add an e2e web regression test and some backend test, though I would much prefer an integration test for it.
Notes:
-
On hover, it says to
add fork:yesfollowed by the list of repos. There isn't a lot of freedom to format this, and it will overflow on the width of the hover box if the message is longer. I think the current message is OK. -
A repo may be a fork and an archive, or only a fork/only an archive. It's best if we treat the labels separately as far as showing the notification
Backend implementation stuff
-
At first I thought to do a query of all repos including forks/archives, and then filter out the forks/archives. But, I discovered that the standard DB call will not populate those fields/labels for repos due to perf issues. Instead of iterating over the entire list and then 'hydrating' whether they are forks or repos, I instead take the original query and modify it to make two additional DB calls where I set
fork:onlyand thenarchived:only, and put those matching repos in the 'excluded' list. My intuition is that this is faster than callingGet(...)for each repo and then checking fork/archived. The additional DB calls run in the typical case whenfork/archivedunspecified, but not otherwise. -
I had to refactor tests that relied on a mocked DB
Listcall. These tests want certain invariants to hold for the DB options when matching repos. The current/previous 'easy' way to do this is just to accept all the DB options as default and set the ones that matter. When we changed the default of fork/archive matching in a previous PRs (e.g., https://github.com/sourcegraph/sourcegraph/pull/8739), I had to change these tests to now use the newNoFork... default option. Now, these values can change since there are multiple calls to DBList. The answer is that these tests should not be relying on such strong invariants on the fork/archive inclusion/exclusion logic because that's not what they're testing. I have updated the tests to check for weaker invariants instead. -
This will conflict with refactor #10489. I'd appreciate if we could hold off on merging #10489 until after branch cut, since I'd like to get this feature in for 3.16 without issues that might crop up due to refactor and/or adjusting to it.
Merge request reports
Activity
Created by: rvantonder
This works on manual testing as expected, but I'm getting
TypeError: Cannot read property 'filter' of undefinedfor theprops.results.excluded.filter(repo => repo.isFork)call in the tests. Not sure what I'm doing wrong (why does.map(...)work elsewhere?), but it's probably something silly.Created by: rvantonder
At a high level I would approached this by adding a new dynamic filter. ... could be done in a way which is independent of "resolveRepositories". Did you consider this approach?
This PR is based off of the suggestion in https://gitlab.sgdev.org/root/sourcegraph/-/issues/9928#issuecomment-624997214
I don't think the dynamic filter solution addresses all the concerns raised in https://gitlab.sgdev.org/root/sourcegraph/-/issues/9928, namely:
- we should be clear about e.g. how many forks were excluded
- the filter would not suffice in the context of https://gitlab.sgdev.org/root/sourcegraph/-/issues/10313#issuecomment-625451535, since this is only a "semblance of 'there are possibly more results omitted here'" and currently used as a stop gap.
Basically, I dismissed this approach because I'm convinced based on the discussion it doesn't address the issue. I think it's useful to know that forks/archives were omitted instead of preemptively adding dynamic filters, and as such I don't see any way but to rely on some sort of
resolveRepositories-like call to know that (unless I'm missing something about what is implied by the approach you had in mind).this list could be quite large? Do we really need the whole list to add the label? Surely we can do it with a db.list with a
select name, fork, count(1) from repos where $SEARCH_CONDS and fork group by fork. This suggestion is me thinking from a perf perspective.In the context of the parent issue, the concern is to surface more complete knowledge of what was omitted. This could be "at least N fork repos were excluded" for N = 1, but I don't know that including partial information of excluded repos is a good perf <-> UX tradeoff. @slimsag thoughts?
Edit: maybe we could clamp N to be up to 20 or something.
Created by: keegancsmith
Well shit I just wrote a long response and it got lost somehow. So just gonna write a very brief response:
The approach I suggested is exact and fast (it includes a count). Dynamic filters can do DB queries/etc. The UI you have is much better though, but maybe the ideas can be extended.
In the past our perf work for large amounts of repos was all about the DB. Listing repos is slow, that is why we have the funky hydration logic. Your query asks for forked/archived which can make it very slow. Given it is the day of branch cut can you add a feature flag (default on) for populating this list. I am concerned about performance.
Can you mark the field you add to graphql as experimental, so we have freedom to change it next release if we decide to.
GraphQL allows querying the size of a list and just fetching N items. This may be useful, but I'm unsure if our graphql server library makes it possible to do efficiently.
Created by: rvantonder
got lost somehow.
:'(
The approach I suggested is exact and fast (it includes a count). Dynamic filters can do DB queries/etc
I'm not familiar with the part of the code that will give me this count without querying here. Whenever we have a chance, feel free to point that out. Updating the UI code shouldn't be a big deal if I can extract this value on the frontend.
I am concerned about performance.
I mean, yes, me too. But I don't see another way to do this. Also, previously, before excluding forks by default, the DB List would run including forks/archives, so we would conceptually end up with the same size of the list and similar DB queries that check
Fork/Archivefields in the DB. I realize multiple queries are not the same, and I don't know if that intuition checks out. Other than that and the slow DB lookup concern, we don't need to send the result of DB List over GQL in theexcludemember, and just clamp it.Can you mark the field you add to graphql as experimental, so we have freedom to change it next release if we decide to.
GraphQL allows querying the size of a list and just fetching N items. This may be useful, but I'm unsure if our graphql server library makes it possible to do efficiently.
I'll see what I can do tomorrow before branch cut, but I'm not going to go to nuts here, I've already overcommitted to get to this point. I'd rather just not merge rushing for branch cut and continue in the next iteration.
Created by: slimsag
Yeah Keegan makes some good points here, I apologize for not thinking of that ahead of time.
No need to land this in 3.16.0, we can do a patch release once this is solid and ready. It's worth taking the time to do it right (but worth keeping as a high priority, too, of course).
Created by: rvantonder
Based on the current discussion, I'm fuzzy on how to proceed. It's sounds like there are other ways to achieve what we want. Directions:
-
Change the schema to
excludedForks: int,excludedArchived: int, no repo names. Add a backend DB call that only gets the count for these respectively. Where/how exactly we do this I'm not exactly sure yet, so I suppose the proposal is to add a helper function that does this? I'm inclined to add this in the vicinity ofresolveRepos(see below). This direction sounds like we don't need to expose a parameter N then. -
Use current state of PR, but add ability to fetch only N items from
excludedas a GQL parameter. Mark theexcludedfield as experimental. Accept a parameter N in the GQLexcludedendpoint for the size of the list. -
Alternatively, via dynamic filter logic which queries DB based on suggestion:
exact and fast (it includes a count). Dynamic filters can do DB queries/etc.
I looked briefly at the dynamic filter code, it doesn't seem to be doing any DB-related calls currently so that would need to be added? I think the
DynamicFilterendpoint is too closely attached to the UI chip representation and seems orthogonal to addressing the perf issue. I think the UI in the status bar of the original PR is better suited than the dynamic filter/chip UI. Thus, whatever count/DB lookup we would have done to add for a dynamic filter can be done in the part that looks up forks/archives around the area of resolveRepos. I sympathize with not wanting to bloat resolveRepos due to the complexity, but doing the lookup in that vicinity feels like a better, more UI-independent place for this to live. I can try separate this logic more to help with that, but will likely take more time.Maybe we can align on option (1) here @slimsag and @keegancsmith?
-
Created by: slimsag
I think (3) will be harder to implement and is further away from your current PR, so I would advise not going with that option for now at least.
I think (2) would be OK, and as Keegan noted in his inline comment it is possible for us to modify the
resolveRepositoriesdb query to only select N "excluded" repositories from the DB, but I think this will be tricky/difficult to implement unless you hard-code the number to e.g. 5 and then the UI would either need to display "5+ forked repositories excluded" or separately query the total number. My only argument against this solution is that I think it will involve making more changes and be more involved than it sounds at first glance.I think (1) is a good solution and should be easiest to implement, you would just need to modify
resolveRepositoriesto calldb.Repos.Countinstead ofdb.Repos.Listwhich should make it much more efficient to calculate.In any case, I think me and @keegancsmith both ultimately trust your judgement on the best way to proceed here :)
Created by: keegancsmith
My personal preference is to avoid expanding our graphQL API, especially if it involves harding things like "archived" and "forked". But to be honest it is the most practical way to achieve what we want, so lets do that.
I think the problem with dynamic filters is the UI isn't as nice, so it makes sense to make it first class. So go for (1)
Created by: rvantonder
OK finally converged on this. Solution:
- Only do accounting for the number of repos excluded
- Use
Repos.CountincomputeExcludedRepositories(all affected tests mock this function now) - Expose the counts through
DynamicFilteras proposed. In the webapp, pull these values out of theDynamicFilterto show the same UI notification (and also the dynamic filter chips, order is nondeterministic) - No changes to GQL API (it's fairly straightforward to get the info out of dynamic filter values)
This improves on solution (1) above because there are no changes to the GQL API now by using values exposed via DynamicFilter. PR Needs to be reviewed again because of the above substantial changes. Webapp (cc @lguychard) is only minor changes. PTAL.


