batches: prevent fighting between polling and show more button in workspaces connection

Administrator requested to merge kr/show-more-pls into main

Created by: courier-new

Fixes the problem observed in the UI when loading additional workspaces in the execution list with the "show more" button, where sometimes no additional workspaces would appear. This happens because the response carrying the extra page of paginated data "fights" with the response from the regular polling request for the already-loaded pages.

So, shouldn't Apollo Client be smart enough to handle that? It turns out there are a couple of things going on here that make the situation non-ideal:

  1. The fact that we're polling paginated data in the first place
    • Apollo has historically outright stated that this is unsupported and that pollInterval is not to be used in conjunction with fetchMore [1] [2]; what we've implemented is a big fat workaround (sketched in the first example after this list).
  2. We didn't have an id to identify a BatchSpecWorkspaceResolution by
    • When we queried and wrote to this field from other screens of the execution UI, Apollo would default to completely overwriting any previously-fetched workspaces, causing confusion and breaking certain cache writes.
  3. Our workspace connection data can only be queried on a BatchSpec node
    • Thanks to one of many deep architectural choices in our GraphQL setup, things like workspaces can only be queried by first querying their parent batch spec node type and then providing a nested query for its workspaces. This makes custom field policies, such as those typically recommended for properly merging paginated data in Apollo Client, really hard to get right. 😅 I spent a while trying (the second sketch after this list shows the flavor of what I attempted), but a) it's really hard to get insight into what Apollo Client is actually doing in order to validate whether your changes are having any impact, and b) there's not much documentation and few examples for this configuration beyond the very basic.
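
For reference, and with hypothetical query and field shapes standing in for our real ones, the workaround from point 1 looks roughly like this: a single paginated query that is simultaneously polled and extended with fetchMore.

```typescript
import { gql, useQuery } from '@apollo/client'

// Hypothetical stand-in for the real workspaces connection query.
const WORKSPACES = gql`
    query Workspaces($batchSpec: ID!, $first: Int!, $after: String) {
        node(id: $batchSpec) {
            ... on BatchSpec {
                id
                workspaceResolution {
                    workspaces(first: $first, after: $after) {
                        nodes { id state }
                        pageInfo { hasNextPage endCursor }
                    }
                }
            }
        }
    }
`

export const useWorkspaces = (batchSpec: string) => {
    // The pattern Apollo warns against: polling a query whose result we also
    // extend with fetchMore. Each poll re-requests only the original
    // variables, so a poll response can race a fetchMore response and
    // clobber the freshly appended page in the cache.
    const { data, fetchMore } = useQuery(WORKSPACES, {
        variables: { batchSpec, first: 20, after: null },
        pollInterval: 2500,
    })
    return { data, fetchMore }
}
```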
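
Points 2 and 3 also interact: without a stable id, the resolution can't be normalized, and without normalization, a merge policy for its nested connection has nothing stable to hang off of. This is the flavor of cache configuration I was experimenting with; it's illustrative of the attempt, not anything that shipped.

```typescript
import { InMemoryCache } from '@apollo/client'

const cache = new InMemoryCache({
    typePolicies: {
        BatchSpecWorkspaceResolution: {
            // Point 2: without an id, Apollo stores the resolution inline
            // under its parent BatchSpec, and each write replaces the whole
            // object. Exposing an id (as this MR now does) gives it its own
            // normalized cache entry instead.
            keyFields: ['id'],
            fields: {
                // Point 3: a custom merge policy for the paginated connection.
                workspaces: {
                    // Treat all pages as one logical list, ignoring cursor args.
                    keyArgs: false,
                    merge(existing: any, incoming: any) {
                        // Naive append; a real policy also has to dedupe and
                        // respect cursors, and it's hard to observe whether the
                        // cache is actually applying it the way you expect.
                        const nodes = existing?.nodes ?? []
                        return { ...incoming, nodes: [...nodes, ...incoming.nodes] }
                    },
                },
            },
        },
    },
})
```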

So, how do we get around this?

  1. Don't actually use pollInterval and fetchMore at the same time.
    • We now query the workspaces' place in queue, state, and diff stat separately from the "normal connection" query; the former, dynamic data is polled regularly on a manual interval, while the latter, mostly-static data is only requested once per page load (see the sketch after this list).
    • Since Apollo is still smart enough to merge these results in the cache, we can pull the combined data from both queries back out for the workspaces list UI.
  2. Expose the workspace resolution job id as a field on the BatchSpecWorkspaceResolution
  3. 🤷‍♀️ It's not in the scope of this one issue to solve that problem right now. And given that we're the only product team actively trying to poll paginated data, I doubt it's really a priority for anyone else, either.
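
Concretely, the split in point 1 looks something like this. The query names, polling interval, and exact field shapes (e.g. placeInQueue, diffStat) are illustrative guesses, not the shipped code; the important part is that both documents select the same workspace nodes by id, so their results land in the same normalized cache entries — which is also why exposing the resolution's id in point 2 matters.

```typescript
import { useEffect } from 'react'
import { gql, useQuery } from '@apollo/client'

// Hypothetical documents for illustration; field shapes are approximate.

// Mostly-static data: requested once per page load, paged with fetchMore.
const WORKSPACES_STATIC = gql`
    query BatchSpecWorkspacesStatic($batchSpec: ID!, $first: Int!, $after: String) {
        node(id: $batchSpec) {
            ... on BatchSpec {
                id
                workspaceResolution {
                    id
                    workspaces(first: $first, after: $after) {
                        nodes {
                            id
                            repository { id name }
                        }
                        pageInfo { hasNextPage endCursor }
                    }
                }
            }
        }
    }
`

// Dynamic data (state, place in queue, diff stat): re-requested for every
// loaded workspace on a manual polling interval, never mixed with fetchMore.
const WORKSPACES_DYNAMIC = gql`
    query BatchSpecWorkspacesDynamic($batchSpec: ID!, $first: Int!) {
        node(id: $batchSpec) {
            ... on BatchSpec {
                id
                workspaceResolution {
                    id
                    workspaces(first: $first) {
                        nodes {
                            id
                            state
                            placeInQueue
                            diffStat { added deleted }
                        }
                    }
                }
            }
        }
    }
`

export const useWorkspacesList = (batchSpec: string, loadedCount: number) => {
    // "Show more" calls fetchMore on this query; it is never polled.
    const staticResult = useQuery(WORKSPACES_STATIC, {
        variables: { batchSpec, first: 20, after: null },
    })

    // Polled manually so the interval covers however many pages are loaded.
    const dynamicResult = useQuery(WORKSPACES_DYNAMIC, {
        variables: { batchSpec, first: loadedCount },
    })
    const { refetch } = dynamicResult
    useEffect(() => {
        const handle = setInterval(() => refetch(), 2500)
        return () => clearInterval(handle)
    }, [refetch])

    // Both results write to the same normalized workspace entries, so the
    // list UI can combine them (matched by node id) without the two
    // responses fighting over a single query's cached pages.
    return { staticResult, dynamicResult }
}
```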

Demo video:

https://user-images.githubusercontent.com/8942601/186096225-5eaa7e39-b175-43d0-aa26-61f1894dfb51.mov

Problems

The one (big) annoyance with this technique is that the Apollo Client cache doesn't seem to work the same way in the Storybook environment, so our workspace lists in stories are... empty. 🙃 I couldn't find any documentation about configuring the mock provider with initial test data pre-loaded in the cache, so the only way I can see around this would be to manually write the mock data to the appropriate cache keys within each story, which would be soooo tedious and brittle. 😭
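
For the record, here's roughly what that manual workaround would look like for a single (hypothetical, trimmed-down) query; multiply this by every query document and data shape a component issues, per story, and "tedious and brittle" follows.

```typescript
import { gql, InMemoryCache } from '@apollo/client'

// Hypothetical stand-in for one of the real query documents.
const WORKSPACE_STATES = gql`
    query WorkspaceStates($batchSpec: ID!) {
        node(id: $batchSpec) {
            ... on BatchSpec {
                id
                workspaceResolution {
                    id
                    workspaces { nodes { id state } }
                }
            }
        }
    }
`

const cache = new InMemoryCache()
// The data has to mirror the query's selection set exactly, __typename and
// all, and has to be kept in sync with the real queries by hand.
cache.writeQuery({
    query: WORKSPACE_STATES,
    variables: { batchSpec: 'spec-1' },
    data: {
        node: {
            __typename: 'BatchSpec',
            id: 'spec-1',
            workspaceResolution: {
                __typename: 'BatchSpecWorkspaceResolution',
                id: 'resolution-1',
                workspaces: {
                    __typename: 'BatchSpecWorkspaceConnection',
                    nodes: [{ __typename: 'BatchSpecWorkspace', id: 'ws-1', state: 'COMPLETED' }],
                },
            },
        },
    },
})

// The seeded cache would then be handed to the mock provider in each story:
// <MockedProvider cache={cache}>…</MockedProvider>
```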

Of course, we could have ditched Apollo Client and reverted to the old requestGraphQL request style (or something else) for this query, but that didn't feel like the right call because:

  • It promotes a fractured system and really just delays the work of making this query behave in Apollo later on, when we migrate over completely.
  • Our polling practices are bad for other reasons (for example, we poll tons of fields that are never going to change, and we poll forever even when we can be confident that the thing we were polling for has stopped changing) that would be a lot harder to solve without a cache.

Follow Up

It's probably worth porting this solution to the changesets list on the batch change detail page as well. The same problem impacts that list, but it seems to happen far less frequently, so it's a lower priority. I'll write up an issue for this.

I'm also going to be writing up a more general issue about improving our polling strategies to more effectively take advantage of the cache.

Test plan

Extensive manual testing.
