repo-updater: add dumb mitigation for duplicate repo issue + improved insight for further debugging
Created by: slimsag
TL;DR: This PR does two things:
- Adds a dumb (but completely valid and working) mitigation for the duplicate repo issue a customer is facing, so that they can upgrade to 3.3 while we identify the underlying cause.
- Adds better tracing & logging so that we can actually track down that issue.
Detailed explanation:
#3680 prevents a customer from successfully upgrading to 3.3 because (I believe) when this duplicate repository state occurs, it blocks all other repositories from updating. My goal with the first change is to unblock those updates by simply ignoring such duplicates. In practice, this means everything will work fine (likely even for the duplicate repo itself).
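To make the idea of "ignoring duplicates" concrete, here is a minimal sketch of the general technique. It is not the code in this PR: `Repo`, its `ExternalID` key, and `dedupeRepos` are hypothetical simplifications I'm assuming for illustration. The point is that duplicates are dropped and logged (keeping the first occurrence) instead of failing the whole update, so one bad record can't block every other repo.

```go
package main

import (
	"fmt"
	"log"
)

// Repo is a hypothetical, heavily simplified stand-in for a repository
// record; the real repo-updater type has many more fields.
type Repo struct {
	ExternalID string // hypothetical unique key assigned by the code host
	Name       string
}

// dedupeRepos removes repos whose ExternalID was already seen, keeping the
// first occurrence. It also returns the dropped duplicates so the caller
// can log them instead of aborting the whole update.
func dedupeRepos(repos []Repo) (unique, dropped []Repo) {
	seen := make(map[string]bool, len(repos))
	for _, r := range repos {
		if seen[r.ExternalID] {
			dropped = append(dropped, r)
			continue
		}
		seen[r.ExternalID] = true
		unique = append(unique, r)
	}
	return unique, dropped
}

func main() {
	repos := []Repo{
		{ExternalID: "1", Name: "github.com/a/x"},
		{ExternalID: "2", Name: "github.com/a/y"},
		{ExternalID: "1", Name: "github.com/a/x"}, // duplicate of the first entry
	}
	unique, dropped := dedupeRepos(repos)
	for _, d := range dropped {
		// Log and move on, so the remaining repos still get updated.
		log.Printf("ignoring duplicate repo %q (external ID %s)", d.Name, d.ExternalID)
	}
	fmt.Println(len(unique), len(dropped))
}
```

Keeping the first occurrence is an arbitrary but deterministic choice; since the duplicates are assumed to be identical records, which copy survives shouldn't matter.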
This change also adds detailed but non-verbose logging and tracing so that we can identify the source of the duplicates. This will tell us definitively whether they are coming from the `NewDiff` or `ListRepos` codepaths, without us needing to ask for a DB state dump or deploy another debug image. If it turns out they are coming from the `ListRepos` codepath, I would later add further insight at the DB layer to determine whether a corrupt Postgres index somehow allowed us to violate the DB constraint.
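As a rough illustration of how tagging the log output with the producing codepath narrows things down, here is a minimal sketch. Again, this is not the PR's actual logging: `Repo`, `findDuplicates`, and `logDuplicates` are hypothetical names, and I'm assuming duplicates are detected by a single `ExternalID` key. Each call passes a label such as `"NewDiff"` or `"ListRepos"`, so any duplicate that shows up in the logs also says where it came from.

```go
package main

import (
	"fmt"
	"log"
	"sort"
)

// Repo is a hypothetical, simplified repository record.
type Repo struct {
	ExternalID string // hypothetical unique key
	Name       string
}

// findDuplicates returns every ExternalID that occurs more than once,
// mapped to how many times it occurs.
func findDuplicates(repos []Repo) map[string]int {
	counts := make(map[string]int, len(repos))
	for _, r := range repos {
		counts[r.ExternalID]++
	}
	dups := make(map[string]int)
	for id, n := range counts {
		if n > 1 {
			dups[id] = n
		}
	}
	return dups
}

// logDuplicates logs each duplicate together with the name of the codepath
// that produced the slice and returns the number of duplicated IDs.
func logDuplicates(source string, repos []Repo) int {
	dups := findDuplicates(repos)
	ids := make([]string, 0, len(dups))
	for id := range dups {
		ids = append(ids, id)
	}
	sort.Strings(ids) // deterministic log order, since map iteration order is random
	for _, id := range ids {
		log.Printf("duplicate repo detected: source=%s external_id=%s count=%d", source, id, dups[id])
	}
	return len(dups)
}

func main() {
	fromStore := []Repo{{ExternalID: "1", Name: "a"}, {ExternalID: "1", Name: "a"}, {ExternalID: "2", Name: "b"}}
	fmt.Println(logDuplicates("ListRepos", fromStore))
}
```

Only duplicates are logged, which keeps the output quiet in the normal case while still pinpointing the source when the problem occurs.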
Once we've resolved the underlying issue in #3680, we should definitely revert the first change. We can then decide whether to also revert the second (tracing/logging) change, depending on whether you feel it would still be useful in the future @tsenart @keegancsmith.
After merging, I will ship this in a new patch release.
Helps #3680