search: ignore dial errors during zoekt rollout
Created by: keegancsmith
This extends our existing pattern of ignoring errors caused by a zoekt rollout. Starting 2022-01-18 we began encountering dial errors during zoekt rollouts. We suspect a change in kubernetes/gce networking or service discovery.
We extracted these errors from honeycomb and correlated them with rollouts. In particular, the i/o timeout error was occurring often enough to trigger our alert thresholds.
Observability now records a reason in prometheus and traces, since we can now have multiple reasons for ignoring an error. Additionally, a minor observability bug was fixed where we counted non-dns.IsNotFound errors.
Test Plan: Just unit tests. I'm confident in the code after extensive exploration of our instrumentation and reading the stdlib's net package.
Fixes https://github.com/sourcegraph/sourcegraph/issues/30795