worker: enable distributed tracing for dbworker handler jobs
Created by: Strum355
What it do
This PR enables worker/dbworkers to emit traces to Jaeger. A span context is created whenever a job is successfully read from the database, and then it's context propagation as normal.
Sampling of traces
In its initial stage, at least, this PR does not answer how to handle sampling of the vast increase in traces that would be generated. Currently, it effectively has a sampling strategy of "all", which would probably be unwise in production. Initial discussions with @efritz considered Jaeger 2.20's "delayed sampling" capabilities, but it's unclear to us whether our setup and customer setups are configured for this, and there are also concerns about its inter-process capabilities.
If these turn out to be non-issues or solvable issues, workers should create a custom non-global tracer that uses this delayed sampling, instead of the tracer.GlobalTracer() that the rest of the service (i.e. non-worker codepaths) would be using. This non-global tracer would be passed to workerutil.NewMetrics(), and ultimately consumed by either dbworker.NewWorker() or workerutil.NewWorker(). Some parts of the tracing code (probably internal/trace/ot; I have not done a proper analysis of which areas) may need to be revamped so that the tracer used to create a new child span comes from the existing parent span, via opentracing.Span.Tracer().
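As a rough illustration of that last point, the sketch below shows a child span inheriting its parent's tracer, with the global tracer used only as a fallback when there is no parent. It is an assumption about how such a revamp might look, not the existing internal/trace/ot code: Tracer, Span, globalTracer and StartChildSpan are hypothetical stand-ins, and the only idea borrowed from opentracing is that a span can report the tracer that created it (opentracing.Span.Tracer()).

```go
package main

import "fmt"

// Tracer is a hypothetical stand-in; Name only exists so the example can
// show which tracer produced a span.
type Tracer struct{ Name string }

// Span remembers its parent and the tracer that created it.
type Span struct {
	Operation string
	Parent    *Span
	tracer    *Tracer
}

// StartSpan creates a span owned by this tracer; parent may be nil.
func (t *Tracer) StartSpan(operation string, parent *Span) *Span {
	return &Span{Operation: operation, Parent: parent, tracer: t}
}

// Tracer returns the tracer that created the span, mirroring the role of
// opentracing.Span.Tracer().
func (s *Span) Tracer() *Tracer { return s.tracer }

// globalTracer plays the role of the process-wide tracer used by
// non-worker codepaths.
var globalTracer = &Tracer{Name: "global"}

// StartChildSpan prefers the parent's tracer, so spans started deep inside a
// worker job stay on the worker's custom tracer without it being passed around.
func StartChildSpan(parent *Span, operation string) *Span {
	tracer := globalTracer
	if parent != nil {
		tracer = parent.Tracer()
	}
	return tracer.StartSpan(operation, parent)
}

func main() {
	workerTracer := &Tracer{Name: "worker"} // custom, non-global tracer
	root := workerTracer.StartSpan("dequeue", nil)

	child := StartChildSpan(root, "handle")
	fmt.Println(child.Tracer().Name) // prints "worker"

	orphan := StartChildSpan(nil, "http request")
	fmt.Println(orphan.Tracer().Name) // prints "global"
}
```

The design choice here is that only the code creating the root span needs to know about the custom tracer; everything downstream picks it up from the span it already has.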