worker: enable distributed tracing for dbworker handler jobs

Administrator requested to merge nsc/worker-tracing-inject into main

Created by: Strum355

What it do

This PR enables worker/dbworkers to emit traces to Jaeger. A span context is created whenever a job is successfully read from the database, and from there context propagation works as normal.

Sampling of traces

In its initial stage, at least, this PR does not have an answer for how to handle sampling of the vast increase in traces that would be generated. Currently, it effectively has a sampling strategy of "all", which would probably be unwise in production. Initial discussions with @efritz considered Jaeger 2.20's "delayed sampling" capabilities, but it's unclear to us whether our setup and customer setups are set up for this, and we have concerns about its inter-process capabilities.

If these turn out to be non-issues or solvable issues, workers would/should create a custom non-global tracer that uses this delayed sampling, instead of the tracer.GlobalTracer() that the rest of the service (i.e. non-worker codepaths) would be using. This non-global tracer would be passed to workerutil.NewMetrics(), which is ultimately consumed by either dbworker.NewWorker() or workerutil.NewWorker(). Some parts of the tracing code (probably internal/trace/ot; I have not done a proper analysis of which areas) may need to be revamped so that the tracer used to create a new child span can come from any existing span, using opentracing.Span.Tracer().
