Better logging in Google Cloud environments
Created by: arussellsaw
The logging tools provided by Stackdriver are really powerful, and a small investment by us to emit the formats that Stackdriver expects would give us a bunch of benefits.
Use the Stackdriver JSON Format
The first thing I'd want to do is emit logs from all of our services in the JSON format expected by Stackdriver, and attach structured logging fields from our logger to these JSON objects. An example of what that output looks like is here: https://cloud.google.com/logging/docs/samples/logging-write-log-entry-advanced
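To make the target format concrete, here is a minimal sketch of a formatter that writes one JSON object per log line. The `severity` and `message` keys are the special fields that Cloud Logging picks up from structured JSON output; the `formatEntry` name, its signature, and the sample field values are assumptions for illustration, not part of our logger today.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// formatEntry is a hypothetical helper that renders one log line as a JSON
// object. "severity" and "message" are special fields that Cloud Logging
// recognises; every other key is carried along as structured payload.
func formatEntry(severity, message string, fields map[string]interface{}) (string, error) {
	entry := make(map[string]interface{}, len(fields)+2)
	for k, v := range fields {
		entry[k] = v
	}
	// Set the reserved keys last so caller-supplied fields cannot clobber them.
	entry["severity"] = severity
	entry["message"] = message

	b, err := json.Marshal(entry)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	line, err := formatEntry("ERROR", "failed to clone repo", map[string]interface{}{
		"repo":    "github.com/example/repo", // illustrative values only
		"attempt": 3,
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// One JSON object per line on stdout is what the logging agent ingests.
	fmt.Println(line)
}
```

In practice this would live inside whatever handler our logger uses to write output, so call sites would not change.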
Have our logger understand our error type
Secondly, along with the work @unknwon is doing in https://github.com/sourcegraph/sourcegraph/issues/16109, we think it'd be worth using an error wrapper type that attaches metadata. It's important that all of our code uses the same error type so that our tooling at the edge always knows how to interpret these errors. Something like https://github.com/cockroachdb/errors seems to give us a lot of what we need, specifically stack traces, metadata, and preserving data across network boundaries. Our logging client will interpret these errors and attach them as structured metadata in the log output.
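As a rough illustration of the idea, the sketch below uses only the standard library rather than cockroachdb/errors. The `fieldsError` type, `withFields`, `errorFields`, and `cloneRepo` are all hypothetical names made up for this example; a real implementation would lean on whichever error library we settle on.

```go
package main

import (
	"errors"
	"fmt"
)

// fieldsError is a hypothetical wrapper that carries structured metadata.
type fieldsError struct {
	err    error
	fields map[string]interface{}
}

func (e *fieldsError) Error() string { return e.err.Error() }
func (e *fieldsError) Unwrap() error { return e.err }

// withFields attaches metadata to err.
func withFields(err error, fields map[string]interface{}) error {
	return &fieldsError{err: err, fields: fields}
}

// errorFields walks the wrap chain and merges the metadata from every
// fieldsError it finds. On key collisions the outermost wrapper wins.
// This is the part the logging client would call.
func errorFields(err error) map[string]interface{} {
	out := map[string]interface{}{}
	for err != nil {
		var fe *fieldsError
		if !errors.As(err, &fe) {
			break
		}
		for k, v := range fe.fields {
			if _, seen := out[k]; !seen {
				out[k] = v
			}
		}
		err = fe.err
	}
	return out
}

// cloneRepo is a stand-in for application code that wraps an error twice,
// with an ordinary fmt.Errorf wrap in between.
func cloneRepo() error {
	err := withFields(errors.New("connection refused"), map[string]interface{}{"host": "gitserver-0"})
	err = fmt.Errorf("clone failed: %w", err)
	return withFields(err, map[string]interface{}{"repo": "example/repo"})
}

func main() {
	err := cloneRepo()
	fmt.Println(err)
	fmt.Println(errorFields(err))
}
```

The point is that the logger only needs one well-known type to look for, which is why it matters that all of our code agrees on it.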
Attach trace IDs to logs
Thirdly, one of the really cool things that Google Cloud Logging does is associate logs and events together using a trace ID field. In Cloud Run you can get this from the X-Cloud-Trace-Context header. If you make sure this is packed as a value on the context, and all of our log calls use this context and unpack that trace ID, you will be able to view all logs for a given trace in the logging UI, which really helps with debugging. I'm not sure if our ingress controller attaches the X-Cloud-Trace-Context header, so we might want to substitute our own trace ID; I think this should still work. If not, we should just make a best effort to attach any and all trace IDs we have access to in logs, in order to give us the best chance of being able to identify events across different services.
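A minimal sketch of that flow, assuming the header is the one Cloud Run documents as `X-Cloud-Trace-Context` (value shaped like `TRACE_ID/SPAN_ID;o=OPTIONS`) and that the log entry carries the trace under the `logging.googleapis.com/trace` key as `projects/PROJECT_ID/traces/TRACE_ID`. The middleware, the helper names, and the `my-project` project ID are hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// traceKey is an unexported context key type, so other packages cannot collide with it.
type traceKey struct{}

// traceIDFromHeader extracts the trace ID from a header value of the form
// "TRACE_ID/SPAN_ID;o=OPTIONS". It returns "" for an empty header.
func traceIDFromHeader(h string) string {
	if i := strings.IndexAny(h, "/;"); i >= 0 {
		return h[:i]
	}
	return h
}

// withTrace is a hypothetical middleware that packs the trace ID onto the request context.
func withTrace(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ctx := r.Context()
		if id := traceIDFromHeader(r.Header.Get("X-Cloud-Trace-Context")); id != "" {
			ctx = context.WithValue(ctx, traceKey{}, id)
		}
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// traceField returns the value a logger would emit under the
// "logging.googleapis.com/trace" key, if the context carries a trace ID.
func traceField(ctx context.Context, projectID string) (string, bool) {
	id, ok := ctx.Value(traceKey{}).(string)
	if !ok {
		return "", false
	}
	return "projects/" + projectID + "/traces/" + id, true
}

func main() {
	h := withTrace(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if trace, ok := traceField(r.Context(), "my-project"); ok {
			fmt.Println(trace)
		}
	}))
	req := httptest.NewRequest("GET", "/", nil)
	req.Header.Set("X-Cloud-Trace-Context", "105445aa7843bc8bf206b12000100000/1;o=1")
	h.ServeHTTP(httptest.NewRecorder(), req)
}
```

If the ingress turns out not to forward the header, the same middleware could generate its own ID at that point instead of leaving the context untouched.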
context.Context in log calls
Most of the features mentioned above are predicated on passing context.Context to our logging calls. This is important as it allows us to attach metadata to our log calls without having to pass that data around manually. It's possible that we could do this with the existing log15 implementation, or at the very least while keeping the same method signature, by inspecting the params passed and looking for a context, then pulling the data that we need if we find it. Longer term I'd prefer we move to a logger with an explicit context param, as otherwise I think adoption will be low.
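Here is a rough sketch of the "inspect the params" approach, keeping a log15-style `(msg string, ctx ...interface{})` signature. Nothing here is log15's actual API: `logInfo`, `splitContext`, `newTraceContext`, and the `trace` key are hypothetical, and the function returns the fields it would log rather than writing them, just to keep the example small.

```go
package main

import (
	"context"
	"fmt"
)

// traceKey is the context key under which a trace ID is assumed to be stored.
type traceKey struct{}

// newTraceContext is a small helper for the example: a context carrying a trace ID.
func newTraceContext(id string) context.Context {
	return context.WithValue(context.Background(), traceKey{}, id)
}

// splitContext scans log15-style variadic args for context.Context values,
// removes them, and returns the first one found plus the remaining args.
func splitContext(args []interface{}) (context.Context, []interface{}) {
	var found context.Context
	rest := make([]interface{}, 0, len(args))
	for _, a := range args {
		if c, ok := a.(context.Context); ok {
			if found == nil {
				found = c
			}
			continue
		}
		rest = append(rest, a)
	}
	return found, rest
}

// logInfo mimics the shape of a log15 call. If a context is passed anywhere
// in args, its trace ID is appended as an extra key/value pair.
func logInfo(msg string, args ...interface{}) []interface{} {
	ctx, kv := splitContext(args)
	if ctx != nil {
		if id, ok := ctx.Value(traceKey{}).(string); ok {
			kv = append(kv, "trace", id)
		}
	}
	return append([]interface{}{"msg", msg}, kv...)
}

func main() {
	ctx := newTraceContext("abc123")
	fmt.Println(logInfo("repo cloned", ctx, "repo", "example/repo")...)
	// Existing call sites without a context keep working unchanged.
	fmt.Println(logInfo("repo cloned", "repo", "example/repo")...)
}
```

This keeps existing call sites compiling, but nothing forces callers to pass the context, which is the adoption concern with this approach.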