Improve client error tracking of the Sourcegraph application: Tracking issue

Created by: valerybugakov

Problem to solve

Today, we lack processes for monitoring, reporting, prioritizing, and fixing client-side runtime errors. Consequently, it can be difficult for Sourcegraph engineers to see, understand, and fix problems in the Sourcegraph application, which hurts engineering teams' effectiveness and, ultimately, our customers. For the observability of on-prem instances, distributed traces, and other more advanced observability efforts, we'll follow the lead of the DevX team.

Measure of success

  • We have a robust process for runtime error monitoring on the client, and it's documented in the handbook.
  • Relevant services and tools are configured to support the error monitoring process documented in the handbook.
  • An automated notification process lets the right people know there's an issue, and we have documentation that tells them how to address it.

Solution summary

  • Evaluate the relative advantages and disadvantages of Sentry vs. Datadog and determine the best path forward.
  • Improve Sentry configuration to make production error debugging easier.
  • Collaborate with the DevX team on proxying client events through our backend.
  • Introduce guidelines on error handling in client applications.
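
As an illustration of what an error-handling guideline for client applications might look like, here is a minimal sketch. It assumes a hypothetical `ErrorReporter` interface standing in for whichever service is chosen (Sentry or Datadog); `asError` and `withErrorReporting` are illustrative names, not existing Sourcegraph helpers.

```typescript
// Hypothetical reporter interface (an assumption for this sketch);
// in practice it could wrap the Sentry or Datadog client SDK.
interface ErrorReporter {
    captureException(error: Error, context?: Record<string, unknown>): void
}

// Normalize any thrown value into an Error so that every report
// carries a message and a stack trace.
function asError(value: unknown): Error {
    if (value instanceof Error) {
        return value
    }
    if (typeof value === 'string') {
        return new Error(value)
    }
    return new Error(`Non-error value thrown: ${String(value)}`)
}

// Run a function, report any failure together with its context,
// then rethrow so that callers can still handle the error.
function withErrorReporting<T>(
    reporter: ErrorReporter,
    context: Record<string, unknown>,
    fn: () => T
): T {
    try {
        return fn()
    } catch (thrown) {
        const error = asError(thrown)
        reporter.captureException(error, context)
        throw error
    }
}
```

Rethrowing after reporting keeps the helper from silently swallowing errors, so existing error boundaries and UI fallbacks would keep working unchanged.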

What we're not doing right now:

  • Explore ways to allow on-prem clients to share logs with the Sourcegraph team to speed up debugging.
  • Extend OpenTelemetry traces into the frontend so that client teams can easily collect a sequence of events to debug client issues.
  • Explore what src debug can do today with regard to traces, and see if it can be used to export client log events as well.

Artifacts:

What specific customers are we iterating on the problem and solution with?

Internal Sourcegraph developers

Impact on use cases

Delivery plan

Tracked issues

@unassigned

@plibither8: 3.00d

Completed

@valerybugakov

Completed