Tracking issue - Health Status Tooling
Created by: caugustus-sourcegraph
Plan
We want to offer admins a simple way to evaluate the health of a Sourcegraph deployment.
Requirements
- Define a list of technical health checks that validate whether the deployment is healthy
- Ship a prototype health status tool that validates that everything is set up properly. The prototype should assess the following:
  - Can services talk to each other as expected?
  - Is the disk configured as expected (e.g., each replica has a separate storage location defined)?
  - Are the disk and network performing as expected?
  - Can the instance communicate with the code hosts?
- The tool should work for Kubernetes deployments (w/ or w/o Helm)
- The tool should render health status in a visually simple way in the CLI (think stoplight colors: green, yellow, red). For an example, see this Reddit post.
- Ship documentation on remediation steps and troubleshooting best practices. The documentation should be detailed enough to help customers and CEs debug at each level of the health check, and it should cover both Yellow and Red output statuses.
- Share the tool with CE to validate it with select customers