frontend: add health check
Created by: ggilmore
This PR introduces the package checks. The package provides a common framework for defining and consuming health checks across services.
Overview
- Each service defines its own checks. A check is a function with the signature
func(ctx context.Context) (any, error)
- Checks are run in the background. The frequency is configurable.
- Each service exposes a /checks endpoint on its default port. The endpoint returns JSON in a fixed format. This is an example of what a response from
GET <frontend>/.internal/checks
looks like:
# /checks
{
"check_can_reach_gitserver": {
"error": "",
"last_run": "2022-06-30T13:01:45Z",
"out": "[{\"addr\":\"127.0.0.1:3178\",\"out\":\"\"}]",
"status": "OK"
},
"check_dummy": {
"error": "",
"last_run": "2022-06-30T13:02:05Z",
"out": "{\"Time\":\"2022-06-30T13:02:05.815109Z\",\"Msg\":\"hello\"}",
"status": "OK"
}
}
- Frontend plays a special role. It exposes a second endpoint that aggregates checks from all services, including frontend itself. The checks are grouped by service > address > check:
# /aggregate-checks
{
"frontend": {
"localhost:3090": {
"check_can_reach_gitserver": {
"error": "",
"last_run": "0001-01-01T00:00:00Z",
"out": "",
"status": "PENDING"
},
"check_dummy": {
"error": "",
"last_run": "2022-06-30T13:01:25Z",
"out": "{\"Time\":\"2022-06-30T13:01:25.811925Z\",\"Msg\":\"hello\"}",
"status": "OK"
}
}
}
}
- frontend serves the aggregate endpoint at /debug/aggregate-checks. This way it is reachable on port :6060 within the container.
- We avoid the debug port :6060 for other services because we cannot guarantee that it is reachable from frontend in every environment.
- /checks endpoints:
- frontend: /.internal/checks
- gitserver: /checks
- zoekt-webserver: /checks
- searcher: /checks
- TBD
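The model described in the overview (a check function, run periodically in the background, with the latest result served as JSON) can be sketched roughly as follows. This is a minimal illustration and not the code of the checks package: the names Runner, Result, NewRunner, RunOnce, Run and newDemoRunner are hypothetical, and the "ERROR" status is an assumption, since the examples above only show OK and PENDING. Only the check signature and the JSON field names are taken from the description.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

// Check uses the signature given in the overview.
type Check func(ctx context.Context) (any, error)

// Result mirrors the per-check JSON fields shown in the example responses.
type Result struct {
	Error   string    `json:"error"`
	LastRun time.Time `json:"last_run"`
	Out     string    `json:"out"`
	Status  string    `json:"status"`
}

// Runner (hypothetical) holds the registered checks and the latest result of each.
type Runner struct {
	mu      sync.RWMutex
	checks  map[string]Check
	results map[string]Result
}

// NewRunner registers the checks; each one reports PENDING until its first run.
func NewRunner(checks map[string]Check) *Runner {
	r := &Runner{checks: checks, results: make(map[string]Result, len(checks))}
	for name := range checks {
		r.results[name] = Result{Status: "PENDING"}
	}
	return r
}

// RunOnce executes every check and records its outcome.
func (r *Runner) RunOnce(ctx context.Context) {
	for name, check := range r.checks {
		res := Result{LastRun: time.Now().UTC(), Status: "OK"}
		if out, err := check(ctx); err != nil {
			// "ERROR" is an assumed status name; the description only shows OK and PENDING.
			res.Error, res.Status = err.Error(), "ERROR"
		} else if b, err := json.Marshal(out); err != nil {
			res.Error, res.Status = err.Error(), "ERROR"
		} else {
			res.Out = string(b) // the output is stored as a JSON-encoded string
		}
		r.mu.Lock()
		r.results[name] = res
		r.mu.Unlock()
	}
}

// Run executes the checks immediately and then at every interval until ctx is
// done. It blocks, so callers start it in the background with `go r.Run(...)`.
func (r *Runner) Run(ctx context.Context, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		r.RunOnce(ctx)
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
	}
}

// ServeHTTP serves the latest results as JSON, e.g. when mounted at /checks.
func (r *Runner) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	w.Header().Set("Content-Type", "application/json")
	_ = json.NewEncoder(w).Encode(r.results)
}

// newDemoRunner builds a runner with two illustrative checks and runs them once.
func newDemoRunner() *Runner {
	r := NewRunner(map[string]Check{
		"check_dummy": func(context.Context) (any, error) { return map[string]string{"Msg": "hello"}, nil },
		"check_fail":  func(context.Context) (any, error) { return nil, fmt.Errorf("gitserver unreachable") },
	})
	r.RunOnce(context.Background())
	return r
}

func main() {
	rec := httptest.NewRecorder()
	newDemoRunner().ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/checks", nil))
	fmt.Print(rec.Body.String())
}
```

In a real service, Run would be started once in a goroutine at startup and the handler mounted on the service's default port, so that a request never triggers a check itself and only reads the most recent result.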
How can an admin access the health checks?
In the first version, admins can access the frontend container and curl the aggregate endpoint on the debug port :6060. In the future we might want to expose the data via the GraphQL API and make it consumable from the site admin interface.
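Until the data is exposed elsewhere, the aggregate response can also be consumed programmatically. The sketch below decodes the service > address > check structure shown above and lists every check whose status is not OK. The names checkResult, aggregate, notOK and sample are hypothetical and only serve this illustration; the JSON layout is the one from the /aggregate-checks example, with the "out" value abbreviated.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// checkResult holds the fields of one check entry in the aggregate response.
type checkResult struct {
	Error   string `json:"error"`
	LastRun string `json:"last_run"`
	Out     string `json:"out"`
	Status  string `json:"status"`
}

// aggregate is keyed by service, then address, then check name.
type aggregate map[string]map[string]map[string]checkResult

// notOK returns one "service/address/check: STATUS" line, sorted, for every
// check whose status is not "OK".
func notOK(body []byte) ([]string, error) {
	var agg aggregate
	if err := json.Unmarshal(body, &agg); err != nil {
		return nil, err
	}
	var lines []string
	for service, addrs := range agg {
		for addr, checks := range addrs {
			for name, res := range checks {
				if res.Status != "OK" {
					lines = append(lines, fmt.Sprintf("%s/%s/%s: %s", service, addr, name, res.Status))
				}
			}
		}
	}
	sort.Strings(lines)
	return lines, nil
}

// sample is the aggregate example from the description with "out" abbreviated.
const sample = `{"frontend": {"localhost:3090": {
  "check_can_reach_gitserver": {"error": "", "last_run": "0001-01-01T00:00:00Z", "out": "", "status": "PENDING"},
  "check_dummy": {"error": "", "last_run": "2022-06-30T13:01:25Z", "out": "{\"Msg\":\"hello\"}", "status": "OK"}
}}}`

func main() {
	lines, err := notOK([]byte(sample))
	if err != nil {
		panic(err)
	}
	for _, l := range lines {
		fmt.Println(l) // prints "frontend/localhost:3090/check_can_reach_gitserver: PENDING"
	}
}
```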
Test Plan
sg start
curl http://localhost:3090/.internal/checks -sS | jq
curl http://localhost:6063/aggregate-checks -sS | jq