frontend: add health check

Warren Gifford requested to merge check-frontend-gitserver into main

Created by: ggilmore

This PR introduces the checks package. The package provides a common framework for defining and consuming health checks across services.

Overview

  • Each service defines its own checks. Every check has the signature func(ctx context.Context) (any, error).
  • Checks are run in the background. The frequency is configurable.
  • Each service exposes a /checks endpoint on its default port. The endpoint returns JSON in a fixed format. This is an example of a response from GET <frontend>/.internal/checks:
# /checks
{
  "check_can_reach_gitserver": {
    "error": "",
    "last_run": "2022-06-30T13:01:45Z",
    "out": "[{\"addr\":\"127.0.0.1:3178\",\"out\":\"\"}]",
    "status": "OK"
  },
  "check_dummy": {
    "error": "",
    "last_run": "2022-06-30T13:02:05Z",
    "out": "{\"Time\":\"2022-06-30T13:02:05.815109Z\",\"Msg\":\"hello\"}",
    "status": "OK"
  }
}
  • Frontend plays a special role. It exposes a second endpoint that aggregates checks from all services, including frontend itself. The checks are grouped by service > address > check:
# /aggregate-checks
{
  "frontend": {
    "localhost:3090": {
      "check_can_reach_gitserver": {
        "error": "",
        "last_run": "0001-01-01T00:00:00Z",
        "out": "",
        "status": "PENDING"
      },
      "check_dummy": {
        "error": "",
        "last_run": "2022-06-30T13:01:25Z",
        "out": "{\"Time\":\"2022-06-30T13:01:25.811925Z\",\"Msg\":\"hello\"}",
        "status": "OK"
      }
    }
  }
}
  • frontend serves the aggregate endpoint under /debug/aggregate-checks, which makes it reachable on port :6060 within the container.
  • We avoid the debug port :6060 for other services because we cannot guarantee that it is reachable from frontend in every environment.
  • /checks endpoints:
    • frontend: /.internal/checks
    • gitserver: /checks
    • zoekt-webserver: /checks
    • searcher: /checks
    • TBD
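
To make the overview concrete, here is a minimal sketch of how a service could register checks, run them in the background, and serve them as JSON. Only the check signature and the JSON field names come from this description; the names Registry, Register, RunOnce, Run and Snapshot are hypothetical, the "ERROR" status is an assumption (the examples above only show "OK" and "PENDING"), and the actual checks package may be structured differently.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"sync"
	"time"
)

// Check has the signature described in the overview.
type Check func(ctx context.Context) (any, error)

// Result mirrors the JSON fields of the example responses.
type Result struct {
	Error   string    `json:"error"`
	LastRun time.Time `json:"last_run"`
	Out     string    `json:"out"`
	Status  string    `json:"status"`
}

// Registry is a hypothetical holder for checks and their latest results.
type Registry struct {
	mu      sync.RWMutex
	checks  map[string]Check
	results map[string]Result
}

func NewRegistry() *Registry {
	return &Registry{checks: map[string]Check{}, results: map[string]Result{}}
}

// Register adds a check; it reports PENDING until its first run.
func (r *Registry) Register(name string, c Check) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.checks[name] = c
	r.results[name] = Result{Status: "PENDING"}
}

// Snapshot returns a copy of the latest results.
func (r *Registry) Snapshot() map[string]Result {
	r.mu.RLock()
	defer r.mu.RUnlock()
	out := make(map[string]Result, len(r.results))
	for name, res := range r.results {
		out[name] = res
	}
	return out
}

// RunOnce executes every registered check and stores its result.
func (r *Registry) RunOnce(ctx context.Context) {
	r.mu.RLock()
	checks := make(map[string]Check, len(r.checks))
	for name, c := range r.checks {
		checks[name] = c
	}
	r.mu.RUnlock()
	for name, c := range checks {
		res := Result{LastRun: time.Now().UTC(), Status: "OK"}
		out, err := c(ctx)
		if err == nil {
			var b []byte
			b, err = json.Marshal(out) // output is stored as a JSON-encoded string
			res.Out = string(b)
		}
		if err != nil {
			res.Error, res.Status = err.Error(), "ERROR" // assumed status name
		}
		r.mu.Lock()
		r.results[name] = res
		r.mu.Unlock()
	}
}

// Run executes all checks immediately and then at the given interval until ctx is done.
func (r *Registry) Run(ctx context.Context, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		r.RunOnce(ctx)
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
	}
}

// ServeHTTP lets a Registry be mounted as the /checks handler.
func (r *Registry) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	_ = json.NewEncoder(w).Encode(r.Snapshot())
}

// demo registers one check and returns the results before and after a run.
func demo() (before, after map[string]Result) {
	reg := NewRegistry()
	reg.Register("check_dummy", func(ctx context.Context) (any, error) {
		return map[string]string{"Msg": "hello"}, nil
	})
	before = reg.Snapshot()
	reg.RunOnce(context.Background())
	return before, reg.Snapshot()
}

func main() {
	_, after := demo()
	b, _ := json.MarshalIndent(after, "", "  ")
	fmt.Println(string(b))
}
```

In a real service, the sketch would be wired up with something like go reg.Run(ctx, interval) and http.Handle("/checks", reg). The zero time.Time of a check that has not run yet encodes as 0001-01-01T00:00:00Z, which matches the PENDING entry in the aggregate example.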

How can an admin access the health checks?

In the first version, admins can access the frontend container and curl the aggregate endpoint on the debug port :6060. In the future we might want to expose the data via the GraphQL API and make it consumable from the site admin interface.
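
Once the aggregate response has been fetched, the service > address > check grouping can be decoded into nested maps. The following is a rough sketch of how a consumer might list every check that is not OK; the names Aggregate and NotOK are hypothetical, and the sample payload is the /aggregate-checks example from the overview.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"time"
)

// Result mirrors the JSON fields of a single check.
type Result struct {
	Error   string    `json:"error"`
	LastRun time.Time `json:"last_run"`
	Out     string    `json:"out"`
	Status  string    `json:"status"`
}

// Aggregate is keyed by service, then address, then check name.
type Aggregate map[string]map[string]map[string]Result

// NotOK returns one sorted line per check whose status is not "OK".
func NotOK(body []byte) ([]string, error) {
	var agg Aggregate
	if err := json.Unmarshal(body, &agg); err != nil {
		return nil, err
	}
	var lines []string
	for svc, addrs := range agg {
		for addr, checks := range addrs {
			for name, res := range checks {
				if res.Status != "OK" {
					lines = append(lines, fmt.Sprintf("%s %s %s: %s", svc, addr, name, res.Status))
				}
			}
		}
	}
	sort.Strings(lines) // map iteration order is random, so sort for stable output
	return lines, nil
}

// sample is the aggregate example from the overview.
const sample = `{
  "frontend": {
    "localhost:3090": {
      "check_can_reach_gitserver": {
        "error": "",
        "last_run": "0001-01-01T00:00:00Z",
        "out": "",
        "status": "PENDING"
      },
      "check_dummy": {
        "error": "",
        "last_run": "2022-06-30T13:01:25Z",
        "out": "{\"Time\":\"2022-06-30T13:01:25.811925Z\",\"Msg\":\"hello\"}",
        "status": "OK"
      }
    }
  }
}`

func main() {
	lines, err := NotOK([]byte(sample))
	if err != nil {
		panic(err)
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```

For the sample payload this reports only check_can_reach_gitserver, which is still PENDING.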

Test Plan

sg start
curl http://localhost:3090/.internal/checks -sS | jq
curl http://localhost:6063/aggregate-checks -sS | jq 
