
prometheus: alert dashboard links with fixed timestamps

Merged: Warren Gifford requested to merge prom/link-to-fixed-time into main

Created by: bobheadxi

While working on https://github.com/sourcegraph/sourcegraph/pull/17014, I added a relative timestamp to the dashboard link in alerts, then did a bit of finagling to make the link fully fixed to the timestamps associated with the delivered alert.

We can depend on (index .Alerts 0) because our grouping strategy ensures that each delivered group contains only one alert.

Now that I've got it working, I'm still a bit unsure about it: the experience is less than ideal for alerts that are, e.g., spikes lasting a second, since we then get a link to a panel with a tiny window. An alternative is Grafana's time and time.window URL parameters, but those might not work well for longer-lasting alerts. Arithmetic on these timestamps is not possible in Alertmanager templates: https://github.com/prometheus/alertmanager/issues/1188

=> update: see https://github.com/sourcegraph/sourcegraph/pull/17034#issuecomment-756598154


Merged (Jun 23, 2025 4:24am UTC)


Activity

  • Created by: sourcegraph-bot

    Notifying subscribers in CODENOTIFY files for diff 1851376163f4158e3458a73d0c222a3f6ddce5d0...ffa11b5105aeebdb76481bdb36f41a0769bca2b1.

    No notifications.

  • Created by: codecov[bot]

    Codecov Report

    Merging #17034 (ffa11b5) into main (1851376) will decrease coverage by 0.00%. The diff coverage is 100.00%.

    @@            Coverage Diff             @@
    ##             main   #17034      +/-   ##
    ==========================================
    - Coverage   51.98%   51.98%   -0.01%     
    ==========================================
      Files        1703     1703              
      Lines       84786    84788       +2     
      Branches     7524     7666     +142     
    ==========================================
    - Hits        44079    44077       -2     
    - Misses      36806    36808       +2     
    - Partials     3901     3903       +2     
    Flag Coverage Δ
    go 51.02% <100.00%> (-0.01%) :arrow_down:
    integration 30.54% <ø> (ø)
    storybook 30.03% <ø> (ø)
    typescript 54.30% <ø> (ø)
    unit 34.80% <ø> (ø)
    Impacted Files Coverage Δ
    ...er-images/prometheus/cmd/prom-wrapper/receivers.go 66.84% <100.00%> (+0.35%) :arrow_up:
    .../internal/codeintel/resolvers/graphql/locations.go 79.38% <0.00%> (-4.13%) :arrow_down:
  • Created by: pecigonzalo

    @bobheadxi You can use something like &time=1609931477000&time.window=3600000 instead; see https://grafana.com/docs/grafana/latest/dashboards/time-range-controls/#control-the-time-range-using-a-url
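    For reference, a minimal sketch of producing those two parameters from Go, using the epoch-millisecond values in the example above. windowParams is a hypothetical helper, not code from this PR.

    ```go
    package main

    import (
    	"fmt"
    	"net/url"
    	"strconv"
    	"time"
    )

    // windowParams (hypothetical helper) encodes Grafana's time and time.window
    // query parameters, both expressed in epoch milliseconds.
    func windowParams(at time.Time, window time.Duration) string {
    	v := url.Values{}
    	v.Set("time", strconv.FormatInt(at.UnixMilli(), 10))
    	v.Set("time.window", strconv.FormatInt(window.Milliseconds(), 10))
    	return v.Encode() // Encode sorts keys, so "time" precedes "time.window"
    }

    func main() {
    	fmt.Println(windowParams(time.UnixMilli(1609931477000), time.Hour))
    	// time=1609931477000&time.window=3600000
    }
    ```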

  • Created by: bobheadxi

    I considered that (see PR description):

    An alternative is time and time.window, but that might not be great for alerts lasting longer.

    It's not a great experience if the alert spans, say, 24 hours of problems: the link would only show a few minutes, or whatever window we set there.

  • Created by: pecigonzalo

    I see, I missed that in the body, my bad. I think defaulting to 1h, or inferring time.window from something like the alert period * 1.5, should be OK.

    E.g., if CPU utilization alerts when it's > 50 for 5m, we link to the dashboard at the time of the alert with a time.window of 7.5m.
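    Since Alertmanager templates can't do this arithmetic, the window would have to be computed ahead of time, e.g. by whatever generates the alert rules. A minimal sketch, assuming the rule's for duration is available as a time.Duration; windowFromFor is a hypothetical helper, and its 1h fallback for a zero for is one reading of the "defaulting to 1h" option, not something this PR implements.

    ```go
    package main

    import (
    	"fmt"
    	"time"
    )

    // windowFromFor (hypothetical) scales a rule's `for` duration by 1.5 to get
    // a Grafana time.window, falling back to 1h when `for` is unset (zero).
    func windowFromFor(forDur time.Duration) time.Duration {
    	if forDur <= 0 {
    		return time.Hour
    	}
    	return forDur * 3 / 2 // integer math on nanoseconds: 5m -> 7m30s exactly
    }

    func main() {
    	w := windowFromFor(5 * time.Minute)
    	fmt.Println(w, w.Milliseconds()) // 7m30s 450000
    	fmt.Println(windowFromFor(0))    // 1h0m0s
    }
    ```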

  • Created by: bobheadxi

    inferring time.window from something like the alert period * 1.5 should be ok.

    Unfortunately you cannot do arithmetic in alertmanager templates :/ However, I think I found a nice middle-ground strategy with e3f166ff1be18a491b39dce230c03d1c70cd1cb2:

    • If both start and end are available, link to a fixed time range from start to end
    • If end is not available (the alert is still active), link to the start time with a window of 1 hour
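    The two cases above map onto a single if/else in the template. A minimal sketch using Go's text/template, assuming a still-firing alert reaches the template with a zero EndsAt (the actual check in e3f166ff1be18a491b39dce230c03d1c70cd1cb2 may differ); the types model only the fields used, and the output is just the query string, without a dashboard URL.

    ```go
    package main

    import (
    	"fmt"
    	"strings"
    	"text/template"
    	"time"
    )

    // Alert and Data model only the fields this sketch reads.
    type Alert struct {
    	StartsAt time.Time
    	EndsAt   time.Time
    }

    type Data struct {
    	Alerts []Alert
    }

    // rangeTmpl: resolved alerts get a fixed from/to range; alerts with no end
    // yet (assumed: zero EndsAt) get time=<start> with a 1h (3600000 ms) window.
    const rangeTmpl = `{{ with index .Alerts 0 }}{{ if .EndsAt.IsZero }}time={{ .StartsAt.UnixMilli }}&time.window=3600000{{ else }}from={{ .StartsAt.UnixMilli }}&to={{ .EndsAt.UnixMilli }}{{ end }}{{ end }}`

    var tmpl = template.Must(template.New("range").Parse(rangeTmpl))

    func renderRange(d Data) (string, error) {
    	var b strings.Builder
    	if err := tmpl.Execute(&b, d); err != nil {
    		return "", err
    	}
    	return b.String(), nil
    }

    func main() {
    	start := time.Date(2021, 1, 6, 11, 0, 0, 0, time.UTC)
    	firing := Alert{StartsAt: start}
    	resolved := Alert{StartsAt: start, EndsAt: start.Add(30 * time.Minute)}
    	for _, a := range []Alert{firing, resolved} {
    		q, err := renderRange(Data{Alerts: []Alert{a}})
    		if err != nil {
    			panic(err)
    		}
    		fmt.Println(q)
    	}
    }
    ```

    Per the Grafana docs linked earlier in this thread, time.window is centered on time, so an active alert's link shows roughly 30 minutes on either side of the start.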
  • Created by: pecigonzalo

    Unfortunately you cannot do arithmetic in alertmanager templates

    But couldn't we set that in the generated alert file when it's generated by the generator?

  • Created by: bobheadxi

    But couldn't we set that in the generated alert file when it's generated by the generator?

    We can't infer period * 1.5, for example (assuming by period you mean the time the alert is active; or do you mean the for parameter? Those are often just 0 or a small number).

  • Created by: pecigonzalo

    We can't infer period * 1.5, for example (assuming by period you mean the time the alert is active; or do you mean the for parameter? Those are often just 0 or a small number).

    I meant the for parameter. To be honest, I think just using 1h for the link is also fine. /shrug

  • Created by: bobheadxi

    Merging for now; we can see how it works in practice :) This will be nice for looking at past alerts.
