
usage-data: maintain state of progress when scraping events

Administrator requested to merge usage-data/stateful-scrape into main

Created by: coury-clark

Closes https://github.com/sourcegraph/sourcegraph/issues/39089

Captures the state of scrape job progress. Adds a new table, event_logs_scrape_state, that records a bookmark of the highest event_id that was successfully sent. If no state is found, a new state row is initialized at the current event; older events are not backfilled.
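
For reference, a minimal sketch of what the state table and bookmark handling might look like; the column names beyond bookmark_id and the exact queries are assumptions for illustration, not the actual migration in this change:

    -- Hypothetical schema: a state table holding the scrape bookmark.
    CREATE TABLE IF NOT EXISTS event_logs_scrape_state (
        id SERIAL PRIMARY KEY,
        bookmark_id INTEGER NOT NULL
    );

    -- If no state exists, initialize the bookmark at the current highest
    -- event_id (older events are not backfilled).
    INSERT INTO event_logs_scrape_state (bookmark_id)
    SELECT COALESCE(MAX(id), 0) FROM event_logs
    WHERE NOT EXISTS (SELECT 1 FROM event_logs_scrape_state);

    -- After a batch of events is exported successfully, advance the bookmark
    -- to the highest event_id that was sent.
    UPDATE event_logs_scrape_state SET bookmark_id = $1;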

Test plan

To test locally:

Set up GCP credentials

gcloud auth application-default login

Start sg

GOOGLE_APPLICATION_CREDENTIALS="$HOME/.config/gcloud/application_default_credentials.json" sg start ...

Add the configuration to the site config

    "exportUsageTelemetry": {
      "enabled": true,
      "topicProjectName": "sourcegraph-dogfood",
      "topicName": "usage-data-testing"
    },

Log lines will indicate job progress

[         worker] INFO worker.export-usage-telemetry telemetry/telemetry_job.go:127 fetching events from bookmark {"bookmark_id": 21333}
[         worker] INFO worker.export-usage-telemetry telemetry/telemetry_job.go:138 telemetryHandler executed {"event count": 5, "maxId": 21338}

Check the bookmark table

select bookmark_id from event_logs_scrape_state order by id limit 1;
