Implement event logging in Sourcegraph for admins
Created by: dadlerj
This project is copied from https://github.com/sourcegraph/sourcegraph/issues/3583#issuecomment-521432491
See the bottom for an update on concrete next steps to consider this project complete.
Thanks for all of the feedback, everyone!
This is not going to be as pretty as the project scoping docs that Product usually puts together, so please let me know where more info would be helpful!
Background/problems
From my point of view, this project is about building a common backend that supports multiple frontend products/use cases, not all of which are needed today.
The problems:
- The current site usage statistics page is built on an extremely brittle and hard-to-extend Redis-based backend. Adding a new metric almost always requires modifying/migrating the Redis data structure (and all of the stored data), or adding an entirely new one.
- As a result, capturing any new metrics, or adding any features based on user action aggregation (such as https://github.com/sourcegraph/sourcegraph/issues/3333 which I added a couple iterations ago, https://github.com/sourcegraph/sourcegraph/issues/2069 which Beyang added a few iterations ago, @vanesa's work on https://github.com/sourcegraph/sourcegraph/issues/2348, soon https://github.com/sourcegraph/sourcegraph/issues/5088, and any future work on audit logs) requires:
  - Adding a new storage backend, new backend event handlers, and new data aggregators (see the churn in the `usagestats` package).
  - Deciding on a data structure, writing new tests, and testing for performance at massive scale.
  - Adding new frontend code to custom-handle the new event(s).
Definition of success/Scope
The concrete outcome I'd like to reach is:
- "All" frontend user actions are logged inside of the Sourcegraph instance. This includes everything from "user viewed a page", to "user clicked a button", to "user hovered over a symbol" — along the lines of a Google Analytics, or what we use BigQuery/Looker for for Sourcegraph.com metrics. These user actions (or "events") that are logged can accept an arbitrary text (or json?) argument.
- "All" is quoted above to indicate that it's not really all events :) — rather, only the things we care about would actually be logged. See all calls to
eventLogger.log
in the frontend code to get a sense for the scale here - All of these events would pass through a common GraphQL endpoint and event handler on the backend that would add them into the new database.
- All of these events are stored in a backend database containing:
  - `name`: the name of the event
  - `argument` (or whatever): the string argument
  - `url`: the URL of the page when the event was logged
  - `userID`: the user that performed the action
  - `anonymousUserID`: a UUID stored in a cookie that we use as a way to aggregate actions by anonymous users (when the instance has `"auth.public": true` set). This is already generated and used in the existing `usagestats` package.
  - `timestamp`: the timestamp of the action
- From this raw database we could easily aggregate data as needed to generate each of the features/products described above, including:
  - Site usage statistics, e.g. a count of monthly active users would be as simple as:
    `SELECT COUNT(DISTINCT user_id) FROM events WHERE timestamp > $start_of_month AND timestamp < $end_of_month`
  - Extension usage (@vanesa's project): a count of unique users by extension would be as simple as
    `SELECT user_id, argument FROM events WHERE name='ExtensionActivated'`
    to get the list, then loop over each record to build the counts.
  - Audit logs, e.g. a list of actions that match some list of "audit-loggable" events, such as
    `SELECT name, user_id, timestamp FROM events WHERE name IN ('SettingsChanged', 'ConfigChanged', 'RepoViewed', ...)`
Out of scope
Note that actually creating the features/products listed above is NOT part of the definition of success; rather, once the data store is available we would begin to add/migrate them.
Any thoughts?
Now that @unknwon's perf testing results are complete, my take is that the following is required from here:
- Actually implement the DB table in Postgres (as well as any supporting service infrastructure, such as the GC to dump data after 93 days, etc.).
- Begin INSERTing events as they come in. Specifically, the GraphQL handler should add an extra call to insert the event at https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/cmd/frontend/graphqlbackend/user_usage_stats.go#L68:12, while leaving the existing storage in place as well (they will run in parallel for a short time).
- Update the GraphQL API call (defined at https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/cmd/frontend/graphqlbackend/schema.graphql#L268:5) to accept (1) arbitrary event names, (2) a `url` parameter, and (3) an `argument` parameter.
- We currently heavily filter which frontend events get sent back through the GraphQL API in https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/blob/web/src/tracking/services/serverAdminWrapper.tsx#L5:7. Going forward, both `trackPageView` and `trackAction` should call `logUserEvent(eventAction)` (instead of only doing so on certain types of events, and with modified event names). Pass the current URL through as the `url`, and nil/empty string as the `argument`.
- Lastly, I'd recommend adding a feature flag to disable (or enable?) this functionality.
And I think that's all I have. @nicksnyder any thoughts?