Skip to content

security: remove our CSRF tokens to improve security and reduce complexity of our threat model

Warren Gifford requested to merge sg/remove-csrf-tokens into main

Created by: slimsag

Prior to this change, we protected against CSRF attacks two ways:

  1. (primary method) browser's CORS policies, i.e. setting the right CORS headers, ensuring we only allow session/cookie-based authentication in requests from trusted origins, etc.
  2. (secondary method) CSRF tokens/cookies/headers.

After this change, we rely solely on browser's CORS policies(1) to prevent against CSRF attacks.

But aren't CSRF tokens a good thing?

The problem with using CSRF tokens is that it increases the code complexity and mental model complexity of our CSRF threat model, making it harder to reason about. For a relatively clear picture of this complexity, see the changes to doc/dev/security/csrf_security_model.md.

Additionally, our CSRF token implementation was not good. In fact, it was very bad:

  • In the past, when Sourcegraph went open source we found we had a hard-coded CSRF cookie encryption key - this turned out not to be a vulnerability because we relied on browser's CORS policies as our primary method of protection.
  • There is evidence to suggest that due to the inclusion of CSRF tokens in window.context, that if we had relied on them anyway we would have been subject to caching vulnerabilities in which one user would get another user's CSRF token due to GET requests being cached.
  • There have been multiple instances of users getting locked out of their Sourcegraph instance in the past due to the aforementioned caching issue, leading to site admins clearing their Redis store and accidentally wiping out session<->UserID association linkages between Postgres and Redis - leading to a real security incidents in which users were incorrectly authenticated as another user. We took steps to address this previously, so I don't think it is possible today, but it's another example of how the complexity of our CSRF tokens led to hard-to-reason-about behavior which directly led to actions resulting in real user issues and security incidents.

In short, our CSRF token implementation has always been very bad, we do not rely on it today, we have not relied on it in the past 4+ years, it has verifiably caused real security incidents for customers and Sourcegraph.com in the past, and it increases the hurdles to engineers attaining a solid understanding of our CSRF threat model.

What about defense in depth?

DID is a reasonable strategy. I would argue that reducing security complexity so that engineers can reason about our threat model is more important than DID, and so removal is the right move.

One could argue having both CSRF tokens and strong CORS policies is the best, most ideal state to be in. I agree with that, but I will also explicitly call out that our implementation today is absolutely not it: If we do want DID, we need to re-visit the entire implementation of CSRF tokens from scratch and not rely on the garbage, ill-thought-out code we have here.

5 years in the making

This also finally closes the issue I filed back in 2020, https://github.com/sourcegraph/sourcegraph/issues/7658, in which I recommended the same thing. This was also recommended by myself and Quinn as early as 2018 (see issue for more historical context.)

Without further ado: let's finally get rid of these!

Merge request reports

Loading