Add LSIF support
Created by: chrismwendt
Goals
The goal of this PR is to enable us in the Sourcegraph organization to upload LSIF data for the sourcegraph/sourcegraph repository to Sourcegraph.com.
The next goal (not for this PR) is to support other users uploading LSIF for their own repositories by GopherCon (July 24).
The kind of feedback that would be most helpful would be: pointing out problems with the architecture or usability. Here are some areas I'm looking for feedback on:
- Auth: I plan to restrict the
/upload
endpoint to admin access tokens, and keep the/request
endpoint open to normal users (or anonymous users, if the instance is public) cc @sourcegraph/core-services - API: this currently resides at
.api/lsif
, but I plan to move it under GraphQL cc @sourcegraph/core-services - The LSIF HTTP server is Node.js: are we OK with running a Node.js process on the backend? I'm reusing a few files from https://github.com/microsoft/vscode-lsif-extension/tree/master/server (and asked about an npm package https://github.com/microsoft/vscode-lsif-extension/issues/13), but they could probably get rewritten in Go in ~1 week cc @beyang @felixfbecker @lguychard
- General architecture and anything I'm overlooking (file storage, caching, API, etc.)
Background https://github.com/sourcegraph/sourcegraph/issues/4692
Overview of implementation see lsif/README.md and lsif/server/README.md
Changelog Doesn't update the CHANGELOG because there's no LSIF-scope for access tokens yet, which is OK for an unannounced MVP that we try out for ourselves on the sourcegraph/sourcegraph repository https://sourcegraph.slack.com/archives/C0C324C91/p1562109046016800?thread_ts=1562105156.005000&cid=C0C324C91
Test plan
- cd into
sourcegraph/codeintellify
lsif-tsc -p tsconfig.json --noContents --out data.lsif
-
env SRC_ENDPOINT=http://localhost:3080 SRC_ACCESS_TOKEN=$DEV_SRC_ACCESS_TOKEN REPOSITORY=github.com/sourcegraph/codeintellify COMMIT=$(git rev-parse HEAD) bash ~/path/to/sourcegraph/sourcegraph/lsif/upload.sh data.lsif
(currently fails at the upload step because I haven't figured out how to implement auth yet) - Make sure it says "Upload successful."
- TODO add LSIF to each language extension (basic-code-intel can be a vehicle for this)
- Open http://localhost:3080/github.com/sourcegraph/codeintellify/-/blob/src/hoverifier.ts
- You should see hovers and defs working
TODO
-
Check if LSIF data for a given repo@commit /exists
before sending hover/def/ref requests -
Check the version number during upload, return 400 unless it matches the version that the LSIF DB code expects -
Restrict the /upload
endpoint to admins (in the future, we might create a new LSIF scope for access tokens) -
Write a Dockerfile, add it to gen-pipeline.go -
Address TODOs in code, refactor
Potential future ideas (we can move select tasks into the TODOs above)
- Translate repository name to repository ID
- Add metrics to track upload rate, upload size, request rate, size of
Database
cache, etc. - Add throttling for uploads and requests
- Restrict uploads per-repository (see discussion about this https://sourcegraph.slack.com/archives/C0C324C91/p1562105156005000)
- Add a new kind of token (different from user access tokens): LSIF upload token
- Support fetching LSIF data from external storage (e.g. Amazon S3, like Codecov does)
- Support multiple replicas of this LSIF server and figure out storage and load balancing
- Fall back to the most recent commit that does have LSIF data if the current one doesn't
- Infer repository, commit, etc. from build environment (Travis, CircleCI, Buildkite, etc.) like Codecov does
Merge request reports
Activity
Created by: tsenart
API: this currently resides at .api/lsif, but I plan to move it under GraphQL cc @sourcegraph/core-services
What's the rationale for wanting to move it to GraphQL?
Auth: I plan to restrict the /upload endpoint to admin access tokens, and keep the /request endpoint open to normal users (or anonymous users, if the instance is public) cc @sourcegraph/core-services
A separate scope would be great. Perhaps we could track that in a separate issue.
Created by: felixfbecker
API
Could you document what the endpoints accept? It's currently not clear to me what a request body needs to look like. What does "file upload" mean? What's the schema? For what is LSIF content uploaded and requested, a repo, a file, a package, ...?
It also seems these are RPC ("request" and "upload" are both verbs), which seems like a missed opportunity here to make it more RESTful. It's valuable to follow HTTP semantics as much as possible (e.g. for easy caching), and it seems easy here. Why not make it something like
POST /.api/lsif/repo@rev/file
andGET /.api/lsif/repo@rev/file
?Node
The LSIF HTTP server is Node.js: are we OK with running a Node.js process on the backend?
We already have a Node microservice for badges, and I think it makes a lot of sense to use it here, given that Microsoft's LSIF stuff will be in TS. Everything else of code-nav is in TS too.
Metrics
Add metrics to track upload rate, upload size, request rate, size of Database cache, etc.
I think this is very important. It's easy to do in Node, take a look at sourcegraph-typescript for an example (uses Prometheus, OpenTracing & custom debug endpoints).
Storage
What's the thinking behind storing the LSIF contents as files? How will we want to query it? Files seem obvious if we just want to echo the uploaded raw data, but in that case is this server just a custom-built S3? If there is any querying we want to support, we should probably save it in a database that allows us to efficiently do those queries
GraphQL
Since LSIF is a graph GraphQL seems like a good choice if we want to support specific querying on it.
Microsoft's package
Did you ask them to publish it? It seems like it's a VS Code extension so they probably won't do it unless we ask them. Note the files have "See License.txt in the project root for license information." in the header and are licensed under a different license than this repo so I don't think you can copy them like this PR does.
Code
I find the code hard to review rn, could you add inline documentation? I have more feedback but I'd like to avoid back and fourth that is caused by potential misunderstandings.
Meta
It would be nice to have this discussion in a design document instead of on a pull request for the implementation
Created by: chrismwendt
@tsenart I'm guessing it's easier to apply permissions to endpoints in GraphQL than
.api/lsif
, but where would you put this?I added a section for "Post-merge TODO" and added the LSIF scope idea to it.
@felixfbecker Great points on the design/architecture!
And to a Google doc to discuss this further: https://docs.google.com/document/d/1W0L0Z34Og7j0WRHEesxs_92I0vOmy-eC0BvbOVzFGww/edit# I'll be responding and adding my own notes there, too.Created by: codecov[bot]
Codecov Report
Merging #4799 into master will increase coverage by
0.02%
. The diff coverage is30%
.@@ Coverage Diff @@ ## master #4799 +/- ## ========================================== + Coverage 48.02% 48.04% +0.02% ========================================== Files 728 729 +1 Lines 44444 44464 +20 Branches 1763 1763 ========================================== + Hits 21343 21364 +21 + Misses 21128 21124 -4 - Partials 1973 1976 +3
Impacted Files Coverage Δ cmd/server/shared/shared.go 0% <0%> (ø)
cmd/frontend/internal/httpapi/lsif.go 18.18% <18.18%> (ø)
cmd/frontend/internal/httpapi/httpapi.go 22.22% <57.14%> (+2.94%)
cmd/frontend/graphqlbackend/repository.go 22.05% <0%> (-0.99%)
...d/frontend/internal/authz/bitbucketserver/store.go 80.83% <0%> (+1.19%)
cmd/frontend/graphqlbackend/graphqlbackend.go 53.14% <0%> (+10.48%)