Skip to content
Snippets Groups Projects

Add LSIF support

Created by: chrismwendt

Goals

The goal of this PR is to enable us in the Sourcegraph organization to upload LSIF data for the sourcegraph/sourcegraph repository to Sourcegraph.com.

The next goal (not for this PR) is to support other users uploading LSIF for their own repositories by GopherCon (July 24).

The kind of feedback that would be most helpful would be: pointing out problems with the architecture or usability. Here are some areas I'm looking for feedback on:

  • Auth: I plan to restrict the /upload endpoint to admin access tokens, and keep the /request endpoint open to normal users (or anonymous users, if the instance is public) cc @sourcegraph/core-services
  • API: this currently resides at .api/lsif, but I plan to move it under GraphQL cc @sourcegraph/core-services
  • The LSIF HTTP server is Node.js: are we OK with running a Node.js process on the backend? I'm reusing a few files from https://github.com/microsoft/vscode-lsif-extension/tree/master/server (and asked about an npm package https://github.com/microsoft/vscode-lsif-extension/issues/13), but they could probably get rewritten in Go in ~1 week cc @beyang @felixfbecker @lguychard
  • General architecture and anything I'm overlooking (file storage, caching, API, etc.)

Background https://github.com/sourcegraph/sourcegraph/issues/4692

Overview of implementation see lsif/README.md and lsif/server/README.md

Changelog Doesn't update the CHANGELOG because there's no LSIF-scope for access tokens yet, which is OK for an unannounced MVP that we try out for ourselves on the sourcegraph/sourcegraph repository https://sourcegraph.slack.com/archives/C0C324C91/p1562109046016800?thread_ts=1562105156.005000&cid=C0C324C91

Test plan

  • cd into sourcegraph/codeintellify
  • lsif-tsc -p tsconfig.json --noContents --out data.lsif
  • env SRC_ENDPOINT=http://localhost:3080 SRC_ACCESS_TOKEN=$DEV_SRC_ACCESS_TOKEN REPOSITORY=github.com/sourcegraph/codeintellify COMMIT=$(git rev-parse HEAD) bash ~/path/to/sourcegraph/sourcegraph/lsif/upload.sh data.lsif (currently fails at the upload step because I haven't figured out how to implement auth yet)
  • Make sure it says "Upload successful."
  • TODO add LSIF to each language extension (basic-code-intel can be a vehicle for this)
  • Open http://localhost:3080/github.com/sourcegraph/codeintellify/-/blob/src/hoverifier.ts
  • You should see hovers and defs working

TODO

  • Check if LSIF data for a given repo@commit /exists before sending hover/def/ref requests
  • Check the version number during upload, return 400 unless it matches the version that the LSIF DB code expects
  • Restrict the /upload endpoint to admins (in the future, we might create a new LSIF scope for access tokens)
  • Write a Dockerfile, add it to gen-pipeline.go
  • Address TODOs in code, refactor

Potential future ideas (we can move select tasks into the TODOs above)

  • Translate repository name to repository ID
  • Add metrics to track upload rate, upload size, request rate, size of Database cache, etc.
  • Add throttling for uploads and requests
  • Restrict uploads per-repository (see discussion about this https://sourcegraph.slack.com/archives/C0C324C91/p1562105156005000)
  • Add a new kind of token (different from user access tokens): LSIF upload token
  • Support fetching LSIF data from external storage (e.g. Amazon S3, like Codecov does)
  • Support multiple replicas of this LSIF server and figure out storage and load balancing
  • Fall back to the most recent commit that does have LSIF data if the current one doesn't
  • Infer repository, commit, etc. from build environment (Travis, CircleCI, Buildkite, etc.) like Codecov does

Merge request reports

Approval is optional

Merged by avatar (Aug 23, 2025 10:58am UTC)

Merge details

  • Changes merged into master with dffc6d43.
  • Deleted the source branch.

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • Created by: tsenart

    API: this currently resides at .api/lsif, but I plan to move it under GraphQL cc @sourcegraph/core-services

    What's the rationale for wanting to move it to GraphQL?

    Auth: I plan to restrict the /upload endpoint to admin access tokens, and keep the /request endpoint open to normal users (or anonymous users, if the instance is public) cc @sourcegraph/core-services

    A separate scope would be great. Perhaps we could track that in a separate issue.

  • Created by: felixfbecker

    API

    Could you document what the endpoints accept? It's currently not clear to me what a request body needs to look like. What does "file upload" mean? What's the schema? For what is LSIF content uploaded and requested, a repo, a file, a package, ...?

    It also seems these are RPC ("request" and "upload" are both verbs), which seems like a missed opportunity here to make it more RESTful. It's valuable to follow HTTP semantics as much as possible (e.g. for easy caching), and it seems easy here. Why not make it something like POST /.api/lsif/repo@rev/file and GET /.api/lsif/repo@rev/file?

    Node

    The LSIF HTTP server is Node.js: are we OK with running a Node.js process on the backend?

    We already have a Node microservice for badges, and I think it makes a lot of sense to use it here, given that Microsoft's LSIF stuff will be in TS. Everything else of code-nav is in TS too.

    Metrics

    Add metrics to track upload rate, upload size, request rate, size of Database cache, etc.

    I think this is very important. It's easy to do in Node, take a look at sourcegraph-typescript for an example (uses Prometheus, OpenTracing & custom debug endpoints).

    Storage

    What's the thinking behind storing the LSIF contents as files? How will we want to query it? Files seem obvious if we just want to echo the uploaded raw data, but in that case is this server just a custom-built S3? If there is any querying we want to support, we should probably save it in a database that allows us to efficiently do those queries

    GraphQL

    Since LSIF is a graph GraphQL seems like a good choice if we want to support specific querying on it.

    Microsoft's package

    Did you ask them to publish it? It seems like it's a VS Code extension so they probably won't do it unless we ask them. Note the files have "See License.txt in the project root for license information." in the header and are licensed under a different license than this repo so I don't think you can copy them like this PR does.

    Code

    I find the code hard to review rn, could you add inline documentation? I have more feedback but I'd like to avoid back and fourth that is caused by potential misunderstandings.

    Meta

    It would be nice to have this discussion in a design document instead of on a pull request for the implementation

  • Created by: chrismwendt

    @tsenart I'm guessing it's easier to apply permissions to endpoints in GraphQL than .api/lsif, but where would you put this?

    I added a section for "Post-merge TODO" and added the LSIF scope idea to it.

    @felixfbecker Great points on the design/architecture! :sparkles: And :thumbsup: to a Google doc to discuss this further: https://docs.google.com/document/d/1W0L0Z34Og7j0WRHEesxs_92I0vOmy-eC0BvbOVzFGww/edit# I'll be responding and adding my own notes there, too.

  • Created by: chrismwendt

    The http-server.ts is ready for review :sparkles:

    I'm happy to do a video chat :movie_camera: in case there are more than a handful of comments (cc @felixfbecker)

  • Created by: felixfbecker

    Please make sure the code is checked by our TSLint, ESLint and Prettier :)

  • Created by: chrismwendt

    Could I get review on these files? :pencil:

    • cmd/ dir
    • dev/{Procfile,launch.sh}
    • enterprise/dev/Procfile
    • lsif/server/src/{Dockerfile,build.sh}
  • Created by: chrismwendt

    @slimsag Thanks, I applied all of your suggestions :thumbsup:

  • Created by: codecov[bot]

    Codecov Report

    Merging #4799 into master will increase coverage by 0.02%. The diff coverage is 30%.

    @@            Coverage Diff             @@
    ##           master    #4799      +/-   ##
    ==========================================
    + Coverage   48.02%   48.04%   +0.02%     
    ==========================================
      Files         728      729       +1     
      Lines       44444    44464      +20     
      Branches     1763     1763              
    ==========================================
    + Hits        21343    21364      +21     
    + Misses      21128    21124       -4     
    - Partials     1973     1976       +3
    Impacted Files Coverage Δ
    cmd/server/shared/shared.go 0% <0%> (ø) :arrow_up:
    cmd/frontend/internal/httpapi/lsif.go 18.18% <18.18%> (ø)
    cmd/frontend/internal/httpapi/httpapi.go 22.22% <57.14%> (+2.94%) :arrow_up:
    cmd/frontend/graphqlbackend/repository.go 22.05% <0%> (-0.99%) :arrow_down:
    ...d/frontend/internal/authz/bitbucketserver/store.go 80.83% <0%> (+1.19%) :arrow_up:
    cmd/frontend/graphqlbackend/graphqlbackend.go 53.14% <0%> (+10.48%) :arrow_up:
  • Created by: chrismwendt

    @felixfbecker @lguychard Any more feedback on the TypeScript code blocking approval?

  • Created by: chrismwendt

    Thanks for the reviews, @felixfbecker I applied all of your suggestions :thumbsup:

Please register or sign in to reply
Loading