
Find references to common Java class method name without false-positives

Created by: slimsag

(If you already know the state of code intel today and understand the use case here, skip to the last section, "This issue", for what this issue tracks.)

Code intelligence today

Today, Sourcegraph sports two types of code intelligence:

  • Basic code intelligence: Provided by language-aware regex search operations.
    • Pros: Very fast, works well in roughly 70% of cases.
    • Cons: Can turn up false-positive matches when e.g. doing 'Find references' (the other 30% of cases).
  • Precise code intelligence: Provided by a separate language server, which analyzes code much as a compiler would.
    • Pros: Type-checks code as a compiler would, so results are accurate in effectively 100% of cases.
    • Cons: Very resource hungry and generally very slow (you are effectively waiting for the project to compile before seeing results).

By default, Sourcegraph ships with basic code intelligence enabled. Precise code intelligence requires additional setup and configuration (deploying a language server, securing it behind an auth proxy, etc.).

A common use case

A very common user story is asking Sourcegraph a question like:

Where are all of the callers of my deprecated method MyClass.foobar()?

Both basic and precise code intelligence can answer this question via Find references, but each with different drawbacks:

  • basic code intelligence:
    • Produces many false positives when foobar is a common method name. Basic code intelligence is not class-aware: it does not know that e.g. myClassInstance is of type MyClass, and therefore simply returns every place where a method named foobar is invoked.
    • The volume of false positives becomes unmanageable when searching over thousands of repositories.
  • precise code intelligence:
    • We don't currently have Java precise code intelligence.
    • Requires extra setup on the instance administrator's part.
    • Requires significantly more resources, is slow, and requires additional configuration (e.g. even repo-level configuration to tell the language server how to build the project).
    • Deploying this is a heavy burden both for site admins and for a user who has to ask their site admin to do it.
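
To make the false-positive case above concrete, here is a minimal Java sketch. It is not how Sourcegraph's basic code intelligence is implemented; the class `NameBasedFindRefsDemo`, the helper `nameBasedReferences`, and the `OtherClass` snippet are all hypothetical names made up for illustration. It only shows that matching on the method name alone cannot tell `MyClass.foobar()` apart from an unrelated `foobar()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; not Sourcegraph's actual implementation.
class NameBasedFindRefsDemo {
    // Hypothetical snippet: MyClass and OtherClass both declare foobar().
    static final String[] SOURCE = {
        "MyClass a = new MyClass();",
        "OtherClass b = new OtherClass();",
        "a.foobar();", // the reference we actually want
        "b.foobar();", // unrelated method that happens to share the name
    };

    // Returns the 1-based line numbers that contain a call to any method called `name`.
    static List<Integer> nameBasedReferences(String[] lines, String name) {
        Pattern call = Pattern.compile("\\." + Pattern.quote(name) + "\\s*\\(");
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < lines.length; i++) {
            if (call.matcher(lines[i]).find()) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Prints [3, 4]: line 4 is a false positive for MyClass.foobar().
        System.out.println(nameBasedReferences(SOURCE, "foobar"));
    }
}
```

With one shared method name in one file the extra hit is easy to discard by eye; across thousands of repositories it is not.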

Both approaches can solve the problem, but each has significant drawbacks.

This issue

This issue is merely to publicly track that this is a problem we want to solve.

Possible ways we may solve the problem include:

  • An experiment we may try with basic code intel in which we index all tokens and their corresponding type in order to make find refs more accurate (perhaps using tree-sitter).
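
As a rough sketch of what "index all tokens and their corresponding type" could mean, the toy Java example below records the declared type of each simple local variable and only reports calls whose receiver has the requested type. Everything here is an assumption for illustration: the class `TypedTokenIndexDemo`, the helper `typedReferences`, and the regex-based "parsing" are hypothetical, and a real experiment would presumably use a proper parser (such as tree-sitter) and handle fields, parameters, inheritance, and chained calls, none of which this sketch attempts.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; a toy stand-in for a real typed token index.
class TypedTokenIndexDemo {
    // Matches simple local declarations such as `MyClass a = ...`.
    private static final Pattern DECL = Pattern.compile("^\\s*([A-Z]\\w*)\\s+(\\w+)\\s*=");
    // Matches simple calls such as `a.foobar(`.
    private static final Pattern CALL = Pattern.compile("(\\w+)\\.(\\w+)\\s*\\(");

    // Same hypothetical snippet as a name-only search would see.
    static final String[] SOURCE = {
        "MyClass a = new MyClass();",
        "OtherClass b = new OtherClass();",
        "a.foobar();",
        "b.foobar();",
    };

    // Returns 1-based line numbers of calls to `method` whose receiver was declared as `type`.
    static List<Integer> typedReferences(String[] lines, String type, String method) {
        Map<String, String> typeOf = new HashMap<>(); // variable name -> declared type
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < lines.length; i++) {
            Matcher decl = DECL.matcher(lines[i]);
            if (decl.find()) {
                typeOf.put(decl.group(2), decl.group(1));
            }
            Matcher call = CALL.matcher(lines[i]);
            while (call.find()) {
                boolean sameMethod = call.group(2).equals(method);
                boolean sameType = type.equals(typeOf.get(call.group(1)));
                if (sameMethod && sameType) {
                    hits.add(i + 1);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Prints [3]: the call on OtherClass is filtered out.
        System.out.println(typedReferences(SOURCE, "MyClass", "foobar"));
    }
}
```

The trade-off this hints at is that the index only needs declared types near each token, not a full build of the project, which is why it might stay closer to the speed of basic code intelligence.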

Possible ways we would mitigate the issue:

  • Add Java precise code intelligence, then have the option to choose between basic and precise references when you execute a Find references action.

None of the above is concrete, and we do not currently have a timeline for this, as it is a very difficult problem to solve. We'll provide updates on this issue as we learn more.

Customers: https://app.hubspot.com/contacts/2762526/company/407948923