Code intelligence on Stack Overflow code snippets
Created by: nicksnyder
Feature request description
When browsing a Stack Overflow page like https://stackoverflow.com/questions/1760757, it would be nice to have code intelligence on the snippets in the question and answers via our browser extensions. Code intelligence would only work on snippets (1) whose language we can detect without a file extension hint and (2) that are well-formed enough for code intelligence to work (e.g. a self-contained program).
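For Go snippets, one cheap way to approximate both conditions is to try parsing the snippet as a complete source file with the standard library's go/parser. This is an illustration, not part of the proposal, and isCompleteGoFile is a hypothetical helper name. A successful parse only shows that the snippet is syntactically a whole Go file; it does not prove that it type-checks, and other languages would need their own checks.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
)

// isCompleteGoFile reports whether snippet parses as a complete Go source
// file (package clause, imports, top-level declarations). It is only a
// syntactic filter: it does not type-check the code.
func isCompleteGoFile(snippet string) bool {
	fset := token.NewFileSet()
	// The file name only appears in error positions; snippets have no real name.
	_, err := parser.ParseFile(fset, "snippet.go", snippet, parser.AllErrors)
	return err == nil
}

func main() {
	program := "package main\n\nimport \"fmt\"\n\nfunc main() { fmt.Println(\"hi\") }\n"
	benchOutput := "BenchmarkCopy 500000000 5.39 ns/op 0 B/op 0 allocs/op\n"
	fmt.Println(isCompleteGoFile(program))     // true
	fmt.Println(isCompleteGoFile(benchOutput)) // false
}
```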
Proposed implementation
Given our current architecture, we can only provide code intelligence on code that exists in a Git repository. The proposal is to build a Git proxy service that transforms a Stack Overflow page into a Git repository. This service would be hosted by Sourcegraph (e.g. on git.sourcegraph.com).
Using https://stackoverflow.com/questions/1760757 as an example, here is a hypothetical sketch of what should be possible:
```
$ git clone https://git.sourcegraph.com/stackoverflow.com/questions/1760757
$ cd 1760757
$ cat 1766304-1 # the first snippet in the answer with id 1766304
package main

import (
	"bytes"
	"fmt"
)

func main() {
	var buffer bytes.Buffer
	for i := 0; i < 1000; i++ {
		buffer.WriteString("a")
	}
	fmt.Println(buffer.String())
}
$ cat 23857998-1 # the first snippet in the answer with id 23857998
BenchmarkConcat    1000000    64497 ns/op   502018 B/op   0 allocs/op
BenchmarkBuffer  100000000     15.5 ns/op        2 B/op   0 allocs/op
BenchmarkCopy    500000000     5.39 ns/op        0 B/op   0 allocs/op
$ cat 23857998-2 # the second snippet in the answer with id 23857998
package main

import (
	"bytes"
	"strings"
	"testing"
)

func BenchmarkConcat(b *testing.B) {
	var str string
	for n := 0; n < b.N; n++ {
		str += "x"
	}
	b.StopTimer()
	if s := strings.Repeat("x", b.N); str != s {
		b.Errorf("unexpected result; got=%s, want=%s", str, s)
	}
}

func BenchmarkBuffer(b *testing.B) {
	var buffer bytes.Buffer
	for n := 0; n < b.N; n++ {
		buffer.WriteString("x")
	}
	b.StopTimer()
	if s := strings.Repeat("x", b.N); buffer.String() != s {
		b.Errorf("unexpected result; got=%s, want=%s", buffer.String(), s)
	}
}

func BenchmarkCopy(b *testing.B) {
	bs := make([]byte, b.N)
	bl := 0
	b.ResetTimer()
	for n := 0; n < b.N; n++ {
		bl += copy(bs[bl:], "x")
	}
	b.StopTimer()
	if s := strings.Repeat("x", b.N); string(bs) != s {
		b.Errorf("unexpected result; got=%s, want=%s", string(bs), s)
	}
}

// Go 1.10
func BenchmarkStringBuilder(b *testing.B) {
	var strBuilder strings.Builder
	b.ResetTimer()
	for n := 0; n < b.N; n++ {
		strBuilder.WriteString("x")
	}
	b.StopTimer()
	if s := strings.Repeat("x", b.N); strBuilder.String() != s {
		b.Errorf("unexpected result; got=%s, want=%s", strBuilder.String(), s)
	}
}
$ git fetch # fetch the latest snapshot (in case answers have been edited)
```
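The file names in the session above follow the pattern <post ID>-<1-based snippet index>. A sketch of that mapping, assuming question snippets are named the same way using the question's ID and that post IDs are unique across questions and answers; post and snippetFiles are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// post is a question or answer on the page (hypothetical type). IDs are
// assumed to be unique across questions and answers.
type post struct {
	ID       int
	Snippets []string // code blocks in the order they appear in the post
}

// snippetFiles maps each snippet to a file named "<post ID>-<1-based index>".
func snippetFiles(posts []post) map[string]string {
	files := make(map[string]string)
	for _, p := range posts {
		for i, code := range p.Snippets {
			files[fmt.Sprintf("%d-%d", p.ID, i+1)] = code
		}
	}
	return files
}

func main() {
	files := snippetFiles([]post{
		{ID: 1766304, Snippets: []string{"package main\n"}},
		{ID: 23857998, Snippets: []string{"BenchmarkConcat ...\n", "package main\n"}},
	})
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	sort.Strings(names)
	fmt.Println(names) // [1766304-1 23857998-1 23857998-2]
}
```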
The generated repository doesn't need a commit history; it can be a single commit containing the current content of the page. If the content of the page changes, the proxy creates a new repository with a single commit (as if the history had been force-pushed).
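Because Git objects are content-addressed, the proxy could decide whether a page changed by comparing blob IDs rather than rebuilding on every request. Comparing commit IDs would not work, since a commit also records author, committer, and timestamps; a blob ID depends only on the file's content. The sketch below assumes the proxy records the blob IDs of the snapshot it currently serves and that repositories use Git's default SHA-1 object format; gitBlobID and unchanged are hypothetical names.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// gitBlobID returns the object ID Git assigns to a file with this content
// (SHA-1 object format): the SHA-1 of "blob <size in bytes>\x00" + content.
func gitBlobID(content string) string {
	h := sha1.New()
	fmt.Fprintf(h, "blob %d\x00", len(content))
	h.Write([]byte(content))
	return hex.EncodeToString(h.Sum(nil))
}

// unchanged reports whether freshly scraped snippets (file name -> content)
// match the blob IDs recorded for the snapshot currently being served
// (file name -> blob ID); if so, no new repository needs to be generated.
func unchanged(servedIDs, scraped map[string]string) bool {
	if len(servedIDs) != len(scraped) {
		return false
	}
	for name, content := range scraped {
		if id, ok := servedIDs[name]; !ok || id != gitBlobID(content) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(gitBlobID("hello\n")) // ce013625030ba8dba906f756967f9e9ca394464a
	served := map[string]string{"1766304-1": gitBlobID("package main\n")}
	fmt.Println(unchanged(served, map[string]string{"1766304-1": "package main\n"})) // true
}
```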
The commit should have a Git tag pointing to it whose name encodes the timestamp of the most recent update to the code snippets on the page (e.g. 2018-08-28_08-24-36). For Stack Overflow, this is max(question created timestamp, question edited timestamp, answer created timestamps, answer edited timestamps).
Stack Overflow is the first target, but the implementation should be designed so that support for other sites is easy to add later.
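One way to keep the site-specific parts pluggable is a small interface per site plus a registry keyed by host name, the first element of the repository path. This is a sketch with hypothetical names (Site, Snippet, registry, fakeSite), not an existing Sourcegraph API; fakeSite returns canned data in place of a real scraper.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// Snippet is one code block extracted from a page.
type Snippet struct {
	Name    string // file name in the generated repository, e.g. "1766304-1"
	Content string
}

// Site turns pages from one website into repository files.
type Site interface {
	// Host returns the host name used as the first path element.
	Host() string
	// Snippets fetches the page identified by path (e.g. "questions/1760757")
	// and returns the code snippets found on it.
	Snippets(ctx context.Context, path string) ([]Snippet, error)
}

// registry maps host names to Site implementations.
type registry map[string]Site

func (r registry) register(s Site) { r[s.Host()] = s }

// lookup splits a repository path into host and site-specific remainder and
// returns the Site registered for that host.
func (r registry) lookup(repoPath string) (Site, string, error) {
	parts := strings.SplitN(strings.Trim(repoPath, "/"), "/", 2)
	if len(parts) != 2 {
		return nil, "", fmt.Errorf("malformed repository path %q", repoPath)
	}
	site, ok := r[parts[0]]
	if !ok {
		return nil, "", fmt.Errorf("unsupported site %q", parts[0])
	}
	return site, parts[1], nil
}

// fakeSite is a stand-in for this example; it returns canned data instead
// of fetching anything over the network.
type fakeSite struct{ host string }

func (f fakeSite) Host() string { return f.host }

func (f fakeSite) Snippets(ctx context.Context, path string) ([]Snippet, error) {
	return []Snippet{{Name: "1766304-1", Content: "package main\n"}}, nil
}

func main() {
	r := registry{}
	r.register(fakeSite{host: "stackoverflow.com"})

	site, rest, err := r.lookup("stackoverflow.com/questions/1760757")
	if err != nil {
		fmt.Println(err)
		return
	}
	snippets, _ := site.Snippets(context.Background(), rest)
	fmt.Println(site.Host(), rest, snippets[0].Name) // stackoverflow.com questions/1760757 1766304-1
}
```

With this shape, adding another site would mean implementing Site and registering it, without touching the code that serves the Git repository.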
Alternatives considered
- Put code snippets into Gists and then perform code intelligence on those Gists. We would need to write a service that translates a page (like a Stack Overflow question) into code files and then uploads them as a Gist. Rather than uploading to Gists, it seems simpler to host the Git repository ephemerally, directly from this service.
- Modify our architecture to remove the requirement that code exists in a Git repository. This would require either adding code paths everywhere that allow passing file contents directly instead of a Git repository (an invasive change), or creating a Sourcegraph API that creates temporary Git repositories that exist only on gitserver (and aren't hosted anywhere else). Either way, we would still need to write the webpage-to-Git-repository logic and add special cases, so the proposed solution seems better.