prevent Google from indexing useless/garbage pages
Created by: slimsag
Google today is indexing a large number of pages on Sourcegraph that I believe are not useful for developers. Some choice picks:
- Files, directories, and repositories at non-default-branch commits:
  - Rationale: Leading someone to a specific commit or non-default branch is harmful. Most people are searching for a specific file and hoping to find its latest revision, so landing on an old revision or branch misleads them. Only the default branch should be indexed.
  - https://sourcegraph.com/github.com/ruby/ruby@171803d5d34feb1b4244ca81b9db0a7bc2171c85/-/blob/doc/NEWS-1.9.3
  - https://sourcegraph.com/github.com/go-pg/pg@v10/-/blob/main_test.go
- Individual commit pages:
  - Rationale: Most repositories have thousands or hundreds of thousands of commits. With 7 million repositories, we're looking at many billions of commits, with no signal for Google and others about which ones are actually interesting to developers. This ends up basically being spam.
  - https://sourcegraph.com/github.com/cenkalti/log/-/commit/0687910b77b97d3830dc2e0a3ee702f7d2d1e4c1
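To make the two rules above concrete, here is a rough sketch of how a URL could be classified as indexable or not. It assumes the URL shapes seen in the examples (a revision is pinned with `@<rev>` on the repository segment, and commit pages live under `/-/commit/`). The function names `should_index` and `robots_header` are hypothetical and not taken from the Sourcegraph codebase; emitting `noindex` through an `X-Robots-Tag` response header is one possible mechanism, not necessarily the one we'd pick.

```python
from typing import Optional
from urllib.parse import urlsplit


def should_index(url: str) -> bool:
    """Return False for pages this issue proposes to keep out of search indexes.

    Hypothetical helper; assumes the URL layout shown in the examples above.
    """
    path = urlsplit(url).path
    # Everything before "/-/" names the repository (and optionally a revision).
    repo_part, sep, rest = path.partition("/-/")

    # Rule 1: a pinned revision ("repo@<commit-or-branch>") is not the default branch.
    if "@" in repo_part:
        return False

    # Rule 2: individual commit pages.
    if sep and rest.startswith("commit/"):
        return False

    return True


def robots_header(url: str) -> Optional[str]:
    """Value for an X-Robots-Tag response header, or None when indexing is fine."""
    return None if should_index(url) else "noindex"
```

With this sketch, all three example URLs above would get `noindex`, while the same file at the default branch (no `@<rev>` in the path) would remain indexable.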