codeintel: Refactor lsif input parsing

Administrator requested to merge refactor-lsif-parsing into master

Created by: efritz

Previously: The lsif package contained functions for parsing every line of the JSON input. This included a ParseElement function that parsed the outermost fields common to all vertices and edges (id, label, type) and carried the raw data along for later parsing into a specific type. The correlator therefore had to call a second parse method on the raw values during correlation, which made it partially responsible for knowing the format of the input data.

We'd like to break this coupling so that we can swap JSON lines for a binary format, which would give us smaller transfers, less disk usage, and faster parsing.

Now: The lsif package contains only bare types, and parsing lives in a new package, lsif/jsonlines. That package provides a new Reader function that returns a channel on which fully parsed items are passed to the correlator. The correlator now only asserts the expected type with a checked cast instead of calling back into the parsing layer for a more refined type of data.
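To illustrate the new split, here is a minimal sketch of a reader that hands fully parsed elements to a correlator over a channel. It is not the code in this change: the names `Element`, `Edge`, `Pair`, `Read`, `Correlate`, and `correlateString` are hypothetical, and the sketch assumes string ids and a single edge shape with `outV`/`inV` fields.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// Element is a fully parsed vertex or edge (hypothetical bare type).
type Element struct {
	ID      string
	Type    string
	Label   string
	Payload interface{} // type-specific data, already parsed by the reader
}

// Edge is the payload carried by edge elements (hypothetical).
type Edge struct {
	OutV string
	InV  string
}

// Pair carries either a parsed element or the error that stopped the reader.
type Pair struct {
	Element Element
	Err     error
}

// Read parses one JSON object per line and sends the results on the returned
// channel. The consumer must drain the channel, or the goroutine will block.
// bufio.Scanner's default token limit applies; very long lines need Buffer.
func Read(r io.Reader) <-chan Pair {
	ch := make(chan Pair)
	go func() {
		defer close(ch)
		scanner := bufio.NewScanner(r)
		for scanner.Scan() {
			element, err := parseLine(scanner.Bytes())
			ch <- Pair{Element: element, Err: err}
			if err != nil {
				return
			}
		}
		if err := scanner.Err(); err != nil {
			ch <- Pair{Err: err}
		}
	}()
	return ch
}

// parseLine does all format-specific work so the correlator never sees JSON.
func parseLine(line []byte) (Element, error) {
	var raw struct {
		ID    string `json:"id"`
		Type  string `json:"type"`
		Label string `json:"label"`
		OutV  string `json:"outV"`
		InV   string `json:"inV"`
	}
	if err := json.Unmarshal(line, &raw); err != nil {
		return Element{}, err
	}
	element := Element{ID: raw.ID, Type: raw.Type, Label: raw.Label}
	if raw.Type == "edge" {
		element.Payload = Edge{OutV: raw.OutV, InV: raw.InV}
	}
	return element, nil
}

// Correlate maps each edge's outV to its inV. It only asserts the payload
// type with a checked cast and never calls back into the parsing layer.
func Correlate(r io.Reader) (map[string]string, error) {
	edges := map[string]string{}
	var firstErr error
	for pair := range Read(r) { // always drain so the reader can exit
		if firstErr != nil {
			continue
		}
		if pair.Err != nil {
			firstErr = pair.Err
			continue
		}
		if pair.Element.Type != "edge" {
			continue
		}
		edge, ok := pair.Element.Payload.(Edge)
		if !ok {
			firstErr = fmt.Errorf("element %s: unexpected payload type %T", pair.Element.ID, pair.Element.Payload)
			continue
		}
		edges[edge.OutV] = edge.InV
	}
	if firstErr != nil {
		return nil, firstErr
	}
	return edges, nil
}

// correlateString is a convenience wrapper for in-memory input.
func correlateString(s string) (map[string]string, error) {
	return Correlate(strings.NewReader(s))
}

func main() {
	input := `{"id":"1","type":"vertex","label":"range"}
{"id":"2","type":"vertex","label":"resultSet"}
{"id":"3","type":"edge","label":"next","outV":"1","inV":"2"}`
	edges, err := correlateString(input)
	if err != nil {
		panic(err)
	}
	fmt.Println(edges)
}
```

With this shape, swapping JSON lines for a binary format would only require another function that returns the same channel type; the correlator would stay unchanged.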

This also gives us a way to speed up JSON lines parsing by parsing chunks concurrently, or by parsing a lookahead buffer with multiple worker threads (unimplemented here, but intended as a POC shortly after this is merged).
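As a rough idea of what the concurrent variant could look like (it is not part of this change), the sketch below parses a buffer of lines with a fixed pool of goroutines while keeping the results in input order. `ParseLines`, `Element`, and the `numWorkers` parameter are hypothetical names, and the sketch again assumes string ids.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// Element holds the common fields of a vertex or edge (hypothetical type).
type Element struct {
	ID    string `json:"id"`
	Type  string `json:"type"`
	Label string `json:"label"`
}

// ParseLines parses each line on one of numWorkers goroutines. Every worker
// writes only to the slot of the line index it received, so results stay in
// input order and no two goroutines touch the same slot.
func ParseLines(lines []string, numWorkers int) ([]Element, error) {
	if numWorkers < 1 {
		numWorkers = 1
	}
	elements := make([]Element, len(lines))
	errs := make([]error, len(lines))
	indexes := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < numWorkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range indexes {
				errs[i] = json.Unmarshal([]byte(lines[i]), &elements[i])
			}
		}()
	}

	// Hand out line indexes, then wait for all workers to finish.
	for i := range lines {
		indexes <- i
	}
	close(indexes)
	wg.Wait()

	// Report the first failing line, if any.
	for i, err := range errs {
		if err != nil {
			return nil, fmt.Errorf("line %d: %w", i+1, err)
		}
	}
	return elements, nil
}

func main() {
	elements, err := ParseLines([]string{
		`{"id":"1","type":"vertex","label":"document"}`,
		`{"id":"2","type":"vertex","label":"range"}`,
	}, 4)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(elements), "elements parsed")
}
```

Indexing results by line position keeps ordering deterministic without extra synchronization, which matters if the correlator expects elements in the order they appear in the input.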
