codeintel: Process LSIF with constant memory

Created by: efritz

Currently we require linear (or superlinear) memory to process a single LSIF dump. We should find a way to process larger indexes in multiple passes, such that intermediate results can be serialized to disk. We currently have enterprise customers who are unable to keep increasing the memory allocation for their precise-code-intel-worker pods.

Basic ideas here include:

  • Multiple passes over the LSIF input to serialize each document or set of documents (requires spilling result chunk data to external temporary storage)
  • Change the data backend to normalize data (at the risk of blowing up the size in Postgres as we'd no longer share data between documents)