redis-cache segfault in single server dogfood instance
Created by: uwedeportivo
see https://github.com/sourcegraph/infrastructure/pull/2053
we experienced it today with sourcegraph.sgdev.org.
switching back to 3.18 fixes it.
things i tried:
- wiping /var/opt/sourcegraph/redis-cache (the data dir). that made it come up for a bit but ended up in the same segfault
- putting in a dummy endpoint and sh-ing into the container and launching redis-server redis-cache.conf. that works but the caveat here is that it didn't get exposed to traffic
- increased logging to debug level in the conf template https://github.com/sourcegraph/sourcegraph/blob/master/cmd/server/shared/assets/redis-cache.conf.tmpl#L16 and took out the tail -1 https://github.com/sourcegraph/sourcegraph/blob/master/cmd/server/shared/redis.go#L94 but logging didn't reveal anything other than that it gets past initialization and conf reading
- since https://pkgs.alpinelinux.org/packages?name=redis&branch=v3.10 recommends 5.0.5, i downgraded 5.0.9 to 5.0.5 with no effect, still segfaults
- @ggilmore set up dogfood-redis-test cluster in sourcegraph-aux but couldn't reproduce it there
- @davejrt used his minikube vm and couldn't reproduce it there
- we all tried locally and it came up, no segfault
- read through https://github.com/sourcegraph/sourcegraph/issues/651
- there are core dumps in /var/opt/sourcegraph/redis-cache-backup. i pointed gdb at one core dump and the function i landed in is:
/* Returns an array of robj pointers, and populates *argc with the number
 * of items, by parsing the format specifier "fmt" as described for
 * the RM_Call(), RM_Replicate() and other module APIs.
 *
 * The integer pointed by 'flags' is populated with flags according
 * to special modifiers in "fmt". For now only one exists:
 *
 *     "!" -> REDISMODULE_ARGV_REPLICATE
 *     "A" -> REDISMODULE_ARGV_NO_AOF
 *     "R" -> REDISMODULE_ARGV_NO_REPLICAS
 *
 * On error (format specifier error) NULL is returned and nothing is
 * allocated. On success the argument vector is returned. */
robj **moduleCreateArgvFromUserFormat(const char *cmdname, const char *fmt, int *argcp, int *flags, va_list ap) {
    int argc = 0, argv_size, j;
    robj **argv = NULL;
    /* As a first guess to avoid useless reallocations, size argv to
     * hold one argument for each char specifier in 'fmt'. */
    argv_size = strlen(fmt)+1; /* +1 because of the command name. */
    argv = zrealloc(argv,sizeof(robj*)*argv_size);
    /* Build the arguments vector based on the format specifier. */
    argv[0] = createStringObject(cmdname,strlen(cmdname));
    argc++;
as far as i can tell it segfaults somewhere around createStringObject (i don't have debug symbols in the core dump so it's all disassembler).
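for anyone who wants to repeat the core dump inspection, here's a rough sketch of a gdb invocation that dumps a backtrace, the registers and the instructions around the faulting address without needing debug symbols. the binary path and the core file name are assumptions (i haven't listed the actual file names here), so override `REDIS_BIN` and `CORE_FILE` as needed. the helper only prints the command so it can be reviewed before running it inside the container.

```shell
#!/bin/sh
# Sketch only: build a gdb command line for inspecting a core dump without symbols.
# ASSUMPTION: both defaults below are placeholders, not the real paths in the image.
REDIS_BIN="${REDIS_BIN:-/usr/bin/redis-server}"
CORE_FILE="${CORE_FILE:-/var/opt/sourcegraph/redis-cache-backup/core}"

# gdb_core_cmd BINARY CORE
# Prints (does not run) a batch-mode gdb command that shows:
#   bt             - backtrace of the crashing thread
#   info registers - register contents at the time of the crash
#   x/16i $pc      - 16 instructions starting at the program counter
gdb_core_cmd() {
    printf '%s\n' "gdb -batch -ex bt -ex 'info registers' -ex 'x/16i \$pc' $1 $2"
}

gdb_core_cmd "$REDIS_BIN" "$CORE_FILE"
```

pasting the printed line into a shell in uwe-dev-insiders-4 (the image with gdb) should give output that can be attached to this issue.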
to reproduce just change back to the insiders image mentioned in https://github.com/sourcegraph/infrastructure/pull/2053
there's also a couple of uwe-dev-insiders-1,2,3,4 images with mods mentioned above. uwe-dev-insiders-4 has gdb in it