feat: config cold paths#15
Conversation
| const parsed_config = config.parseFromConfigFile(allocator, config_path) catch return; | ||
|
|
||
| stream_config.* = (parsed_config).value; |
There was a problem hiding this comment.
ParsedStreamConfig arena is leaked
parseFromConfigFile returns a ParsedStreamConfig that wraps a toml.Parsed(RawConfig) arena — all the string slices inside StreamConfig (ingest URLs, subscription names, bucket names, file paths, etc.) point into that arena. After stream_config.* = (parsed_config).value the arena goes out of scope and is never freed via parsed_config.deinit(), leaking all the TOML heap allocations for the lifetime of the process.
The ParsedStreamConfig needs to stay alive as long as stream_config is in use, and be deinitialized when the server shuts down. A minimal fix is to keep it alongside stream_config and call deinit in the caller:
// In main(), alongside stream_config:
var parsed_stream_config: ?config.ParsedStreamConfig = null;
defer if (parsed_stream_config) |*p| p.deinit();
var stream_config = StreamConfig{};
// inside loadConfig (or inline in main):
parsed_stream_config = try config.parseFromConfigFile(allocator, config_path);
stream_config = parsed_stream_config.?.value;| } | ||
| } | ||
|
|
||
| const parsed_config = config.parseFromConfigFile(allocator, config_path) catch return; |
There was a problem hiding this comment.
Config parse errors are silently swallowed
catch return discards every error from parseFromConfigFile — a missing required field, invalid TOML syntax, an unrecognised ingest method, etc. all cause the function to return silently, leaving stream_config untouched with its zero-value defaults. The user will see the server start up as if the config was applied when it wasn't.
At minimum, log the error before dropping it so operators can diagnose the problem:
| const parsed_config = config.parseFromConfigFile(allocator, config_path) catch return; | |
| const parsed_config = config.parseFromConfigFile(allocator, config_path) catch |err| { | |
| std.log.err("Failed to parse config file '{s}': {}", .{ config_path, err }); | |
| return; | |
| }; |
| if (std.ascii.eqlIgnoreCase(method, "none")) { | ||
| populate_list[i] = .None; | ||
| } else if (std.ascii.eqlIgnoreCase(method, "native")) { | ||
| const native_cfg = cfg.populate.native orelse return error.MissingPopulateCsvConfig; |
There was a problem hiding this comment.
Misleading error name for missing native populate config
When the populate.method list includes "native" but [populate.native] is absent, the code returns error.MissingPopulateCsvConfig — a CSV-specific name. This will confuse anyone debugging a config that has nothing to do with CSV. A dedicated error like error.MissingPopulateNativeConfig would make the root cause obvious.
| const native_cfg = cfg.populate.native orelse return error.MissingPopulateCsvConfig; | |
| const native_cfg = cfg.populate.native orelse return error.MissingPopulateNativeConfig; |
|
|
||
| const csv_cfg = cfg.populate.csv orelse return error.TestMissingPopulateCsv; | ||
| const tdms_cfg = cfg.populate.tdms orelse return error.TestMissingPopulateTdms; | ||
| const native_cfg = cfg.populate.native orelse return error.TestMissingPopulateTdms; |
There was a problem hiding this comment.
Wrong test error tag for missing native config
The error tag TestMissingPopulateTdms is reused here when it should be something like TestMissingPopulateNative. If this test ever fails on this line the diagnostic will point at TDMS, sending investigation in the wrong direction.
| const native_cfg = cfg.populate.native orelse return error.TestMissingPopulateTdms; | |
| const native_cfg = cfg.populate.native orelse return error.TestMissingPopulateNative; |
| const ts = std.fmt.parseInt(i64, record[2], 10) catch continue; | ||
| try values.append(self.allocator, Value{ .Int = ts }); | ||
| } else if (column_id == 1) { | ||
| const v = try parseValueFromStrings(self.allocator, record[3], record[4]); |
There was a problem hiding this comment.
Value.Bytes inner allocation is leaked on every CSV read
parseValueFromStrings calls parseHexBytes, which allocates a []u8 via the provided allocator and wraps it in Value{ .Bytes = bytes }. That slice is appended to the values list and eventually returned as a []Value to callers.
All callers (e.g. queryDisk in tsm.zig) free the outer []Value slice with allocator.free(values_entry), but that only frees the slice header — it does not recurse into Value.Bytes inner allocations. Every Bytes value read from a CSV entry therefore leaks its underlying byte slice.
The fix is to free the inner Bytes slice before freeing the container, or to avoid heap-allocating in parseHexBytes and instead encode bytes in a way that doesn't require a separate allocation (e.g. base64 decoded into a fixed-size buffer, or accept the limitation that Bytes is not supported in the CSV path). For example, in getColumnRangeById:
// After building `values`, before returning:
// Alternatively, define a helper freeValues() that iterates and frees .Bytes slices.
errdefer for (values.items) |v| {
if (v == .Bytes) self.allocator.free(v.Bytes);
};bd15ba3 to
481f4f1
Compare
c49cf65 to
f6eb9a8
Compare
0457fa4 to
e4ac438
Compare
closes: #14
Greptile Summary
This PR introduces runtime-selectable storage backends for the TSM tree (Native binary or CSV), driven by a new TOML-based configuration system.
src/config.zigparsesconfig.tomlinto aStreamConfig,TsmTreeRuntimewraps the two concrete tree types in a tagged union, andsrc/main.zigwires them together at startup. The benchmarks (billions.zig,millions.zig) are also migrated to the newTsmTreeRuntimeAPI.Key concerns found during review:
loadConfig(src/main.zig):ParsedStreamConfig.deinit()is never called, leaking the entire TOML arena (all config string slices — URLs, bucket names, file paths — are owned by this arena).src/main.zig):catch returninloadConfigdiscards every parse error without logging, leaving the server running silently on default config when the config file is invalid.Value.Bytesmemory leak in CSV entry (src/tsm/entry_csv.zig): Bytes values parsed from CSV allocate inner[]u8slices that are never freed when callers free the returned[]Value.src/config.zigline 170):error.MissingPopulateCsvConfigis returned when the native populate section is absent; should beerror.MissingPopulateNativeConfig.src/config.zigline 284):TestMissingPopulateTdmsused as the error tag for a missing native config assertion.Confidence Score: 2/5
loadConfig, silent error discard) combined with a memory leak in the new CSV entry path lower confidence significantly. The coreTsmTreeRuntimedispatch and config parsing logic are otherwise sound.src/main.zig(arena leak + silent errors) andsrc/tsm/entry_csv.zig(Bytes value leak) need the most attention before merging.Important Files Changed
MissingPopulateCsvConfigfor native populate path) and a copy-paste error in a test helper error tag (TestMissingPopulateTdmsused for missing native config).parseFromConfigFileis leaked (neverdeinited), and config parse errors are silently swallowed withcatch return, leaving the server running on default config without any log output.TsmTreeRuntimetagged union enabling runtime selection between Native and CSV backends. Clean implementation with properly forwarded method dispatch for all public API methods.Value.Bytesvalues parsed viaparseHexBytesallocate heap memory that is never freed when callers free the returned[]Valueslice. Also does full file scans on everygetColumnRangeByIdcall (no page-level indexing, expected for a cold path).natsandtomldependencies and wires them into the main executable module.Sequence Diagram
sequenceDiagram participant main participant loadConfig participant config.parseFromConfigFile participant toml.Parser participant TsmTreeRuntime main->>loadConfig: loadConfig(allocator, &stream_config) loadConfig->>loadConfig: parse --config CLI arg (default: ./config.toml) loadConfig->>config.parseFromConfigFile: parseFromConfigFile(allocator, path) config.parseFromConfigFile->>toml.Parser: parser.parseString(content) toml.Parser-->>config.parseFromConfigFile: toml.Parsed(RawConfig) [arena owns all strings] config.parseFromConfigFile->>config.parseFromConfigFile: map RawConfig → StreamConfig config.parseFromConfigFile-->>loadConfig: ParsedStreamConfig { parsed (arena), value (StreamConfig) } Note over loadConfig: ⚠️ parsed.deinit() never called → arena leaked loadConfig-->>main: stream_config updated (slices point into leaked arena) main->>main: select EntryBackend from stream_config.flush main->>TsmTreeRuntime: TsmTreeRuntime.init(allocator, "demo", backend) alt backend == .Native TsmTreeRuntime-->>main: TsmTreeRuntime{ .Native = TsmTreeNative } else backend == .CSV TsmTreeRuntime-->>main: TsmTreeRuntime{ .CSV = TsmTreeCsv } end main->>main: spawn handleWasmWebsocket, handleWasm, runServerLast reviewed commit: c559e16