Skip to content

feat: config cold paths#15

Draft
notxorand wants to merge 6 commits into
devfrom
feat/config-extensions
Draft

feat: config cold paths#15
notxorand wants to merge 6 commits into
devfrom
feat/config-extensions

Conversation

@notxorand

@notxorand notxorand commented Mar 10, 2026

Copy link
Copy Markdown
Member

closes: #14

Greptile Summary

This PR introduces runtime-selectable storage backends for the TSM tree (Native binary or CSV), driven by a new TOML-based configuration system. src/config.zig parses config.toml into a StreamConfig, TsmTreeRuntime wraps the two concrete tree types in a tagged union, and src/main.zig wires them together at startup. The benchmarks (billions.zig, millions.zig) are also migrated to the new TsmTreeRuntime API.

Key concerns found during review:

  • Memory leak in loadConfig (src/main.zig): ParsedStreamConfig.deinit() is never called, leaking the entire TOML arena (all config string slices — URLs, bucket names, file paths — are owned by this arena).
  • Silent config error swallowing (src/main.zig): catch return in loadConfig discards every parse error without logging, leaving the server running silently on default config when the config file is invalid.
  • Value.Bytes memory leak in CSV entry (src/tsm/entry_csv.zig): Bytes values parsed from CSV allocate inner []u8 slices that are never freed when callers free the returned []Value.
  • Misleading error name (src/config.zig line 170): error.MissingPopulateCsvConfig is returned when the native populate section is absent; should be error.MissingPopulateNativeConfig.
  • Copy-paste error in test (src/config.zig line 284): TestMissingPopulateTdms used as the error tag for a missing native config assertion.

Confidence Score: 2/5

  • Not safe to merge as-is — the TOML arena memory leak and silent config error swallowing are correctness issues that affect every startup.
  • Two logic-level bugs in the critical startup path (arena leak in loadConfig, silent error discard) combined with a memory leak in the new CSV entry path lower confidence significantly. The core TsmTreeRuntime dispatch and config parsing logic are otherwise sound.
  • src/main.zig (arena leak + silent errors) and src/tsm/entry_csv.zig (Bytes value leak) need the most attention before merging.

Important Files Changed

Filename Overview
src/config.zig New TOML-based config parser for StreamConfig. Contains a misleading error name (MissingPopulateCsvConfig for native populate path) and a copy-paste error in a test helper error tag (TestMissingPopulateTdms used for missing native config).
src/main.zig Wires config loading and runtime backend selection. Has two critical issues: the TOML arena returned by parseFromConfigFile is leaked (never deinited), and config parse errors are silently swallowed with catch return, leaving the server running on default config without any log output.
src/tsm/tsm.zig Adds TsmTreeRuntime tagged union enabling runtime selection between Native and CSV backends. Clean implementation with properly forwarded method dispatch for all public API methods.
src/tsm/entry_csv.zig New CSV-backed disk entry. Has a memory leak: Value.Bytes values parsed via parseHexBytes allocate heap memory that is never freed when callers free the returned []Value slice. Also does full file scans on every getColumnRangeById call (no page-level indexing, expected for a cold path).
config.toml New sample config file exercising all ingest methods (WebSocket, NATS, HTTP, MQTT), all sync backends, and all populate methods. Used directly by the config parser tests.
build.zig Adds nats and toml dependencies and wires them into the main executable module.

Sequence Diagram

sequenceDiagram
    participant main
    participant loadConfig
    participant config.parseFromConfigFile
    participant toml.Parser
    participant TsmTreeRuntime

    main->>loadConfig: loadConfig(allocator, &stream_config)
    loadConfig->>loadConfig: parse --config CLI arg (default: ./config.toml)
    loadConfig->>config.parseFromConfigFile: parseFromConfigFile(allocator, path)
    config.parseFromConfigFile->>toml.Parser: parser.parseString(content)
    toml.Parser-->>config.parseFromConfigFile: toml.Parsed(RawConfig) [arena owns all strings]
    config.parseFromConfigFile->>config.parseFromConfigFile: map RawConfig → StreamConfig
    config.parseFromConfigFile-->>loadConfig: ParsedStreamConfig { parsed (arena), value (StreamConfig) }
    Note over loadConfig: ⚠️ parsed.deinit() never called → arena leaked
    loadConfig-->>main: stream_config updated (slices point into leaked arena)

    main->>main: select EntryBackend from stream_config.flush
    main->>TsmTreeRuntime: TsmTreeRuntime.init(allocator, "demo", backend)
    alt backend == .Native
        TsmTreeRuntime-->>main: TsmTreeRuntime{ .Native = TsmTreeNative }
    else backend == .CSV
        TsmTreeRuntime-->>main: TsmTreeRuntime{ .CSV = TsmTreeCsv }
    end

    main->>main: spawn handleWasmWebsocket, handleWasm, runServer
Loading

Last reviewed commit: c559e16

Greptile also left 5 inline comments on this PR.

@codspeed-hq

codspeed-hq Bot commented Mar 10, 2026

Copy link
Copy Markdown

Merging this PR will not alter performance

✅ 1 untouched benchmark


Comparing feat/config-extensions (c559e16) with dev (9a183e2)1

Open in CodSpeed

Footnotes

  1. No successful run was found on dev (802bc06) during the generation of this report, so 9a183e2 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

Comment thread src/main.zig
Comment on lines +613 to +615
const parsed_config = config.parseFromConfigFile(allocator, config_path) catch return;

stream_config.* = (parsed_config).value;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ParsedStreamConfig arena is leaked

parseFromConfigFile returns a ParsedStreamConfig that wraps a toml.Parsed(RawConfig) arena — all the string slices inside StreamConfig (ingest URLs, subscription names, bucket names, file paths, etc.) point into that arena. After stream_config.* = (parsed_config).value the arena goes out of scope and is never freed via parsed_config.deinit(), leaking all the TOML heap allocations for the lifetime of the process.

The ParsedStreamConfig needs to stay alive as long as stream_config is in use, and be deinitialized when the server shuts down. A minimal fix is to keep it alongside stream_config and call deinit in the caller:

// In main(), alongside stream_config:
var parsed_stream_config: ?config.ParsedStreamConfig = null;
defer if (parsed_stream_config) |*p| p.deinit();

var stream_config = StreamConfig{};
// inside loadConfig (or inline in main):
parsed_stream_config = try config.parseFromConfigFile(allocator, config_path);
stream_config = parsed_stream_config.?.value;

Comment thread src/main.zig
}
}

const parsed_config = config.parseFromConfigFile(allocator, config_path) catch return;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Config parse errors are silently swallowed

catch return discards every error from parseFromConfigFile — a missing required field, invalid TOML syntax, an unrecognised ingest method, etc. all cause the function to return silently, leaving stream_config untouched with its zero-value defaults. The user will see the server start up as if the config was applied when it wasn't.

At minimum, log the error before dropping it so operators can diagnose the problem:

Suggested change
const parsed_config = config.parseFromConfigFile(allocator, config_path) catch return;
const parsed_config = config.parseFromConfigFile(allocator, config_path) catch |err| {
std.log.err("Failed to parse config file '{s}': {}", .{ config_path, err });
return;
};

Comment thread src/config.zig
if (std.ascii.eqlIgnoreCase(method, "none")) {
populate_list[i] = .None;
} else if (std.ascii.eqlIgnoreCase(method, "native")) {
const native_cfg = cfg.populate.native orelse return error.MissingPopulateCsvConfig;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Misleading error name for missing native populate config

When the populate.method list includes "native" but [populate.native] is absent, the code returns error.MissingPopulateCsvConfig — a CSV-specific name. This will confuse anyone debugging a config that has nothing to do with CSV. A dedicated error like error.MissingPopulateNativeConfig would make the root cause obvious.

Suggested change
const native_cfg = cfg.populate.native orelse return error.MissingPopulateCsvConfig;
const native_cfg = cfg.populate.native orelse return error.MissingPopulateNativeConfig;

Comment thread src/config.zig

const csv_cfg = cfg.populate.csv orelse return error.TestMissingPopulateCsv;
const tdms_cfg = cfg.populate.tdms orelse return error.TestMissingPopulateTdms;
const native_cfg = cfg.populate.native orelse return error.TestMissingPopulateTdms;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong test error tag for missing native config

The error tag TestMissingPopulateTdms is reused here when it should be something like TestMissingPopulateNative. If this test ever fails on this line the diagnostic will point at TDMS, sending investigation in the wrong direction.

Suggested change
const native_cfg = cfg.populate.native orelse return error.TestMissingPopulateTdms;
const native_cfg = cfg.populate.native orelse return error.TestMissingPopulateNative;

Comment thread src/tsm/entry_csv.zig
const ts = std.fmt.parseInt(i64, record[2], 10) catch continue;
try values.append(self.allocator, Value{ .Int = ts });
} else if (column_id == 1) {
const v = try parseValueFromStrings(self.allocator, record[3], record[4]);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Value.Bytes inner allocation is leaked on every CSV read

parseValueFromStrings calls parseHexBytes, which allocates a []u8 via the provided allocator and wraps it in Value{ .Bytes = bytes }. That slice is appended to the values list and eventually returned as a []Value to callers.

All callers (e.g. queryDisk in tsm.zig) free the outer []Value slice with allocator.free(values_entry), but that only frees the slice header — it does not recurse into Value.Bytes inner allocations. Every Bytes value read from a CSV entry therefore leaks its underlying byte slice.

The fix is to free the inner Bytes slice before freeing the container, or to avoid heap-allocating in parseHexBytes and instead encode bytes in a way that doesn't require a separate allocation (e.g. base64 decoded into a fixed-size buffer, or accept the limitation that Bytes is not supported in the CSV path). For example, in getColumnRangeById:

// After building `values`, before returning:
// Alternatively, define a helper freeValues() that iterates and frees .Bytes slices.
errdefer for (values.items) |v| {
    if (v == .Bytes) self.allocator.free(v.Bytes);
};

@notxorand notxorand marked this pull request as draft March 28, 2026 20:08
@notxorand notxorand force-pushed the dev branch 10 times, most recently from bd15ba3 to 481f4f1 Compare April 24, 2026 21:57
@notxorand notxorand force-pushed the dev branch 3 times, most recently from c49cf65 to f6eb9a8 Compare April 28, 2026 01:31
@notxorand notxorand force-pushed the dev branch 2 times, most recently from 0457fa4 to e4ac438 Compare May 17, 2026 18:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: modularise cold paths

1 participant