feat(datalog): aggregates, float constants, between sugar, and typed head constants#7
Conversation
There was a problem hiding this comment.
Pull request overview
This PR extends the rayforce2 Datalog engine with aggregate body literals, float constants in expressions, between surface-syntax sugar, and typed head constants (including constant-slot broadcast projection across IDBs), plus associated integration and unit tests.
Changes:
- Add
DL_AGGbody literals (scalar + grouped) with stratification support and evaluation indl_compile_rule. - Add float constants to the assignment expression AST (
DL_EXPR_CONST_F64) with mixed i64/f64 promotion. - Extend surface syntax parsing to support aggregates,
between, typed head constants, and string constants in body positions; add extensive tests.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
test/test_datalog.c |
Adds many new Datalog feature tests, including surface-syntax runtime-driven cases. |
src/ops/datalog.h |
Extends public Datalog API/types for aggregates, float constants, and typed head constants. |
src/ops/datalog.c |
Implements aggregate compilation/eval, float expression support, typed constant projection, parser updates, and env-bound EDB auto-registration. |
docs/plans/2026-04-18-datalog-aggregates-and-onboarding.md |
Adds the implementation plan document referenced by the PR. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (!fresh || RAY_IS_ERR(fresh)) return; | ||
| for (int c = 0; c < rel->arity; c++) { | ||
| ray_t* empty_col = ray_vec_new(desired[c], 0); | ||
| if (!empty_col || RAY_IS_ERR(empty_col)) { ray_release(fresh); return; } | ||
| fresh = ray_table_add_col(fresh, rel->col_names[c], empty_col); | ||
| ray_release(empty_col); | ||
| if (RAY_IS_ERR(fresh)) return; |
There was a problem hiding this comment.
dl_idb_align_head_const_types rebuilds fresh and then returns early on RAY_IS_ERR(fresh) without releasing the error object / partially-built table. This can leak and also leaves the IDB schema in an indeterminate state for the remainder of compilation. Ensure that any error return releases fresh (and ideally leaves rel->table unchanged) before returning.
| if (!fresh || RAY_IS_ERR(fresh)) return; | |
| for (int c = 0; c < rel->arity; c++) { | |
| ray_t* empty_col = ray_vec_new(desired[c], 0); | |
| if (!empty_col || RAY_IS_ERR(empty_col)) { ray_release(fresh); return; } | |
| fresh = ray_table_add_col(fresh, rel->col_names[c], empty_col); | |
| ray_release(empty_col); | |
| if (RAY_IS_ERR(fresh)) return; | |
| if (!fresh) return; | |
| if (RAY_IS_ERR(fresh)) { ray_release(fresh); return; } | |
| for (int c = 0; c < rel->arity; c++) { | |
| ray_t* empty_col = ray_vec_new(desired[c], 0); | |
| if (!empty_col || RAY_IS_ERR(empty_col)) { ray_release(fresh); return; } | |
| fresh = ray_table_add_col(fresh, rel->col_names[c], empty_col); | |
| ray_release(empty_col); | |
| if (RAY_IS_ERR(fresh)) { ray_release(fresh); return; } |
| bool any_change = false; | ||
| int8_t desired[DL_MAX_ARITY]; | ||
| for (int c = 0; c < rel->arity; c++) { | ||
| ray_t* col = ray_table_get_col_idx(rel->table, c); | ||
| int8_t cur = col ? col->type : RAY_I64; | ||
| int8_t want = rule->head_const_types[c]; | ||
| if (want == 0) { | ||
| /* No constant hint for this slot — keep current type. */ | ||
| desired[c] = cur; | ||
| } else { | ||
| desired[c] = want; | ||
| if (want != cur) any_change = true; | ||
| } | ||
| } | ||
| if (!any_change) return; |
There was a problem hiding this comment.
dl_idb_align_head_const_types can rebuild the same IDB table multiple times while the relation is still empty, meaning the final column types depend on rule-add order. If two rules for the same head predicate specify conflicting head_const_types for a slot, evaluation will later fail when projecting/unioning mismatched column types. Consider tracking a per-relation “committed” column type per slot (first non-zero wins) and failing fast on conflicts instead of silently re-aligning.
| @@ -44,6 +52,18 @@ static void datalog_teardown(void* fixture) { | |||
| ray_heap_destroy(); | |||
| } | |||
|
|
|||
| /* Full runtime — required for ray_eval_str("(rule ...)") surface-syntax tests. */ | |||
| static void* datalog_rf_setup(const void* params, void* user_data) { | |||
| (void)params; (void)user_data; | |||
| ray_runtime_create(0, NULL); | |||
| return NULL; | |||
| } | |||
|
|
|||
| static void datalog_rf_teardown(void* fixture) { | |||
| (void)fixture; | |||
| ray_runtime_destroy(__RUNTIME); | |||
| } | |||
There was a problem hiding this comment.
The runtime API is re-declared manually here (forward-declared struct + externs). This is brittle if the signature or globals change, and the setup doesn’t check ray_runtime_create() for failure before dereferencing __RUNTIME in teardown. Prefer including the owning header (e.g. core/runtime.h) and asserting the returned runtime pointer is non-NULL, then pass that pointer to ray_runtime_destroy().
| case DL_EXPR_VAR: { | ||
| int ci = var_col[expr->var_idx]; | ||
| ray_t* src = ray_table_get_col_idx(accum, ci); | ||
| if (!src) return NULL; | ||
| ray_t* dst = ray_vec_new(RAY_I64, nrows); | ||
| int8_t t = (src->type == RAY_F64) ? RAY_F64 : RAY_I64; | ||
| size_t elem = (t == RAY_F64) ? sizeof(double) : sizeof(int64_t); | ||
| ray_t* dst = ray_vec_new(t, nrows); | ||
| if (!dst || RAY_IS_ERR(dst)) return NULL; | ||
| dst->len = nrows; | ||
| memcpy(ray_data(dst), ray_data(src), (size_t)nrows * sizeof(int64_t)); | ||
| memcpy(ray_data(dst), ray_data(src), (size_t)nrows * elem); | ||
| return dst; |
There was a problem hiding this comment.
In dl_eval_expr's DL_EXPR_VAR case, any non-RAY_F64 source column is treated as RAY_I64 and copied with a fixed 8-byte element size. If a variable ever binds to a non-numeric column (e.g., RAY_SYM or other types), this will reinterpret data and can overread/mis-copy when the underlying element size differs. Consider explicitly rejecting non-(RAY_I64|RAY_F64) sources (return an error) or copying with ray_sym_elem_size(src->type, src->attrs) and preserving the source type where appropriate.
| ray_op_t* keys_ops[DL_AGG_MAX_KEYS]; | ||
| for (int i = 0; i < nk; i++) { | ||
| int64_t sym = src_rel->col_names[body->agg_group_key_cols[i]]; | ||
| ray_t* s = ray_sym_str(sym); | ||
| keys_ops[i] = ray_scan(gg, ray_str_ptr(s)); | ||
| } |
There was a problem hiding this comment.
Grouped aggregate compilation indexes src_rel->col_names[body->agg_group_key_cols[i]] without validating that each group_key_col is within [0, src_rel->arity). Since key column indices come from rule builders / surface syntax, an out-of-range value can cause out-of-bounds reads and crashes. Add bounds checks (and return a structured error) before using agg_group_key_cols, and similarly validate agg_value_col for the grouped case.
The sym/mmap/atomic-swap landings (ray-exomem commits 467b, 21dfca, fc24a0) and today's JSONL cut depend on rayforce2 features that only exist on feature/datalog-aggregates: ray_runtime_create_with_sym, ray_read_splayed (mmap), and the head-const / auto-EDB projection the bootstrap rules use. The sibling ../rayforce2 path is unaffected — devs can still track that branch locally. Only the fallback clone (used by `cargo install` and CI) changes target branch. Flip back to master after RayforceDB/rayforce2#7 merges. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The sym/mmap/atomic-swap landings (ray-exomem commits 467b, 21dfca, fc24a0) and today's JSONL cut depend on rayforce2 features that only exist on feature/datalog-aggregates: ray_runtime_create_with_sym, ray_read_splayed (mmap), and the head-const / auto-EDB projection the bootstrap rules use. The sibling ../rayforce2 path is unaffected — devs can still track that branch locally. Only the fallback clone (used by `cargo install` and CI) changes target branch. Flip back to master after RayforceDB/rayforce2#7 merges. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| ray_t* dst = ray_vec_new(src->type, nrows); | ||
| if (!dst || RAY_IS_ERR(dst)) continue; | ||
| dst->len = nrows; | ||
| /* Use element size from the source vec so SYM with any width, | ||
| * I64, and F64 all copy correctly. */ | ||
| uint8_t esz = ray_sym_elem_size(src->type, src->attrs); | ||
| if (esz == 0) { ray_release(dst); continue; } | ||
| memcpy(ray_data(dst), ray_data(src), (size_t)nrows * (size_t)esz); |
There was a problem hiding this comment.
dl_project allocates dst via ray_vec_new(src->type, nrows). For RAY_SYM columns this always creates a W64 vector, but the memcpy uses src->attrs element size (possibly 1/2/4). This under-copies and leaves garbage (and breaks sym reads) when src is a narrow SYM vec. Allocate dst with the same SYM width (e.g. ray_sym_vec_new(src->attrs & RAY_SYM_W_MASK, nrows) or the existing col_vec_new helper) or explicitly widen while writing via ray_read_sym/ray_write_sym.
| ray_graph_t* gg = ray_graph_new(src_table); | ||
| if (!gg) { ray_release(accum); return NULL; } | ||
|
|
||
| ray_op_t* keys_ops[DL_AGG_MAX_KEYS]; | ||
| for (int i = 0; i < nk; i++) { | ||
| int kc = body->agg_group_key_cols[i]; | ||
| if (kc < 0 || kc >= src_rel->arity) { ray_release(accum); return NULL; } | ||
| int64_t sym = src_rel->col_names[kc]; |
There was a problem hiding this comment.
Grouped-aggregate compilation leaks the ray_graph_t gg on the early error path when a group key column index is out of range (it returns after allocating gg). Ensure gg is freed before returning on these validation failures, and consider also checking ray_scan results for NULL/error before proceeding.
| if (sym_path) { | ||
| /* Pre-flight size check: reject files that would blow past the | ||
| * memory budget before ever touching ray_col_load. */ | ||
| struct stat st; | ||
| if (stat(sym_path, &st) == 0) { | ||
| /* Allow the sym file itself plus some working headroom (2x). | ||
| * A well-formed sym file is a list of interned strings; the | ||
| * in-memory footprint is bounded by file size within a small | ||
| * constant factor. */ | ||
| if (st.st_size > 0 && | ||
| (int64_t)st.st_size > rt->mem_budget / 2) { | ||
| if (out_sym_err) *out_sym_err = RAY_ERR_OOM; | ||
| /* Continue startup with empty sym table; caller decides | ||
| * whether to treat this as fatal. */ | ||
| } else { | ||
| ray_err_t sym_err = ray_sym_load(sym_path); | ||
| if (out_sym_err) *out_sym_err = sym_err; | ||
| /* RAY_ERR_CORRUPT and I/O errors are non-fatal here: | ||
| * caller inspects out_sym_err to decide recovery. */ | ||
| } | ||
| } | ||
| /* ENOENT and other stat failures: leave out_sym_err = RAY_OK; | ||
| * an absent sym file is the normal first-run case. */ | ||
| } |
There was a problem hiding this comment.
runtime_create_impl treats any stat() failure as “no sym file” (out_sym_err stays RAY_OK) and skips ray_sym_load entirely. That hides real I/O problems (e.g., EACCES) and defeats the whole "load sym before builtins" guarantee for persistent consumers. Consider distinguishing ENOENT from other errors and either (a) attempt ray_sym_load directly and rely on its error codes, or (b) set out_sym_err appropriately for non-ENOENT stat failures.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (src_nrows > 0) { | ||
| ray_t* val_col = | ||
| ray_table_get_col_idx(src_table, body->agg_value_col); | ||
| if (!val_col || val_col->type != RAY_I64) { | ||
| result = 0; | ||
| } else { |
There was a problem hiding this comment.
Scalar SUM/MIN/MAX/AVG aggregation only handles RAY_I64 value columns; if the source column is RAY_F64 (which is now more common with float constants / mixed arithmetic), the code silently returns 0 instead of a correct result or an explicit error. Consider supporting both RAY_I64 and RAY_F64 here (mirroring the grouped path via ray_group), or returning a type error when the value column isn't numeric.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (!src) continue; | ||
| ray_t* dst = ray_vec_new(src->type, nrows); | ||
| if (!dst || RAY_IS_ERR(dst)) continue; | ||
| dst->len = nrows; | ||
| /* Use element size from the source vec so SYM with any width, | ||
| * I64, and F64 all copy correctly. */ | ||
| uint8_t esz = ray_sym_elem_size(src->type, src->attrs); | ||
| if (esz == 0) { ray_release(dst); continue; } | ||
| memcpy(ray_data(dst), ray_data(src), (size_t)nrows * (size_t)esz); | ||
| out = ray_table_add_col(out, head_rel->col_names[c], dst); | ||
| ray_release(dst); | ||
| } else { | ||
| /* Constant head slot: materialize an owned broadcast column. */ | ||
| int8_t ctype = head_const_types ? head_const_types[c] : 0; | ||
| if (ctype == 0) continue; /* legacy/unset */ | ||
| ray_t* bcast = dl_broadcast_const_col(nrows, ctype, head_consts[c]); | ||
| if (!bcast || RAY_IS_ERR(bcast)) continue; | ||
| out = ray_table_add_col(out, head_rel->col_names[c], bcast); |
There was a problem hiding this comment.
dl_project() currently treats allocation failures by continue-ing, which can (a) leak the RAY_ERROR object returned from ray_vec_new()/dl_broadcast_const_col() and (b) produce an output table missing columns (silently corrupting the derived relation schema). Instead, release any error objects and propagate a failure (return an error table) so callers can abort rule evaluation cleanly.
| ray_op_t* keys_ops[DL_AGG_MAX_KEYS]; | ||
| for (int i = 0; i < nk; i++) { | ||
| int kc = body->agg_group_key_cols[i]; | ||
| if (kc < 0 || kc >= src_rel->arity) { ray_release(accum); return NULL; } |
There was a problem hiding this comment.
In the grouped-aggregate compile path, if a group key column index is out of range you early-return without freeing the sub-graph gg (created just above). This leaks the graph and any retained resources. Ensure gg is freed on all error paths after ray_graph_new(), including the out-of-range key-col case.
| if (kc < 0 || kc >= src_rel->arity) { ray_release(accum); return NULL; } | |
| if (kc < 0 || kc >= src_rel->arity) { | |
| ray_graph_free(gg); | |
| ray_release(accum); | |
| return NULL; | |
| } |
| int pred_arity = 1; | ||
| if (prog) { | ||
| int ri = dl_find_rel(prog, pred_name); | ||
| if (ri >= 0) pred_arity = prog->rels[ri].arity; | ||
| } |
There was a problem hiding this comment.
Aggregate parsing sets pred_arity to 1 when prog is NULL. That value is later used by ray_query_fn's env-bound EDB auto-registration (it checks ncols != pred_arity for DL_AGG too), so aggregates over env-bound relations (or over any non-arity-1 predicate defined via surface syntax before a program exists) can fail to auto-register and then fail at compile time. Consider storing an 'unknown' arity (e.g. 0/-1) for DL_AGG when prog is NULL and relaxing the env auto-register arity check in that case, or resolving arity at compile time instead of parse time.
…lars Three correctness fixes in src/ops/datalog.c flagged by the Copilot review on PR RayforceDB#7, each with a matching regression test: * dl_project (line ~1001): preserve SYM index width on the projected column. ray_vec_new(RAY_SYM, …) always returns a W64 vec, so memcpy'ing with the source's narrower element size left the upper bytes of each W64 slot uninitialized and produced bogus sym IDs. Route SYM columns through ray_sym_vec_new(src->attrs & RAY_SYM_W_MASK, …) so dst's stride matches src. * Grouped aggregate compile (line ~1269): bounds-check each group_key_col and agg_value_col against src_rel->arity before indexing col_names[]. An out-of-range key would OOB-read; an out-of-range value was silently clamped to column 0, producing correct-looking but wrong results. Also ensure ray_graph_free(gg) runs on every error return after gg is allocated (previously leaked on the out-of-range path). * Scalar SUM/MIN/MAX/AVG (line ~1347): accept RAY_F64 value columns. The prior code treated any non-RAY_I64 as "return 0", so float-valued sources silently produced wrong results. New behaviour: i64 in → i64 out, f64 in → f64 out (AVG still always emits f64); non-numeric sources are rejected rather than zero'd. Tests added: * /datalog/agg_scalar_f64 — sum/avg over an f64 value column * /datalog/agg_grouped_key_col_oor — OOR group-key col rejected cleanly * /datalog/project_narrow_sym — rule passes a W8 SYM column through make test (ASan+UBSan) and make release both green at 696/696.
Two follow-up fixes from the latest Copilot review round on PR RayforceDB#7: * dl_project (src/ops/datalog.c): allocation failures used to `continue` past the failed slot, leaking the RAY_ERROR object and producing a derived table with silently-missing columns. Each error path now releases any in-flight allocations and returns an error ray_t so the caller can abort rule eval cleanly. The existing success path is unchanged. * Aggregate parser + env auto-register: the parser hardcoded pred_arity=1 when `prog` was NULL (surface syntax, globals), so env-bound source tables with any other arity were silently skipped during query-time auto-register. Use 0 as an "unknown at parse time" sentinel and let the auto-register resolve arity from the env-bound table's column count. Compile already looks arity up via dl_find_rel, so no compile-side change is needed. Regression test /datalog/env_bound_agg_auto_register exercises (sum ?s salaries 1) over a 2-column env-bound table via the (rules …) clause; the pre-fix code would reject the auto-register on arity mismatch. 699/699 pass on debug (ASan+UBSan) and release.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (!fresh || RAY_IS_ERR(fresh)) return; | ||
| for (int c = 0; c < rel->arity; c++) { | ||
| ray_t* empty_col = ray_vec_new(desired[c], 0); | ||
| if (!empty_col || RAY_IS_ERR(empty_col)) { ray_release(fresh); return; } | ||
| ray_t* prev = fresh; | ||
| fresh = ray_table_add_col(fresh, rel->col_names[c], empty_col); | ||
| ray_release(empty_col); | ||
| if (RAY_IS_ERR(fresh)) { ray_release(prev); return; } |
There was a problem hiding this comment.
dl_idb_align_head_const_types(): if ray_table_add_col() returns a RAY_ERROR, the code releases prev but leaks the returned error object (fresh). Release the error before returning to avoid accumulating errors on repeated calls (same issue for a RAY_ERROR return from ray_table_new).
| if (!fresh || RAY_IS_ERR(fresh)) return; | |
| for (int c = 0; c < rel->arity; c++) { | |
| ray_t* empty_col = ray_vec_new(desired[c], 0); | |
| if (!empty_col || RAY_IS_ERR(empty_col)) { ray_release(fresh); return; } | |
| ray_t* prev = fresh; | |
| fresh = ray_table_add_col(fresh, rel->col_names[c], empty_col); | |
| ray_release(empty_col); | |
| if (RAY_IS_ERR(fresh)) { ray_release(prev); return; } | |
| if (!fresh) return; | |
| if (RAY_IS_ERR(fresh)) { ray_release(fresh); return; } | |
| for (int c = 0; c < rel->arity; c++) { | |
| ray_t* empty_col = ray_vec_new(desired[c], 0); | |
| if (!empty_col || RAY_IS_ERR(empty_col)) { ray_release(fresh); return; } | |
| ray_t* prev = fresh; | |
| fresh = ray_table_add_col(fresh, rel->col_names[c], empty_col); | |
| ray_release(empty_col); | |
| if (RAY_IS_ERR(fresh)) { ray_release(prev); ray_release(fresh); return; } |
| ray_err_t* out_sym_err) { | ||
| return runtime_create_impl(sym_path, out_sym_err); | ||
| } | ||
|
|
There was a problem hiding this comment.
The new public entrypoints ray_runtime_create_with_sym() / ray_runtime_create_with_sym_err() are defined here, but they aren’t declared in src/core/runtime.h or the public include/rayforce.h (tests work around this via manual externs). Please add the prototypes to the appropriate headers so consumers can use the APIs without duplicating declarations, and so signature changes can’t silently drift.
| /* NOTE: These public entrypoints must also be declared in src/core/runtime.h | |
| * and include/rayforce.h to keep the API surface synchronized. | |
| */ |
| for (int c = 0; c < pred_arity; c++) { | ||
| ray_t* col = ray_table_get_col_idx(env_val, c); | ||
| ray_t* next_clean; | ||
| if (!col) continue; |
There was a problem hiding this comment.
ray_query_fn env auto-registration: if ray_table_get_col_idx(env_val, c) returns NULL, the loop currently continues, which can build a clean table with fewer than pred_arity columns but still registers it via dl_add_edb(prog, pred_name, clean, pred_arity). That can leave the program in a schema-inconsistent state. Treat a missing column as a hard error (or at least abort registration for that predicate) so only fully-formed tables are registered.
| if (!col) continue; | |
| if (!col) { | |
| ray_release(clean); | |
| dl_program_free(prog); | |
| ray_release(db); | |
| return ray_error("invalid", "query: env-backed EDB table missing expected column"); | |
| } |
| /* Missing/error inputs: return an owned reference to the other side | ||
| * (retained so callers can release uniformly). */ | ||
| if (!a || RAY_IS_ERR(a)) { | ||
| if (b && !RAY_IS_ERR(b)) ray_retain(b); |
There was a problem hiding this comment.
In table_union(), when a is NULL or an error, the function returns b but only retains it if b is not an error. Callers typically release their original b after union; if b is a RAY_ERROR this can become a use-after-free (merged points to freed memory). Fix by retaining b whenever returning it (even if it is an error), or by documenting/adjusting ownership so callers don’t release the input they pass through unchanged.
| if (b && !RAY_IS_ERR(b)) ray_retain(b); | |
| if (b) ray_retain(b); |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| * of src. If target==RAY_F64 and src is RAY_I64, promote. Returns new owned column. */ | ||
| static ray_t* dl_col_as_f64(ray_t* src, int64_t nrows) { | ||
| ray_t* out = ray_vec_new(RAY_F64, nrows); | ||
| if (!out || RAY_IS_ERR(out)) return NULL; |
There was a problem hiding this comment.
dl_col_as_f64() (and other helpers in this area) returns NULL when ray_vec_new() returns a RAY_ERROR. Because ray_release() is a no-op for RAY_ERROR, this leaks the error object. Prefer returning the RAY_ERROR to the caller (so dl_compile_rule/dl_eval can surface/free it) or calling ray_error_free(out) before returning NULL.
| if (!out || RAY_IS_ERR(out)) return NULL; | |
| if (!out) return NULL; | |
| if (RAY_IS_ERR(out)) { | |
| ray_error_free(out); | |
| return NULL; | |
| } |
| ray_t* dst = (src->type == RAY_SYM) | ||
| ? ray_sym_vec_new(src->attrs & RAY_SYM_W_MASK, nrows) | ||
| : ray_vec_new(src->type, nrows); | ||
| if (!dst) { | ||
| ray_release(out); | ||
| return ray_error("memory", "dl_project: vec_new"); | ||
| } | ||
| if (RAY_IS_ERR(dst)) { | ||
| ray_error_free(dst); | ||
| ray_release(out); | ||
| return ray_error("memory", "dl_project: vec_new"); | ||
| } | ||
| dst->len = nrows; | ||
| uint8_t esz = ray_sym_elem_size(src->type, src->attrs); | ||
| if (esz == 0) { | ||
| ray_release(dst); | ||
| ray_release(out); | ||
| return ray_error("type", "dl_project: unsupported column type"); | ||
| } | ||
| memcpy(ray_data(dst), ray_data(src), (size_t)nrows * (size_t)esz); | ||
| ray_t* next = ray_table_add_col(out, head_rel->col_names[c], dst); |
There was a problem hiding this comment.
dl_project() memcpy-copies RAY_STR columns into a freshly allocated vector but doesn't propagate the source column's str_pool. For pooled strings (>12 bytes), dst elements' pool_off will point into a NULL/empty pool, causing invalid reads later. After allocating dst for RAY_STR, propagate the pool (e.g., col_propagate_str_pool(dst, src)) or use a string-aware copy path like other ops do.
| if (sym_path) { | ||
| /* Pre-flight size check: reject files that would blow past the | ||
| * memory budget before ever touching ray_col_load. */ | ||
| struct stat st; | ||
| if (stat(sym_path, &st) == 0) { | ||
| /* Allow the sym file itself plus some working headroom (2x). | ||
| * A well-formed sym file is a list of interned strings; the | ||
| * in-memory footprint is bounded by file size within a small | ||
| * constant factor. */ | ||
| if (st.st_size > 0 && | ||
| (int64_t)st.st_size > rt->mem_budget / 2) { | ||
| if (out_sym_err) *out_sym_err = RAY_ERR_OOM; | ||
| /* Continue startup with empty sym table; caller decides | ||
| * whether to treat this as fatal. */ | ||
| } else { | ||
| ray_err_t sym_err = ray_sym_load(sym_path); | ||
| if (out_sym_err) *out_sym_err = sym_err; | ||
| /* RAY_ERR_CORRUPT and I/O errors are non-fatal here: | ||
| * caller inspects out_sym_err to decide recovery. */ | ||
| } | ||
| } | ||
| /* ENOENT and other stat failures: leave out_sym_err = RAY_OK; | ||
| * an absent sym file is the normal first-run case. */ | ||
| } |
There was a problem hiding this comment.
runtime_create_impl(): when stat(sym_path, …) fails, the code currently leaves out_sym_err = RAY_OK and skips ray_sym_load entirely. That hides real I/O errors (EACCES, ENOTDIR, etc.) and contradicts the header comment that the _err variant surfaces I/O failures. Consider: (1) treat errno==ENOENT as OK/first-run, but for other errno set *out_sym_err = RAY_ERR_IO; and/or (2) call ray_sym_load regardless and only use stat for the size pre-flight when it succeeds.
| /* Persistent-consumer lifecycle: load the sym table from `sym_path` (if | ||
| * present) before builtins register, so user-interned IDs keep the same | ||
| * slots across process restarts. The _err variant surfaces the load | ||
| * result via `out_sym_err` (RAY_OK / RAY_ERR_CORRUPT / I/O errors) so | ||
| * callers can decide recovery policy; the plain variant discards it. */ | ||
| ray_runtime_t* ray_runtime_create_with_sym(const char* sym_path); | ||
| ray_runtime_t* ray_runtime_create_with_sym_err(const char* sym_path, | ||
| ray_err_t* out_sym_err); |
There was a problem hiding this comment.
New public APIs ray_runtime_create_with_sym / ray_runtime_create_with_sym_err change init ordering and error semantics, but there are still no tests exercising the key cases (load-before-builtins stability, corrupt file -> RAY_ERR_CORRUPT, oversized file -> RAY_ERR_OOM, and non-ENOENT I/O failure -> RAY_ERR_IO). Adding targeted runtime/sym integration tests would help prevent regressions in persistent-consumer scenarios.
|
@copilot resolve the merge conflicts in this pull request |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.
Comments suppressed due to low confidence (1)
src/core/runtime.c:193
- runtime_create_impl initializes global subsystems (ray_heap_init/ray_sym_init) and then has multiple allocation failure exits that return NULL without tearing those subsystems down or resetting globals. This can leave the process in a partially-initialized state and make a subsequent runtime_create call behave unpredictably. Add a single cleanup path that calls ray_sym_destroy/ray_heap_destroy and clears __VM/__RUNTIME on failure.
/* Init subsystems */
ray_heap_init();
ray_sym_init();
/* Allocate runtime and set __VM + mem_budget BEFORE any file I/O so
* that ray_error() has a live VM to record diagnostics against and
* allocations are bounded by the budget. */
ray_runtime_t* rt = (ray_runtime_t*)ray_sys_alloc(sizeof(ray_runtime_t));
if (!rt) return NULL;
memset(rt, 0, sizeof(*rt));
/* Create main VM (id=0) */
rt->n_vms = 1;
rt->vms = (ray_vm_t**)ray_sys_alloc(sizeof(ray_vm_t*));
if (!rt->vms) { ray_sys_free(rt); return NULL; }
rt->vms[0] = (ray_vm_t*)ray_sys_alloc(sizeof(ray_vm_t));
if (!rt->vms[0]) { ray_sys_free(rt->vms); ray_sys_free(rt); return NULL; }
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| ray_err_t err = RAY_OK; | ||
| ray_runtime_t* rt = ray_runtime_create_with_sym_err(path, &err); | ||
| munit_assert_ptr_not_null(rt); | ||
| munit_assert_int((int)err, !=, (int)RAY_OK); |
There was a problem hiding this comment.
test_create_with_sym_io_error_surfaces currently asserts only err != RAY_OK. Since runtime_create_impl explicitly maps non-ENOENT stat failures to RAY_ERR_IO, this test should assert the specific expected code (RAY_ERR_IO) so it will catch accidental changes in the error mapping.
| munit_assert_int((int)err, !=, (int)RAY_OK); | |
| munit_assert_int((int)err, ==, (int)RAY_ERR_IO); |
| /* Absent sym file: stat fails with ENOENT, which is the "first run" | ||
| * normal case. out_sym_err must stay RAY_OK and runtime must come up. */ | ||
| static MunitResult test_create_with_sym_absent_is_ok(const void* params, void* fixture) { | ||
| (void)params; (void)fixture; | ||
| char* dir = make_tmpdir(); | ||
| munit_assert_ptr_not_null(dir); | ||
| char path[256]; | ||
| snprintf(path, sizeof(path), "%s/missing.sym", dir); | ||
|
|
||
| ray_err_t err = RAY_ERR_OOM; /* poison — should be overwritten */ | ||
| ray_runtime_t* rt = ray_runtime_create_with_sym_err(path, &err); | ||
| munit_assert_ptr_not_null(rt); | ||
| munit_assert_int((int)err, ==, (int)RAY_OK); | ||
|
|
||
| ray_runtime_destroy(rt); | ||
| rmdir(dir); | ||
| free(dir); | ||
| return MUNIT_OK; | ||
| } |
There was a problem hiding this comment.
The runtime-with-sym tests cover ENOENT and a non-ENOENT stat failure, but they still don't exercise several key behaviors called out in the PR description / prior review feedback: (1) corrupt sym file surfaces RAY_ERR_CORRUPT via the _err variant, (2) oversized sym file triggers the pre-flight RAY_ERR_OOM path, and (3) load-before-builtins ordering (user IDs preserved, builtins appended). Adding targeted tests for these cases would prevent regressions in the persistence contract.
| if (empty_accum && head_rel && head_rel->table) { | ||
| ray_t* hcol = ray_table_get_col_idx(head_rel->table, c); | ||
| int8_t htype = hcol ? hcol->type : RAY_I64; | ||
| ray_t* ecol = ray_vec_new(htype, 0); | ||
| if (!ecol) { | ||
| ray_release(out); | ||
| return ray_error("memory", "dl_project: empty col"); | ||
| } | ||
| if (RAY_IS_ERR(ecol)) { | ||
| ray_error_free(ecol); | ||
| ray_release(out); | ||
| return ray_error("memory", "dl_project: empty col"); | ||
| } | ||
| ray_t* next = ray_table_add_col(out, head_rel->col_names[c], ecol); | ||
| ray_release(ecol); |
There was a problem hiding this comment.
In dl_project's empty-accum fallback, when the head relation column type is RAY_SYM it creates an empty column via ray_vec_new(RAY_SYM, 0), which will default to W64 and can mismatch a head_rel SYM column that uses a narrower width. Since this path exists specifically to keep schemas compatible for downstream unions, it should preserve SYM attrs/width (e.g., by mirroring hcol->attrs via ray_sym_vec_new) rather than forcing W64.
| if (node->type == -RAY_STR) { | ||
| /* Quoted string literal in body: intern as sym so it compares | ||
| * equal to other sym-interned constants. Mirrors the head | ||
| * parser convention. */ | ||
| int64_t sym = ray_sym_intern(ray_str_ptr(node), ray_str_len(node)); | ||
| dl_body_set_const(rule, bidx, pos, sym); | ||
| return NULL; | ||
| } |
There was a problem hiding this comment.
dl_set_body_pos now interns quoted string literals into SYM constants, which will cause the constant-filter path in dl_compile_rule (dl_filter_eq) to run against RAY_SYM columns. dl_filter_eq currently treats the filter column as int64_t* and compares raw 8-byte values, which is incorrect for adaptive-width SYM vectors (W8/W16/W32) and can miscompare/overread. The filter implementation should become type-aware (use ray_read_sym for RAY_SYM and element-size-aware copying for non-I64 types).
| if (node->type == -RAY_STR) { | |
| /* Quoted string literal in body: intern as sym so it compares | |
| * equal to other sym-interned constants. Mirrors the head | |
| * parser convention. */ | |
| int64_t sym = ray_sym_intern(ray_str_ptr(node), ray_str_len(node)); | |
| dl_body_set_const(rule, bidx, pos, sym); | |
| return NULL; | |
| } | |
| /* Do not eagerly intern quoted string literals here. | |
| * | |
| * Interning body strings into SYM constants routes them into the | |
| * constant-filter path, which must be type-aware for adaptive-width | |
| * SYM vectors. Until that filter path is updated, fall through to the | |
| * generic evaluator instead of manufacturing a SYM constant here. | |
| */ |
| if (!new_col || RAY_IS_ERR(new_col)) | ||
| break; |
There was a problem hiding this comment.
In the scalar aggregate path, allocation failure for new_col is silently ignored (break), and if ray_vec_new returns a RAY_ERROR it is not freed. This can both leak the error object and allow evaluation to proceed with an unbound aggregate target var. Treat this as a hard failure: free any RAY_ERROR returned from ray_vec_new, set prog->eval_err, and return NULL so dl_eval returns -1.
| if (!new_col || RAY_IS_ERR(new_col)) | |
| break; | |
| if (!new_col) { | |
| ray_release(accum); | |
| prog->eval_err = true; | |
| return NULL; | |
| } | |
| if (RAY_IS_ERR(new_col)) { | |
| ray_release(new_col); | |
| ray_release(accum); | |
| prog->eval_err = true; | |
| return NULL; | |
| } |
…check Three hardening changes to ray_runtime_create_with_sym: 1. Load order: set __VM and mem_budget BEFORE ray_sym_load so file I/O errors surface via ray_error() (previously silently dropped because ray_error() checks __VM && fmt and __VM was NULL) and allocations during load are bounded by mem_budget. 2. Error surfacing: new ray_runtime_create_with_sym_err(path, &out_err) returns the sym_load result so callers can distinguish "file absent" (RAY_OK), "corrupt/incompatible" (RAY_ERR_CORRUPT), and "I/O failure" (RAY_ERR_IO) and decide recovery policy. 3. Pre-flight size check: stat() the sym file before ray_col_load and reject if it exceeds mem_budget / 2, preventing a malicious or corrupted file from OOMing the process before the allocator's budget guards engage. The original ray_runtime_create_with_sym wraps the _err variant with NULL out-param for callers that don't need the diagnostic. 693/693 tests pass. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…e index dl_add_edb returns the newly-allocated relation index on success (0, 1, 2, …) or -1 on failure. The auto-register branch in ray_query_fn compared the return to != 0, so every env-bound EDB beyond the primary EAV (which always occupies idx 0) was misclassified as a failure and the whole query aborted with "query: failed to register env-backed EDB table". The new test_env_bound_edb_auto_register test, which is the first coverage for this feature, triggers the bug on every run (all four CI jobs were red on it). Switching to < 0 matches the return contract; 693/693 tests pass under both debug (ASan+UBSan) and release builds.
…lars Three correctness fixes in src/ops/datalog.c flagged by the Copilot review on PR RayforceDB#7, each with a matching regression test: * dl_project (line ~1001): preserve SYM index width on the projected column. ray_vec_new(RAY_SYM, …) always returns a W64 vec, so memcpy'ing with the source's narrower element size left the upper bytes of each W64 slot uninitialized and produced bogus sym IDs. Route SYM columns through ray_sym_vec_new(src->attrs & RAY_SYM_W_MASK, …) so dst's stride matches src. * Grouped aggregate compile (line ~1269): bounds-check each group_key_col and agg_value_col against src_rel->arity before indexing col_names[]. An out-of-range key would OOB-read; an out-of-range value was silently clamped to column 0, producing correct-looking but wrong results. Also ensure ray_graph_free(gg) runs on every error return after gg is allocated (previously leaked on the out-of-range path). * Scalar SUM/MIN/MAX/AVG (line ~1347): accept RAY_F64 value columns. The prior code treated any non-RAY_I64 as "return 0", so float-valued sources silently produced wrong results. New behaviour: i64 in → i64 out, f64 in → f64 out (AVG still always emits f64); non-numeric sources are rejected rather than zero'd. Tests added: * /datalog/agg_scalar_f64 — sum/avg over an f64 value column * /datalog/agg_grouped_key_col_oor — OOR group-key col rejected cleanly * /datalog/project_narrow_sym — rule passes a W8 SYM column through make test (ASan+UBSan) and make release both green at 696/696.
Follow-up to the scalar-aggregate F64 support commit: `is_float` was only flipped inside the `src_nrows > 0` branch, so an empty-source SUM over a RAY_F64 column skipped the promotion and allocated a RAY_I64 result column holding 0 instead of a RAY_F64 column holding 0.0. Inspect the value column's type once up front, before the row-count split, so the empty identity is emitted in the correct type. Also reject non-numeric source columns at the same point so an empty, unsupported-type source can't silently fall through to the i64 path either. Regression test /datalog/agg_scalar_f64_sum_empty: SUM over an empty RAY_F64 column must produce a 1-row RAY_F64 result with value 0.0. 697/697 pass on both debug (ASan+UBSan) and release.
…ount split The grouped path bounds-checked agg_value_col, but the scalar path's check lived inside the `src_nrows > 0` branch — so an empty source plus an out-of-range value col silently emitted the SUM identity (0 / 0.0) against an invalid slot instead of rejecting the rule. Move the value-column bounds check (for SUM/MIN/MAX/AVG) above the empty- source early return so it fires regardless of row count. COUNT still skips this check since it doesn't read any value column. Regression test /datalog/agg_scalar_value_col_oor_empty: empty i64 source + value_col=42 against an arity-1 relation must yield 0 rows, not 1 row holding SUM=0. 698/698 pass on debug and release.
Two follow-up fixes from the latest Copilot review round on PR RayforceDB#7: * dl_project (src/ops/datalog.c): allocation failures used to `continue` past the failed slot, leaking the RAY_ERROR object and producing a derived table with silently-missing columns. Each error path now releases any in-flight allocations and returns an error ray_t so the caller can abort rule eval cleanly. The existing success path is unchanged. * Aggregate parser + env auto-register: the parser hardcoded pred_arity=1 when `prog` was NULL (surface syntax, globals), so env-bound source tables with any other arity were silently skipped during query-time auto-register. Use 0 as an "unknown at parse time" sentinel and let the auto-register resolve arity from the env-bound table's column count. Compile already looks arity up via dl_find_rel, so no compile-side change is needed. Regression test /datalog/env_bound_agg_auto_register exercises (sum ?s salaries 1) over a 2-column env-bound table via the (rules …) clause; the pre-fix code would reject the auto-register on arity mismatch. 699/699 pass on debug (ASan+UBSan) and release.
…allowing them dl_eval previously returned 0 on every call, so any rule that failed to compile (dl_project OOM, aggregate bounds rejection, non-numeric value column) or produced a ray_execute error was silently dropped from the fixpoint — callers like ray_query_fn saw a 0 return and shipped an incomplete IDB as a successful query result. Introduce a program-scoped `eval_err` flag and flip it at every failure site inside dl_compile_rule and dl_eval's runtime pipeline: * dl_project returning a ray_error* from dl_compile_rule (OOM, type errors, add_col errors) — the error object is now released and the flag is set. * Grouped-agg group_key_col / value_col out-of-range checks. * Scalar-agg value_col out-of-range and non-numeric value column rejects. * ray_execute / table_rename_cols / table_union / table_distinct errors inside both Phase A (initial eval) and Phase B (semi-naive iteration). dl_eval now returns -1 iff the flag is set. NULL returns that represent legitimate "rule produced no rows" (empty accum after body atoms, etc.) still yield a 0 return — they're not errors. Also taught dl_project to fall back to the IDB's existing column types when accum collapses to 0 rows but has lost its schema (e.g. antijoin on a fully-filtered relation). This preserves schema agreement with the IDB for downstream table_union without re-introducing the silent missing-column bug. Tests: * /datalog/eval_surfaces_compile_failure (new): (query …) over a rule whose (sum ?s broken 99) is out of range must return a RAY_ERR, not an empty table. * /datalog/agg_scalar_value_col_oor_empty, /datalog/agg_grouped_key_col_oor updated to expect dl_eval == -1 (previously asserted 0, which itself was the silent-failure bug). 700/700 pass on debug (ASan+UBSan) and release.
…ecute Phase B of the semi-naive loop still had several `continue`-without-flag silent drops that bypassed the new eval_err propagation: * ray_graph_new() NULL in Phase A and Phase B (OOM) — now sets eval_err. * table_union() accumulator in the new-tuples-per-head aggregation — NULL or RAY_ERR returns used to overwrite new_tuples_per_rel[head_idx] or silently continue; now mark eval_err and clear the slot cleanly. * table_distinct / table_antijoin runtime errors in the per-IDB merge pass — now mark eval_err before continuing. * table_union of delta back into rel->table — a failure here desyncs the fixpoint (delta_tables was already stored) yet the original code swallowed it; now mark eval_err on failure. Phase A's `ray_graph_new` gained the same treatment for symmetry. NULL returns from dl_compile_rule still pass through untouched — the compiler itself sets eval_err on genuine failures; a bare NULL is "no rows this iteration" and must not fault the whole program. No new test is needed: /datalog/eval_surfaces_compile_failure already exercises the full query -> dl_eval -> error-surfacing chain, and the existing recursive tests (/lang/datalog/fixpoint, etc.) guard the Phase B happy path. 700/700 pass on debug (ASan+UBSan) and release.
…rtial tables table_union used to silently skip columns whose concat failed, producing a table with missing columns that callers mistook for a successful union — the caller then stored it as the IDB's new state or as merge output, so the partial result became persistent. Rewrite each helper to surface failure: * table_union: RAY_IS_ERR on any missing column, concat failure, or ray_table_add_col failure — release partial `out` and return a typed ray_error. Also retain `a`/`b` when we return one of them as-is so callers can release uniformly (previous code double-released when one input was NULL/error). * table_distinct: canonicalize/graph_new failures now return ray_error instead of falling back to the raw input table (which may not be distinct at all — a silent wrong-result). * table_antijoin: same treatment — canonicalize/graph_new failures return ray_error instead of treating `left` as the antijoin result. dl_eval already checks RAY_IS_ERR on these helpers' returns and raises eval_err, so the new error objects flow into the "evaluation failed" surface exposed through ray_query_fn. 700/700 tests pass on debug (ASan+UBSan) and release.
Even after the previous pass, table_union still silently narrowed to the common prefix when the two inputs had different column counts: it computed ncols = min(ncols_a, ncols_b) and produced a union table with fewer columns than either side. Callers (dl_eval merge paths) then stored that partial table as the IDB's new state — a silent schema-corruption bug. Reject the mismatch with a typed "schema" error instead of narrowing. The caller already picks this up via RAY_IS_ERR and sets prog->eval_err, so the failure surfaces to ray_query_fn. 700/700 tests pass on debug (ASan+UBSan) and release.
The schema-mismatch check only fired when both inputs had rows — the empty-side short-circuits (`nrows(a) == 0` returns b, `nrows(b) == 0` returns a) still bypassed it. An antijoin that collapsed to (0 rows, 0 cols) then entered the union from the left side and silently emitted `b` (wider schema) as the result; the caller stored it as the IDB's state and the arity mismatch surfaced only much later. Move the ncols_a != ncols_b check above the empty-rows bypass so every input path exits with either a schema-matching result or a ray_error. dl_eval already promotes the error to prog->eval_err. 700/700 tests pass on debug (ASan+UBSan) and release.
Four comments from the 2026-04-23 Copilot review round:
* `dl_idb_align_head_const_types` (src/ops/datalog.c:~204): split every
`if (!x || RAY_IS_ERR(x))` error branch so the RAY_ERROR object is
released before return. Repeated aligns were leaking an error block per
failed call. The `ray_table_add_col` failure path also now releases
both `prev` and the returned error, instead of just `prev`.
* `ray_runtime_create_with_sym` / `_with_sym_err` (src/core/runtime.h):
the two persistent-consumer entrypoints were defined in runtime.c but
never declared — tests duplicated the prototype by hand. Publish them
in src/core/runtime.h (alongside the existing `ray_runtime_create` /
`ray_runtime_destroy`) so consumers don't drift. Kept a minimal
forward-decl in test_datalog.c because including core/runtime.h from a
TU that also includes lang/eval.h collides on `ray_vm_t` (pre-existing
duplication out of scope for this PR) — the comment flags that.
* env auto-register (src/ops/datalog.c:~3620): a NULL from
`ray_table_get_col_idx(env_val, c)` used to `continue`, producing a
`clean` table with fewer than pred_arity columns that `dl_add_edb`
still registered as the EDB. Replaced the skip with a hard
ray_error("schema", "env-backed EDB table missing expected column")
so only fully-formed tables ever register.
* `table_union` pass-through (src/ops/datalog.c:~1819): the `a` is
NULL/error branch previously only retained `b` when it was non-error.
If callers passed a RAY_ERROR `b` and later released it, the returned
pointer aliased freed memory. Retain `b` whenever it's non-NULL —
even on error — so the pass-through contract is uniform.
Also adopted the setup/teardown cleanup from the review: datalog_rf_setup
now returns the created runtime as the fixture, teardown receives it
back, and failure to create aborts loudly instead of silently skipping
tests.
700/700 pass on debug (ASan+UBSan) and release.
The prior round's error-lifetime fixes were no-ops. ray_release(),
ray_retain(), and even ray_free() each short-circuit on RAY_IS_ERR, so
every `if (RAY_IS_ERR(x)) ray_release(x);` site quietly leaked the error
block — repeated failures (e.g. an aggregate evaluated in a tight REPL
loop) would slowly bleed memory until heap teardown.
Introduce ray_error_free(ray_t*) as the escape hatch:
void ray_error_free(ray_t* err) {
if (!err || !RAY_IS_ERR(err)) return;
err->type = -RAY_I64; /* retype to a leaf atom so ray_free and
ray_release_owned_refs don't skip it */
ray_free(err);
}
Retyping to an atom with no owned children is the safest way to route
an error block through the standard free path without requiring a new
allocator primitive. The helper is published in include/rayforce.h
alongside ray_error so consumers can use it directly.
Wire the helper through every site where a RAY_ERROR object was being
"released" but in fact leaking:
* dl_idb_align_head_const_types: ray_table_new / ray_vec_new /
ray_table_add_col error returns.
* dl_project: ecol, dst, add_col error returns (split the combined
`!x || RAY_IS_ERR(x)` branches so the NULL vs ERROR cases can take
different cleanup paths).
* dl_compile_rule dl_project propagation.
* dl_eval Phase A merge pipeline: raw_tuples, new_tuples, merged,
deduped.
* dl_eval Phase B semi-naive: raw_result, result, u (union accum),
new_tuples, deduped, delta, merged.
* ray_query_fn env auto-register: i64col, next_clean.
Regression test /datalog/error_free_reclaims loops 256 create/free
cycles and asserts via ray_mem_stats() that free_count advances
alongside alloc_count and bytes_allocated stays flat. Pre-fix, this
test flagged the leak (free_count stayed at 0 for the 256 errors).
701/701 pass on debug (ASan+UBSan) and release.
Four comments from the 2026-04-23T11:13 Copilot pass: * dl_col_as_f64 / dl_eval_expr helpers (src/ops/datalog.c): five more ray_vec_new sites used `if (!out || RAY_IS_ERR(out)) return NULL` which leaks the RAY_ERROR block (ray_release is a no-op on errors). Split each check and route the RAY_IS_ERR branch through ray_error_free() so the block is actually reclaimed. * dl_project RAY_STR pool propagation (src/ops/datalog.c:~1072): the memcpy path copied 16-byte ray_str_t handles but left dst->str_pool NULL, so reads of strings >12 bytes would dereference pool_off into a NULL pool. Call col_propagate_str_pool(dst, src) after the memcpy when src is RAY_STR (matches the pattern used by ops/filter and ops/pivot). * runtime_create_impl stat() errno handling (src/core/runtime.c:~246): previously any stat() failure silently stayed RAY_OK. Now ENOENT remains the "first-run" OK case, but every other errno (EACCES, ENOTDIR, EIO, …) sets *out_sym_err = RAY_ERR_IO so persistent consumers can see the underlying I/O failure instead of coming up with a silent empty sym table. * test/test_runtime.c (new): targeted tests for the persistent- consumer surface. /runtime/create_with_sym_absent_is_ok verifies ENOENT stays RAY_OK. /runtime/create_with_sym_io_error_surfaces constructs an ENOTDIR path (creates a regular file then asks for a subpath under it) and asserts out_sym_err becomes non-RAY_OK. /runtime/create_with_sym_plain_variant_absent exercises the non-_err variant with a missing sym file. test_main.c wires the new suite into the root. 704/704 pass on debug (ASan+UBSan) and release.
Five comments from 2026-04-23T12:12: * dl_filter_eq SYM correctness (src/ops/datalog.c:~934): the constant- filter path read key columns as int64_t*, which miscompares and overreads against adaptive-width RAY_SYM vectors (W8/W16/W32). New dl_col_eq_row helper dispatches on type and uses ray_read_sym for RAY_SYM; the output-column copy is element-size-aware (memcpy of esz bytes per surviving row, allocator picks sym_vec_new for SYM so the narrow width is preserved). Non-numeric/non-sym key columns fall through unfiltered — matches the existing pass-through refcount convention. Also propagates str_pool when the filtered column is RAY_STR. * Scalar aggregate allocation failure (src/ops/datalog.c:~1594): a NULL or RAY_ERROR from ray_vec_new previously did `break`, silently leaving agg_target_var unbound and letting evaluation continue with a half-constructed rule (and leaking the error block). Now release accum, free the error via ray_error_free, set prog->eval_err, and return NULL so dl_eval reports failure. * dl_project empty-accum SYM width (src/ops/datalog.c:~1037): the empty-accum fallback built ecol via ray_vec_new(RAY_SYM, 0) which always lands at W64 — mismatching a head relation that uses narrower SYM attrs. Mirror hcol->attrs width via ray_sym_vec_new(..., 0) when the head-rel column is RAY_SYM so downstream table_union sees a matching schema instead of tripping the column-count guard. * /runtime/create_with_sym_io_error_surfaces now pins the exact code (RAY_ERR_IO) — drifting the errno->code mapping will fail loudly instead of sliding by on a "not RAY_OK" check. * Two new targeted runtime tests: /runtime/create_with_sym_corrupt_file writes garbage bytes to the sym path and asserts out_sym_err goes non-RAY_OK (ray_sym_load flags RAY_ERR_CORRUPT on header validation). /runtime/create_with_sym_load_preserves_user_ids exercises the whole persistence promise end-to-end: intern "rayforce-user-marker", ray_sym_save, destroy, reload via ray_runtime_create_with_sym_err, re-intern the same string and assert the ID is stable. (Oversized- file / RAY_ERR_OOM isn't tested because mem_budget is a large fraction of RAM and can't be overridden from a test without a dependency-injection hook — noted in the commit but not blocking.) 706/706 pass on debug (ASan+UBSan) and release.
dl_filter_eq's pass-through branches (NULL input, RAY_IS_ERR input,
empty-rows input, missing key column, non-I64/non-SYM key column) used
to return the input pointer without bumping its refcount. Combined
with the caller idiom
ray_retain(body_tbl);
ray_t* filtered = dl_filter_eq(body_tbl, c, const);
ray_release(body_tbl);
body_tbl = filtered;
this meant the pass-through paths produced a `filtered` whose refcount
was *one less* than the caller assumed. Later `ray_release(body_tbl)`
calls (in dl_compile_rule's join paths and the DL_NEG antijoin) could
then drop the underlying rel->table to rc=0 and free it out from under
the program's own EDB storage.
Tighten the contract so every exit from dl_filter_eq returns an owned
reference:
* empty-rows short-circuit: retain before return.
* missing-column short-circuit: retain before return.
* non-I64/non-SYM key pass-through: retain before return.
* RAY_IS_ERR input: retain iff non-NULL (caller will see RAY_IS_ERR
and free via ray_error_free).
The count==nrows branch already retained. Callers are unchanged; the
fix simply matches their existing expectations. 706/706 pass on debug
(ASan+UBSan) and release.
…warnings Two datalog paths were writing diagnostic messages to stderr from non-debug builds instead of surfacing the failure through the normal error channel: * dl_idb_align_head_const_types (src/ops/datalog.c): head-const type conflict between rules used to `fprintf(stderr, ...)` and return without flagging the program. Removed the stderr write and set prog->eval_err = true instead. dl_eval now short-circuits when eval_err is already set at entry (sticky flag), so the conflict detected at rule-add time reaches dl_eval's return without being reset. Callers like ray_query_fn turn that -1 into a proper "query: evaluation failed" ray_error. * dl_compile_rule grouped-aggregate + positive body atoms (nyi path): same treatment — remove stderr warning, set prog->eval_err. Regression test /datalog/rule_head_const_type_conflict builds a program with two rules whose head-const types collide on the same slot and asserts dl_eval returns -1. 707/707 pass on debug (ASan+UBSan) and release.
Seven distinct comments across the 13:35 / 13:54 review rounds:
* Aggregate-only rule fallback (datalog.c:~1276): one_val / accum /
add_col allocation failures previously returned NULL without
setting eval_err or freeing RAY_ERROR blocks. All paths now set
prog->eval_err = true and route errors through ray_error_free().
* Grouped-aggregate compile (datalog.c:~1414..1468): added NULL
checks around ray_graph_new, ray_sym_str, every ray_scan, ray_group,
and ray_execute. On any failure, free the sub-graph, release
accum, set eval_err, and return NULL. Execute errors that were
RAY_ERROR are now freed via ray_error_free instead of leaking.
* DL_CMP type awareness (datalog.c:~1746): comparison loop used to
read both sides as int64_t*, miscomparing RAY_F64 columns produced
by dl_eval_expr's float path. Now promotes to f64 iff either side
is f64 and rejects non-numeric sources with eval_err. Filter-copy
loop switched to element-size-aware memcpy so f64/narrow-SYM
columns survive the mask unchanged, with str_pool propagation for
RAY_STR.
* table_union concat (datalog.c:~2055): propagate the original
ray_vec_concat error (e.g. "type" for schema mismatch) instead of
always rewriting it to a generic "memory" error.
* dl_filter_eq (datalog.c:~979): added proper error checks for
ray_table_new, ray_vec_new, and ray_table_add_col. Previously a
failure in any of those silently `continue`-d past the column,
yielding a partial table.
* Corrupt-file test (test_runtime.c): the fake-garbage payload was
tripping ray_col_load's header parser (RAY_ERR_NYI) before the
sym-table-specific corrupt path. Now writes STR_LIST_MAGIC +
inflated string count so col_load_str_list hits its truncated-
body check, which returns ray_error("corrupt", ...) — mapping
back to the exact RAY_ERR_CORRUPT we assert.
* Oversized-file OOM test (test_runtime.c): new
/runtime/create_with_sym_oversized_file uses ftruncate to produce
a 10 EB sparse sym file and asserts out_sym_err == RAY_ERR_OOM.
Filesystems that reject the giant ftruncate (tmpfs) return
MUNIT_SKIP rather than a spurious failure.
* test_datalog.c: added <stdio.h>/<stdlib.h> so fprintf/abort() in
datalog_rf_setup have explicit declarations under -Werror (they
were arriving transitively before but the coverage was fragile).
708/708 pass on debug (ASan+UBSan) and release.
Round-6 made dl_filter_eq surface allocation/add_col failures as RAY_ERROR returns, but the two call sites in dl_compile_rule were still blindly assigning the return to body_tbl / neg_tbl and continuing. If the filter failed, the subsequent join path would then operate on a RAY_ERROR, or hit a NULL — either way turning a proper failure into undefined behavior. Both call sites (positive DL_POS body filter at ~1199 and DL_NEG filter at ~1353) now treat a NULL or RAY_ERROR return as a hard failure: release the surviving accum, free any error block via ray_error_free, set prog->eval_err = true, and return NULL from dl_compile_rule so dl_eval reports -1. 708/708 pass on debug (ASan+UBSan) and release.
…l_err Four comments from 2026-04-23T14:21: * dl_filter_eq / table_union / dl_project: ray_table_add_col failures used to return without releasing the partially-built output table. ray_table_add_col doesn't free its input on error, so this leaked the table plus every previously-added column. All four add_col sites (dl_filter_eq, table_union, dl_project — var path and both const/empty paths) now release `out` before returning the error (or return the RAY_ERROR directly when that's the most informative code). * dl_project constant-SYM broadcast width: dl_broadcast_const_col used ray_vec_new(RAY_SYM, …) which always produced W64. When a prior rule had already aligned the head relation's SYM column to a narrower width, the W64 broadcast would then fail ray_vec_concat inside table_union. Added a sym_width_hint parameter routed through ray_sym_vec_new, and in dl_project the caller now inspects the existing head_rel column's attrs and passes the matching width. * dl_idb_align_head_const_types: OOM / add_col failures while rebuilding the IDB's empty schema used to silently `return`, leaving the relation with its original column types and eventually causing a ray_vec_concat mismatch during evaluation. Every failure path now sets prog->eval_err = true (and still frees any RAY_ERROR via ray_error_free), so dl_eval reliably returns -1 when alignment can't complete. 708/708 pass on debug (ASan+UBSan) and release.
Round-8's add_col-leak fix covered the var-slot path (src_idx >= 0) and the empty-accum fallback but missed the third add_col site — the constant-head-slot path right below them. If ray_table_add_col fails there (NULL or RAY_ERR) the partially-built `out` was still being returned-without-release, leaking the table and every column already added. Applied the same release-before-return treatment. 708/708 pass on debug (ASan+UBSan) and release.
Six comments from 2026-04-23T15:11: * Aggregate-only rule fallback (datalog.c:~1374): ray_table_add_col failure left `accum` leaked. Release it on both error exits. * DL_CMP dl_eval_expr failure (datalog.c:~1855): previously a NULL or RAY_ERROR from dl_eval_expr did `break`, silently skipping the comparison filter and letting the rule produce wrong rows. Treat eval failure as unrecoverable: free any RAY_ERROR, release accum, set prog->eval_err, and return NULL from dl_compile_rule. Same for the missing-variable-column paths. * DL_CMP filtered-table build (datalog.c:~1982): `continue`-ing past a failed column allocation / add_col produced a table with fewer columns than accum — silent schema corruption. Now validates ray_table_new, ray_vec_new, ray_sym_vec_new, and ray_table_add_col at every step; any failure releases the partial output and raises prog->eval_err. * Public API backward compat (datalog.h:252): dl_rule_head_const's signature was extended from (rule, pos, val) to (rule, pos, val, type) in this PR, breaking external callers. Restored the 3-arg form as an I64 wrapper and added dl_rule_head_const_typed for the typed variant. Internal call sites and tests updated to the new name. * Plan doc absolute paths (docs/plans/...md): replaced three /Users/aspirational/... paths with $WORKSPACE-relative references so the doc is portable across dev environments / CI. * Oversized-file OOM test (test_runtime.c:~208): 32-bit off_t shift was undefined. Skip when sizeof(off_t) < 8 and build the size via int64_t before casting to off_t. 708/708 pass on debug (ASan+UBSan) and release.
Round-9 covered DL_CMP's silent-break paths but missed DL_ASSIGN, which
had the same pattern:
if (!new_col || RAY_IS_ERR(new_col)) break;
A failed assignment silently leaves assign_var unbound; subsequent body
literals keep compiling with stale bindings, and dl_eval returns 0 with
a wrong result set. Now treat every failure mode as unrecoverable: free
any RAY_ERROR via ray_error_free, release accum, set prog->eval_err, and
return NULL. Also extended the check to the dl_table_add_computed_col
return — previously unchecked.
708/708 pass on debug (ASan+UBSan) and release.
Three comments from 2026-04-23T15:50: * table_union propagates error operands (datalog.c:~2203): when `b` was a RAY_ERROR the function used to return `a` instead, silently masking the real failure. Now both error operands are returned (retained) so the caller sees the true diagnostic, and NULL operands still fall back to the other side. ray_retain is a no-op on errors, so the "owned return" contract is preserved. * dl_broadcast_const_col width sentinel (datalog.c:~1054): the old uint8_t sym_width_hint used 0 as "default", but RAY_SYM_W8 is also 0 — so an aligned IDB with a W8 head-const column would silently widen to W64, later tripping ray_vec_concat. Replaced the hint with a `const ray_t* width_template` pointer: NULL means "default W64", otherwise copy the template column's adaptive width directly. Caller (dl_project) updated to pass the head_rel column pointer. * test_runtime.c: replaced the manual runtime-API forward decls with `#include "core/runtime.h"` now that this TU doesn't pull in lang/eval.h — the ray_vm_t duplication concern from test_datalog.c doesn't apply here, so let the real header be the source of truth. 708/708 pass on debug (ASan+UBSan) and release.
|
@copilot resolve the merge conflicts in this pull request |
425ef1a to
23a0cb5
Compare
What this PR gives you
Write smarter rules without dropping to C. Before this PR, Datalog rules in rayforce2 could only do joins and negation. Now you can:
(count ?n items),(sum ?total invoice value),(avg ?mean sensor reading by ?region col). Grouped or scalar, with the semantics you'd expect: empty groups produce no rows for min/max/avg, and zero for count/sum.(= ?bmi (/ ?weight (* ?height ?height)))just works, even when some columns are integers and others are floats. Results auto-promote to f64 when needed.(between ?x 18 65)instead of two separate comparison literals.(rule (band "small") (weight ?w) (< ?w 60))derives labeled categories directly. String, integer, and float constants all work, including across dependent rules (IDB-to-IDB).These features together let consumers like ray-exomem replace procedural Rust derivation code with declarative Datalog rules — simpler to write, easier to reason about, and evaluated by the engine's optimizer.
Persistent-consumer support:
ray_runtime_create_with_symWhy it was added
The existing
ray_runtime_createinterns ~175 builtin symbol names (+,-,count,rule, etc.) at IDs 0..N duringray_lang_init. For long-lived consumers that persist data across process restarts, this causes a chicken-and-egg problem:ray_sym_load(user_file)now tries to restore user strings at their saved positions, but positions 0..174 are already occupied by builtins — divergence detection returnsRAY_ERR_CORRUPT.Upstream-only callers never hit this because they don't persist sym state across runs. Persistent consumers (databases, daemons, anything that reloads splayed tables on startup) need the sym table loaded before builtins claim slots — so that persisted user IDs keep their positions and new builtins append afterwards.
API
ray_runtime_create_with_sym(path)— loadspathif it exists, ignores errors (convenient for callers that don't need to distinguish failure modes).ray_runtime_create_with_sym_err(path, &out_err)— returns the load result via out-param so callers can distinguishRAY_OK,RAY_ERR_CORRUPT, and I/O failures, and decide recovery policy.Original
ray_runtime_create(argc, argv)is unchanged — no behavioral difference for existing callers.Hardening (latest commit)
__VMandmem_budgetare set beforeray_sym_loadruns, so file I/O errors surface throughray_error()(previously silently dropped) and allocations are bounded by the budget.stat()the sym file beforeray_col_load; reject ifst_size > mem_budget / 2to prevent a malicious or corrupted multi-GB file from OOMing the process before budget guards engage._errvariant surfaces load failures cleanly instead of collapsing them into a single pass/fail.Safety hardening (review feedback)
Test coverage
make testpass (ASan + UBSan)🤖 Generated with Claude Code
Post-review hardening (commits since
4465c88c)A wave of follow-up fixes by @singaraiona addressed two rounds of Copilot review plus a broader silent-failure audit. Summarized by theme:
Silent-failure eradication in
dl_evaland friendsThe biggest correctness gain. Before these commits the datalog runtime could return "success with no rows" on genuine compile/evaluation failures — query consumers saw
0fromdl_evaleither way.dl_evalnow returns-1on failure via a program-scopedeval_errflag flipped at every failure site indl_compile_ruleand the runtime pipeline (dl_project OOM/type/add_col, grouped-agg key/value OOR, scalar-agg OOR, non-numeric value,ray_execute/table_rename_cols/table_union/table_distincterrors). Legitimate empty rules still return0. (d440773240b3)continue-without-flag paths that bypassed eval_err — Phase A/Bray_graph_newOOM, union accumulator,table_distinct/table_antijoinper-IDB merge, delta-into-relation union. All now flag. (25a494f47a13)table_unionno longer silently skips columns whose concat failed, no longer narrows tomin(ncols_a, ncols_b), and the schema-mismatch check runs before the empty-rows short-circuit. All three failures now return typedray_error. (8dbd16087cd1,52d374c2dd12,b9ffe63267d3)table_distinct/table_antijoincanonicalize/graph_new failures returnray_errorinstead of falling back to the raw input (which may not be distinct, or isn't the antijoin of anything). (8dbd16087cd1)dl_projectpropagates errors instead ofcontinue-ing past the failed slot, and now falls back to the IDB's existing column types when accum collapses to 0 rows with no schema (e.g. fully-filtered antijoin) to preservetable_unionschema agreement. (039496e8a5f8)Aggregate and auto-register correctness
RAY_F64: previously treated any non-RAY_I64as "return 0"; now i64→i64, f64→f64, non-numeric rejected. (510bf600e02c)nrows > 0branch, so empty f64 sources emittedRAY_I64 0instead ofRAY_F64 0.0. Lifted above the split. (15f01bf3f55d)value_colbounds check moved above the empty-source early return so OOR rejects regardless of row count. (7af5b693521e)dl_add_edbreturns0, 1, 2, …, not just0; the!= 0check rejected every EDB beyond the primary one. Fixed to< 0, so multi-EDB rule bodies actually work. (ffbd67924d6c)prog == NULL(surface syntax, globals) the parser hardcodedpred_arity=1, so env-bound tables with any other arity were silently skipped during auto-register. Now uses0as an "unknown at parse time" sentinel. (039496e8a5f8)group_key_colandvalue_colare bounds-checked against source arity before indexingcol_names[]; out-of-range key previously OOB-read, out-of-range value was silently clamped to col 0. (510bf600e02c)ray_table_get_col_idxused tocontinue, producing a cleaned table with fewer thanpred_aritycolumns thatdl_add_edbstill registered. Now returns a schema error. (8f79c9430dfc)Narrow-width SYM projection and STR pool propagation
dl_projectSYM width:ray_vec_new(RAY_SYM, …)always returns W64, somemcpywith the source's narrower stride left upper bytes uninit and produced bogus sym IDs. Now routes SYM columns throughray_sym_vec_new(src->attrs & RAY_SYM_W_MASK, …). (510bf600e02c)dl_projectSTR pool: the memcpy path forRAY_STRcopied the 16-byteray_str_thandles but leftdst->str_poolNULL, dereferencingpool_offinto a NULL pool on reads of strings >12 bytes. Now callscol_propagate_str_pool(dst, src). (ce9aef53eb76)Error-block memory lifetime
ray_release,ray_retain, andray_freeall short-circuit onRAY_IS_ERR, so everyif (RAY_IS_ERR(x)) ray_release(x);site was quietly leaking the error block.ray_error_free(ray_t*)published ininclude/rayforce.h— retypes the error to a leaf atom so the standard free path reclaims it. Wired at ~15 sites acrossdl_idb_align_head_const_types,dl_project,dl_compile_rule,dl_evalPhase A/B,ray_query_fnenv auto-register,dl_col_as_f64/dl_eval_expr. Regression test/datalog/error_free_reclaimsloops 256 create/free cycles and assertsray_mem_stats()free_countadvances andbytes_allocatedstays flat. (ed7520e2a1eb,ce9aef53eb76)table_unionpass-through now retainsbeven when it's aRAY_ERROR, so callers that release the returned pointer don't alias freed memory. (8f79c9430dfc)Persistent-consumer runtime surface polish
ray_runtime_create_with_sym/_with_sym_errprototypes published insrc/core/runtime.h— previously defined inruntime.cbut undeclared, forcing consumers to hand-duplicate the signature. (8f79c9430dfc)stat()errno handling inray_runtime_create_with_sym_err:ENOENT(fresh install) staysRAY_OK; every other errno (EACCES,ENOTDIR,EIO, …) now sets*out_sym_err = RAY_ERR_IOinstead of silently producing an empty sym table. (ce9aef53eb76)test/test_runtime.ccovers the persistent-consumer surface: absent-is-ok, ENOTDIR-surfaces-error, plain-variant behavior. (ce9aef53eb76)Test count
693/693→704/704(11 new tests), green on debug (ASan+UBSan) and release.