Skip to content
This repository was archived by the owner on Apr 30, 2026. It is now read-only.

feat(datalog): aggregates, float constants, between sugar, and typed head constants#7

Merged
singaraiona merged 51 commits into
RayforceDB:masterfrom
theaspirational:feature/datalog-aggregates
Apr 24, 2026
Merged

feat(datalog): aggregates, float constants, between sugar, and typed head constants#7
singaraiona merged 51 commits into
RayforceDB:masterfrom
theaspirational:feature/datalog-aggregates

Conversation

@theaspirational
Copy link
Copy Markdown
Contributor

@theaspirational theaspirational commented Apr 19, 2026

What this PR gives you

Write smarter rules without dropping to C. Before this PR, Datalog rules in rayforce2 could only do joins and negation. Now you can:

  • Aggregate across rows(count ?n items), (sum ?total invoice value), (avg ?mean sensor reading by ?region col). Grouped or scalar, with the semantics you'd expect: empty groups produce no rows for min/max/avg, and zero for count/sum.
  • Use float constants and mixed arithmetic(= ?bmi (/ ?weight (* ?height ?height))) just works, even when some columns are integers and others are floats. Results auto-promote to f64 when needed.
  • Write range filters concisely(between ?x 18 65) instead of two separate comparison literals.
  • Put constants in rule heads(rule (band "small") (weight ?w) (< ?w 60)) derives labeled categories directly. String, integer, and float constants all work, including across dependent rules (IDB-to-IDB).
  • Reference any env-bound table in rule bodies — no need to pre-register each EDB by hand. If a table is bound in the ray env, rules can query it automatically.

These features together let consumers like ray-exomem replace procedural Rust derivation code with declarative Datalog rules — simpler to write, easier to reason about, and evaluated by the engine's optimizer.

Persistent-consumer support: ray_runtime_create_with_sym

Why it was added

The existing ray_runtime_create interns ~175 builtin symbol names (+, -, count, rule, etc.) at IDs 0..N during ray_lang_init. For long-lived consumers that persist data across process restarts, this causes a chicken-and-egg problem:

  1. User asserts a fact → interns a string → gets sym ID, say 200.
  2. User's column data stores that fact with sym ID 200.
  3. Process restarts. Builtins register at 0..174 first, claiming those slots.
  4. ray_sym_load(user_file) now tries to restore user strings at their saved positions, but positions 0..174 are already occupied by builtins — divergence detection returns RAY_ERR_CORRUPT.

Upstream-only callers never hit this because they don't persist sym state across runs. Persistent consumers (databases, daemons, anything that reloads splayed tables on startup) need the sym table loaded before builtins claim slots — so that persisted user IDs keep their positions and new builtins append afterwards.

API

  • ray_runtime_create_with_sym(path) — loads path if it exists, ignores errors (convenient for callers that don't need to distinguish failure modes).
  • ray_runtime_create_with_sym_err(path, &out_err) — returns the load result via out-param so callers can distinguish RAY_OK, RAY_ERR_CORRUPT, and I/O failures, and decide recovery policy.

Original ray_runtime_create(argc, argv) is unchanged — no behavioral difference for existing callers.

Hardening (latest commit)

  • Init ordering__VM and mem_budget are set before ray_sym_load runs, so file I/O errors surface through ray_error() (previously silently dropped) and allocations are bounded by the budget.
  • Size pre-flightstat() the sym file before ray_col_load; reject if st_size > mem_budget / 2 to prevent a malicious or corrupted multi-GB file from OOMing the process before budget guards engage.
  • Error transparency — the _err variant surfaces load failures cleanly instead of collapsing them into a single pass/fail.

Safety hardening (review feedback)

  • Expression evaluator rejects non-numeric (RAY_SYM) columns instead of assuming 8-byte elements — RAY_SYM has adaptive width and the old memcpy would overread on narrow dictionaries.
  • Head-const type conflicts fail fast when two rules for the same predicate disagree on a slot's type, instead of silently letting the last rule win.
  • Grouped aggregate bounds — group key column indices are now validated against the source relation's arity.
  • Error-path leak in IDB type alignment fixed (table pointer was lost when ray_table_add_col failed).

Test coverage

  • 693 / 693 make test pass (ASan + UBSan)
  • 9 head-constant tests: single-rule, i64, f64, cross-IDB, with-agg, negation, stratification, surface syntax
  • Aggregate: scalar (count/sum/min/max/avg), grouped (count/sum/min/max/avg), empty-source (all 5)
  • Aggregate parsing: scalar + grouped surface syntax, 4 reject paths
  • Float arithmetic + mixed i64/f64 promotion
  • Between sugar parse
  • Env-bound EDB auto-registration: end-to-end through ray_query_fn

🤖 Generated with Claude Code


Post-review hardening (commits since 4465c88c)

A wave of follow-up fixes by @singaraiona addressed two rounds of Copilot review plus a broader silent-failure audit. Summarized by theme:

Silent-failure eradication in dl_eval and friends

The biggest correctness gain. Before these commits the datalog runtime could return "success with no rows" on genuine compile/evaluation failures — query consumers saw 0 from dl_eval either way.

  • dl_eval now returns -1 on failure via a program-scoped eval_err flag flipped at every failure site in dl_compile_rule and the runtime pipeline (dl_project OOM/type/add_col, grouped-agg key/value OOR, scalar-agg OOR, non-numeric value, ray_execute / table_rename_cols / table_union / table_distinct errors). Legitimate empty rules still return 0. (d440773240b3)
  • Phase B of the semi-naive loop had several continue-without-flag paths that bypassed eval_err — Phase A/B ray_graph_new OOM, union accumulator, table_distinct / table_antijoin per-IDB merge, delta-into-relation union. All now flag. (25a494f47a13)
  • table_union no longer silently skips columns whose concat failed, no longer narrows to min(ncols_a, ncols_b), and the schema-mismatch check runs before the empty-rows short-circuit. All three failures now return typed ray_error. (8dbd16087cd1, 52d374c2dd12, b9ffe63267d3)
  • table_distinct / table_antijoin canonicalize/graph_new failures return ray_error instead of falling back to the raw input (which may not be distinct, or isn't the antijoin of anything). (8dbd16087cd1)
  • dl_project propagates errors instead of continue-ing past the failed slot, and now falls back to the IDB's existing column types when accum collapses to 0 rows with no schema (e.g. fully-filtered antijoin) to preserve table_union schema agreement. (039496e8a5f8)

Aggregate and auto-register correctness

  • Scalar SUM/MIN/MAX/AVG over RAY_F64: previously treated any non-RAY_I64 as "return 0"; now i64→i64, f64→f64, non-numeric rejected. (510bf600e02c)
  • Empty-source F64 SUM identity: the type check was inside the nrows > 0 branch, so empty f64 sources emitted RAY_I64 0 instead of RAY_F64 0.0. Lifted above the split. (15f01bf3f55d)
  • Scalar-agg value_col bounds check moved above the empty-source early return so OOR rejects regardless of row count. (7af5b693521e)
  • Env-bound EDB auto-register: dl_add_edb returns 0, 1, 2, …, not just 0; the != 0 check rejected every EDB beyond the primary one. Fixed to < 0, so multi-EDB rule bodies actually work. (ffbd67924d6c)
  • Agg parser arity: when prog == NULL (surface syntax, globals) the parser hardcoded pred_arity=1, so env-bound tables with any other arity were silently skipped during auto-register. Now uses 0 as an "unknown at parse time" sentinel. (039496e8a5f8)
  • Grouped-agg bounds: group_key_col and value_col are bounds-checked against source arity before indexing col_names[]; out-of-range key previously OOB-read, out-of-range value was silently clamped to col 0. (510bf600e02c)
  • Env auto-register missing-column rejection: a NULL from ray_table_get_col_idx used to continue, producing a cleaned table with fewer than pred_arity columns that dl_add_edb still registered. Now returns a schema error. (8f79c9430dfc)

Narrow-width SYM projection and STR pool propagation

  • dl_project SYM width: ray_vec_new(RAY_SYM, …) always returns W64, so memcpy with the source's narrower stride left upper bytes uninit and produced bogus sym IDs. Now routes SYM columns through ray_sym_vec_new(src->attrs & RAY_SYM_W_MASK, …). (510bf600e02c)
  • dl_project STR pool: the memcpy path for RAY_STR copied the 16-byte ray_str_t handles but left dst->str_pool NULL, dereferencing pool_off into a NULL pool on reads of strings >12 bytes. Now calls col_propagate_str_pool(dst, src). (ce9aef53eb76)

Error-block memory lifetime

ray_release, ray_retain, and ray_free all short-circuit on RAY_IS_ERR, so every if (RAY_IS_ERR(x)) ray_release(x); site was quietly leaking the error block.

  • New ray_error_free(ray_t*) published in include/rayforce.h — retypes the error to a leaf atom so the standard free path reclaims it. Wired at ~15 sites across dl_idb_align_head_const_types, dl_project, dl_compile_rule, dl_eval Phase A/B, ray_query_fn env auto-register, dl_col_as_f64 / dl_eval_expr. Regression test /datalog/error_free_reclaims loops 256 create/free cycles and asserts ray_mem_stats() free_count advances and bytes_allocated stays flat. (ed7520e2a1eb, ce9aef53eb76)
  • table_union pass-through now retains b even when it's a RAY_ERROR, so callers that release the returned pointer don't alias freed memory. (8f79c9430dfc)

Persistent-consumer runtime surface polish

  • ray_runtime_create_with_sym / _with_sym_err prototypes published in src/core/runtime.h — previously defined in runtime.c but undeclared, forcing consumers to hand-duplicate the signature. (8f79c9430dfc)
  • stat() errno handling in ray_runtime_create_with_sym_err: ENOENT (fresh install) stays RAY_OK; every other errno (EACCES, ENOTDIR, EIO, …) now sets *out_sym_err = RAY_ERR_IO instead of silently producing an empty sym table. (ce9aef53eb76)
  • test/test_runtime.c covers the persistent-consumer surface: absent-is-ok, ENOTDIR-surfaces-error, plain-variant behavior. (ce9aef53eb76)

Test count

693/693704/704 (11 new tests), green on debug (ASan+UBSan) and release.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends the rayforce2 Datalog engine with aggregate body literals, float constants in expressions, between surface-syntax sugar, and typed head constants (including constant-slot broadcast projection across IDBs), plus associated integration and unit tests.

Changes:

  • Add DL_AGG body literals (scalar + grouped) with stratification support and evaluation in dl_compile_rule.
  • Add float constants to the assignment expression AST (DL_EXPR_CONST_F64) with mixed i64/f64 promotion.
  • Extend surface syntax parsing to support aggregates, between, typed head constants, and string constants in body positions; add extensive tests.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File Description
test/test_datalog.c Adds many new Datalog feature tests, including surface-syntax runtime-driven cases.
src/ops/datalog.h Extends public Datalog API/types for aggregates, float constants, and typed head constants.
src/ops/datalog.c Implements aggregate compilation/eval, float expression support, typed constant projection, parser updates, and env-bound EDB auto-registration.
docs/plans/2026-04-18-datalog-aggregates-and-onboarding.md Adds the implementation plan document referenced by the PR.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/ops/datalog.c Outdated
Comment on lines +193 to +199
if (!fresh || RAY_IS_ERR(fresh)) return;
for (int c = 0; c < rel->arity; c++) {
ray_t* empty_col = ray_vec_new(desired[c], 0);
if (!empty_col || RAY_IS_ERR(empty_col)) { ray_release(fresh); return; }
fresh = ray_table_add_col(fresh, rel->col_names[c], empty_col);
ray_release(empty_col);
if (RAY_IS_ERR(fresh)) return;
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dl_idb_align_head_const_types rebuilds fresh and then returns early on RAY_IS_ERR(fresh) without releasing the error object / partially-built table. This can leak and also leaves the IDB schema in an indeterminate state for the remainder of compilation. Ensure that any error return releases fresh (and ideally leaves rel->table unchanged) before returning.

Suggested change
if (!fresh || RAY_IS_ERR(fresh)) return;
for (int c = 0; c < rel->arity; c++) {
ray_t* empty_col = ray_vec_new(desired[c], 0);
if (!empty_col || RAY_IS_ERR(empty_col)) { ray_release(fresh); return; }
fresh = ray_table_add_col(fresh, rel->col_names[c], empty_col);
ray_release(empty_col);
if (RAY_IS_ERR(fresh)) return;
if (!fresh) return;
if (RAY_IS_ERR(fresh)) { ray_release(fresh); return; }
for (int c = 0; c < rel->arity; c++) {
ray_t* empty_col = ray_vec_new(desired[c], 0);
if (!empty_col || RAY_IS_ERR(empty_col)) { ray_release(fresh); return; }
fresh = ray_table_add_col(fresh, rel->col_names[c], empty_col);
ray_release(empty_col);
if (RAY_IS_ERR(fresh)) { ray_release(fresh); return; }

Copilot uses AI. Check for mistakes.
Comment thread src/ops/datalog.c
Comment on lines +175 to +189
bool any_change = false;
int8_t desired[DL_MAX_ARITY];
for (int c = 0; c < rel->arity; c++) {
ray_t* col = ray_table_get_col_idx(rel->table, c);
int8_t cur = col ? col->type : RAY_I64;
int8_t want = rule->head_const_types[c];
if (want == 0) {
/* No constant hint for this slot — keep current type. */
desired[c] = cur;
} else {
desired[c] = want;
if (want != cur) any_change = true;
}
}
if (!any_change) return;
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dl_idb_align_head_const_types can rebuild the same IDB table multiple times while the relation is still empty, meaning the final column types depend on rule-add order. If two rules for the same head predicate specify conflicting head_const_types for a slot, evaluation will later fail when projecting/unioning mismatched column types. Consider tracking a per-relation “committed” column type per slot (first non-zero wins) and failing fast on conflicts instead of silently re-aligning.

Copilot uses AI. Check for mistakes.
Comment thread test/test_datalog.c Outdated
Comment on lines +36 to +65
@@ -44,6 +52,18 @@ static void datalog_teardown(void* fixture) {
ray_heap_destroy();
}

/* Full runtime — required for ray_eval_str("(rule ...)") surface-syntax tests. */
static void* datalog_rf_setup(const void* params, void* user_data) {
(void)params; (void)user_data;
ray_runtime_create(0, NULL);
return NULL;
}

static void datalog_rf_teardown(void* fixture) {
(void)fixture;
ray_runtime_destroy(__RUNTIME);
}
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The runtime API is re-declared manually here (forward-declared struct + externs). This is brittle if the signature or globals change, and the setup doesn’t check ray_runtime_create() for failure before dereferencing __RUNTIME in teardown. Prefer including the owning header (e.g. core/runtime.h) and asserting the returned runtime pointer is non-NULL, then pass that pointer to ray_runtime_destroy().

Copilot uses AI. Check for mistakes.
Comment thread src/ops/datalog.c
Comment on lines 628 to 638
case DL_EXPR_VAR: {
int ci = var_col[expr->var_idx];
ray_t* src = ray_table_get_col_idx(accum, ci);
if (!src) return NULL;
ray_t* dst = ray_vec_new(RAY_I64, nrows);
int8_t t = (src->type == RAY_F64) ? RAY_F64 : RAY_I64;
size_t elem = (t == RAY_F64) ? sizeof(double) : sizeof(int64_t);
ray_t* dst = ray_vec_new(t, nrows);
if (!dst || RAY_IS_ERR(dst)) return NULL;
dst->len = nrows;
memcpy(ray_data(dst), ray_data(src), (size_t)nrows * sizeof(int64_t));
memcpy(ray_data(dst), ray_data(src), (size_t)nrows * elem);
return dst;
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In dl_eval_expr's DL_EXPR_VAR case, any non-RAY_F64 source column is treated as RAY_I64 and copied with a fixed 8-byte element size. If a variable ever binds to a non-numeric column (e.g., RAY_SYM or other types), this will reinterpret data and can overread/mis-copy when the underlying element size differs. Consider explicitly rejecting non-(RAY_I64|RAY_F64) sources (return an error) or copying with ray_sym_elem_size(src->type, src->attrs) and preserving the source type where appropriate.

Copilot uses AI. Check for mistakes.
Comment thread src/ops/datalog.c
Comment on lines +1264 to +1269
ray_op_t* keys_ops[DL_AGG_MAX_KEYS];
for (int i = 0; i < nk; i++) {
int64_t sym = src_rel->col_names[body->agg_group_key_cols[i]];
ray_t* s = ray_sym_str(sym);
keys_ops[i] = ray_scan(gg, ray_str_ptr(s));
}
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grouped aggregate compilation indexes src_rel->col_names[body->agg_group_key_cols[i]] without validating that each group_key_col is within [0, src_rel->arity). Since key column indices come from rule builders / surface syntax, an out-of-range value can cause out-of-bounds reads and crashes. Add bounds checks (and return a structured error) before using agg_group_key_cols, and similarly validate agg_value_col for the grouped case.

Copilot uses AI. Check for mistakes.
theaspirational added a commit to theaspirational/ray-exomem that referenced this pull request Apr 21, 2026
The sym/mmap/atomic-swap landings (ray-exomem commits 467b, 21dfca,
fc24a0) and today's JSONL cut depend on rayforce2 features that only
exist on feature/datalog-aggregates: ray_runtime_create_with_sym,
ray_read_splayed (mmap), and the head-const / auto-EDB projection the
bootstrap rules use.

The sibling ../rayforce2 path is unaffected — devs can still track
that branch locally. Only the fallback clone (used by `cargo install`
and CI) changes target branch.

Flip back to master after RayforceDB/rayforce2#7 merges.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
theaspirational added a commit to theaspirational/ray-exomem that referenced this pull request Apr 22, 2026
The sym/mmap/atomic-swap landings (ray-exomem commits 467b, 21dfca,
fc24a0) and today's JSONL cut depend on rayforce2 features that only
exist on feature/datalog-aggregates: ray_runtime_create_with_sym,
ray_read_splayed (mmap), and the head-const / auto-EDB projection the
bootstrap rules use.

The sibling ../rayforce2 path is unaffected — devs can still track
that branch locally. Only the fallback clone (used by `cargo install`
and CI) changes target branch.

Flip back to master after RayforceDB/rayforce2#7 merges.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@singaraiona singaraiona requested a review from Copilot April 22, 2026 16:48
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/ops/datalog.c Outdated
Comment on lines +1012 to +1019
ray_t* dst = ray_vec_new(src->type, nrows);
if (!dst || RAY_IS_ERR(dst)) continue;
dst->len = nrows;
/* Use element size from the source vec so SYM with any width,
* I64, and F64 all copy correctly. */
uint8_t esz = ray_sym_elem_size(src->type, src->attrs);
if (esz == 0) { ray_release(dst); continue; }
memcpy(ray_data(dst), ray_data(src), (size_t)nrows * (size_t)esz);
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dl_project allocates dst via ray_vec_new(src->type, nrows). For RAY_SYM columns this always creates a W64 vector, but the memcpy uses src->attrs element size (possibly 1/2/4). This under-copies and leaves garbage (and breaks sym reads) when src is a narrow SYM vec. Allocate dst with the same SYM width (e.g. ray_sym_vec_new(src->attrs & RAY_SYM_W_MASK, nrows) or the existing col_vec_new helper) or explicitly widen while writing via ray_read_sym/ray_write_sym.

Copilot uses AI. Check for mistakes.
Comment thread src/ops/datalog.c
Comment on lines +1266 to +1273
ray_graph_t* gg = ray_graph_new(src_table);
if (!gg) { ray_release(accum); return NULL; }

ray_op_t* keys_ops[DL_AGG_MAX_KEYS];
for (int i = 0; i < nk; i++) {
int kc = body->agg_group_key_cols[i];
if (kc < 0 || kc >= src_rel->arity) { ray_release(accum); return NULL; }
int64_t sym = src_rel->col_names[kc];
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grouped-aggregate compilation leaks the ray_graph_t gg on the early error path when a group key column index is out of range (it returns after allocating gg). Ensure gg is freed before returning on these validation failures, and consider also checking ray_scan results for NULL/error before proceeding.

Copilot uses AI. Check for mistakes.
Comment thread src/core/runtime.c
Comment on lines +208 to +231
if (sym_path) {
/* Pre-flight size check: reject files that would blow past the
* memory budget before ever touching ray_col_load. */
struct stat st;
if (stat(sym_path, &st) == 0) {
/* Allow the sym file itself plus some working headroom (2x).
* A well-formed sym file is a list of interned strings; the
* in-memory footprint is bounded by file size within a small
* constant factor. */
if (st.st_size > 0 &&
(int64_t)st.st_size > rt->mem_budget / 2) {
if (out_sym_err) *out_sym_err = RAY_ERR_OOM;
/* Continue startup with empty sym table; caller decides
* whether to treat this as fatal. */
} else {
ray_err_t sym_err = ray_sym_load(sym_path);
if (out_sym_err) *out_sym_err = sym_err;
/* RAY_ERR_CORRUPT and I/O errors are non-fatal here:
* caller inspects out_sym_err to decide recovery. */
}
}
/* ENOENT and other stat failures: leave out_sym_err = RAY_OK;
* an absent sym file is the normal first-run case. */
}
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

runtime_create_impl treats any stat() failure as “no sym file” (out_sym_err stays RAY_OK) and skips ray_sym_load entirely. That hides real I/O problems (e.g., EACCES) and defeats the whole "load sym before builtins" guarantee for persistent consumers. Consider distinguishing ENOENT from other errors and either (a) attempt ray_sym_load directly and rely on its error codes, or (b) set out_sym_err appropriately for non-ENOENT stat failures.

Copilot uses AI. Check for mistakes.
Comment thread src/ops/datalog.c Outdated
Comment thread src/core/runtime.c
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/ops/datalog.c Outdated
Comment on lines +1358 to +1363
if (src_nrows > 0) {
ray_t* val_col =
ray_table_get_col_idx(src_table, body->agg_value_col);
if (!val_col || val_col->type != RAY_I64) {
result = 0;
} else {
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scalar SUM/MIN/MAX/AVG aggregation only handles RAY_I64 value columns; if the source column is RAY_F64 (which is now more common with float constants / mixed arithmetic), the code silently returns 0 instead of a correct result or an explicit error. Consider supporting both RAY_I64 and RAY_F64 here (mirroring the grouped path via ray_group), or returning a type error when the value column isn't numeric.

Copilot uses AI. Check for mistakes.
Comment thread src/ops/datalog.c Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/ops/datalog.c Outdated
Comment on lines +1011 to +1028
if (!src) continue;
ray_t* dst = ray_vec_new(src->type, nrows);
if (!dst || RAY_IS_ERR(dst)) continue;
dst->len = nrows;
/* Use element size from the source vec so SYM with any width,
* I64, and F64 all copy correctly. */
uint8_t esz = ray_sym_elem_size(src->type, src->attrs);
if (esz == 0) { ray_release(dst); continue; }
memcpy(ray_data(dst), ray_data(src), (size_t)nrows * (size_t)esz);
out = ray_table_add_col(out, head_rel->col_names[c], dst);
ray_release(dst);
} else {
/* Constant head slot: materialize an owned broadcast column. */
int8_t ctype = head_const_types ? head_const_types[c] : 0;
if (ctype == 0) continue; /* legacy/unset */
ray_t* bcast = dl_broadcast_const_col(nrows, ctype, head_consts[c]);
if (!bcast || RAY_IS_ERR(bcast)) continue;
out = ray_table_add_col(out, head_rel->col_names[c], bcast);
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dl_project() currently treats allocation failures by continue-ing, which can (a) leak the RAY_ERROR object returned from ray_vec_new()/dl_broadcast_const_col() and (b) produce an output table missing columns (silently corrupting the derived relation schema). Instead, release any error objects and propagate a failure (return an error table) so callers can abort rule evaluation cleanly.

Copilot uses AI. Check for mistakes.
Comment thread src/ops/datalog.c Outdated
ray_op_t* keys_ops[DL_AGG_MAX_KEYS];
for (int i = 0; i < nk; i++) {
int kc = body->agg_group_key_cols[i];
if (kc < 0 || kc >= src_rel->arity) { ray_release(accum); return NULL; }
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the grouped-aggregate compile path, if a group key column index is out of range you early-return without freeing the sub-graph gg (created just above). This leaks the graph and any retained resources. Ensure gg is freed on all error paths after ray_graph_new(), including the out-of-range key-col case.

Suggested change
if (kc < 0 || kc >= src_rel->arity) { ray_release(accum); return NULL; }
if (kc < 0 || kc >= src_rel->arity) {
ray_graph_free(gg);
ray_release(accum);
return NULL;
}

Copilot uses AI. Check for mistakes.
Comment thread src/ops/datalog.c Outdated
Comment on lines +2917 to +2921
int pred_arity = 1;
if (prog) {
int ri = dl_find_rel(prog, pred_name);
if (ri >= 0) pred_arity = prog->rels[ri].arity;
}
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aggregate parsing sets pred_arity to 1 when prog is NULL. That value is later used by ray_query_fn's env-bound EDB auto-registration (it checks ncols != pred_arity for DL_AGG too), so aggregates over env-bound relations (or over any non-arity-1 predicate defined via surface syntax before a program exists) can fail to auto-register and then fail at compile time. Consider storing an 'unknown' arity (e.g. 0/-1) for DL_AGG when prog is NULL and relaxing the env auto-register arity check in that case, or resolving arity at compile time instead of parse time.

Copilot uses AI. Check for mistakes.
singaraiona pushed a commit to theaspirational/rayforce2 that referenced this pull request Apr 22, 2026
…lars

Three correctness fixes in src/ops/datalog.c flagged by the Copilot review on
PR RayforceDB#7, each with a matching regression test:

* dl_project (line ~1001): preserve SYM index width on the projected column.
  ray_vec_new(RAY_SYM, …) always returns a W64 vec, so memcpy'ing with the
  source's narrower element size left the upper bytes of each W64 slot
  uninitialized and produced bogus sym IDs. Route SYM columns through
  ray_sym_vec_new(src->attrs & RAY_SYM_W_MASK, …) so dst's stride matches src.

* Grouped aggregate compile (line ~1269): bounds-check each group_key_col
  and agg_value_col against src_rel->arity before indexing col_names[].
  An out-of-range key would OOB-read; an out-of-range value was silently
  clamped to column 0, producing correct-looking but wrong results. Also
  ensure ray_graph_free(gg) runs on every error return after gg is allocated
  (previously leaked on the out-of-range path).

* Scalar SUM/MIN/MAX/AVG (line ~1347): accept RAY_F64 value columns. The
  prior code treated any non-RAY_I64 as "return 0", so float-valued sources
  silently produced wrong results. New behaviour: i64 in → i64 out, f64 in
  → f64 out (AVG still always emits f64); non-numeric sources are rejected
  rather than zero'd.

Tests added:
* /datalog/agg_scalar_f64            — sum/avg over an f64 value column
* /datalog/agg_grouped_key_col_oor   — OOR group-key col rejected cleanly
* /datalog/project_narrow_sym        — rule passes a W8 SYM column through

make test (ASan+UBSan) and make release both green at 696/696.
singaraiona pushed a commit to theaspirational/rayforce2 that referenced this pull request Apr 23, 2026
Two follow-up fixes from the latest Copilot review round on PR RayforceDB#7:

* dl_project (src/ops/datalog.c): allocation failures used to `continue` past
  the failed slot, leaking the RAY_ERROR object and producing a derived table
  with silently-missing columns.  Each error path now releases any in-flight
  allocations and returns an error ray_t so the caller can abort rule eval
  cleanly.  The existing success path is unchanged.

* Aggregate parser + env auto-register: the parser hardcoded pred_arity=1
  when `prog` was NULL (surface syntax, globals), so env-bound source tables
  with any other arity were silently skipped during query-time auto-register.
  Use 0 as an "unknown at parse time" sentinel and let the auto-register
  resolve arity from the env-bound table's column count.  Compile already
  looks arity up via dl_find_rel, so no compile-side change is needed.

Regression test /datalog/env_bound_agg_auto_register exercises
(sum ?s salaries 1) over a 2-column env-bound table via the (rules …) clause;
the pre-fix code would reject the auto-register on arity mismatch.  699/699
pass on debug (ASan+UBSan) and release.
@singaraiona singaraiona requested a review from Copilot April 23, 2026 10:24
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/ops/datalog.c Outdated
Comment on lines +197 to +204
if (!fresh || RAY_IS_ERR(fresh)) return;
for (int c = 0; c < rel->arity; c++) {
ray_t* empty_col = ray_vec_new(desired[c], 0);
if (!empty_col || RAY_IS_ERR(empty_col)) { ray_release(fresh); return; }
ray_t* prev = fresh;
fresh = ray_table_add_col(fresh, rel->col_names[c], empty_col);
ray_release(empty_col);
if (RAY_IS_ERR(fresh)) { ray_release(prev); return; }
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dl_idb_align_head_const_types(): if ray_table_add_col() returns a RAY_ERROR, the code releases prev but leaks the returned error object (fresh). Release the error before returning to avoid accumulating errors on repeated calls (same issue for a RAY_ERROR return from ray_table_new).

Suggested change
if (!fresh || RAY_IS_ERR(fresh)) return;
for (int c = 0; c < rel->arity; c++) {
ray_t* empty_col = ray_vec_new(desired[c], 0);
if (!empty_col || RAY_IS_ERR(empty_col)) { ray_release(fresh); return; }
ray_t* prev = fresh;
fresh = ray_table_add_col(fresh, rel->col_names[c], empty_col);
ray_release(empty_col);
if (RAY_IS_ERR(fresh)) { ray_release(prev); return; }
if (!fresh) return;
if (RAY_IS_ERR(fresh)) { ray_release(fresh); return; }
for (int c = 0; c < rel->arity; c++) {
ray_t* empty_col = ray_vec_new(desired[c], 0);
if (!empty_col || RAY_IS_ERR(empty_col)) { ray_release(fresh); return; }
ray_t* prev = fresh;
fresh = ray_table_add_col(fresh, rel->col_names[c], empty_col);
ray_release(empty_col);
if (RAY_IS_ERR(fresh)) { ray_release(prev); ray_release(fresh); return; }

Copilot uses AI. Check for mistakes.
Comment thread src/core/runtime.c
ray_err_t* out_sym_err) {
return runtime_create_impl(sym_path, out_sym_err);
}

Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new public entrypoints ray_runtime_create_with_sym() / ray_runtime_create_with_sym_err() are defined here, but they aren’t declared in src/core/runtime.h or the public include/rayforce.h (tests work around this via manual externs). Please add the prototypes to the appropriate headers so consumers can use the APIs without duplicating declarations, and so signature changes can’t silently drift.

Suggested change
/* NOTE: These public entrypoints must also be declared in src/core/runtime.h
* and include/rayforce.h to keep the API surface synchronized.
*/

Copilot uses AI. Check for mistakes.
Comment thread src/ops/datalog.c Outdated
for (int c = 0; c < pred_arity; c++) {
ray_t* col = ray_table_get_col_idx(env_val, c);
ray_t* next_clean;
if (!col) continue;
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ray_query_fn env auto-registration: if ray_table_get_col_idx(env_val, c) returns NULL, the loop currently continues, which can build a clean table with fewer than pred_arity columns but still registers it via dl_add_edb(prog, pred_name, clean, pred_arity). That can leave the program in a schema-inconsistent state. Treat a missing column as a hard error (or at least abort registration for that predicate) so only fully-formed tables are registered.

Suggested change
if (!col) continue;
if (!col) {
ray_release(clean);
dl_program_free(prog);
ray_release(db);
return ray_error("invalid", "query: env-backed EDB table missing expected column");
}

Copilot uses AI. Check for mistakes.
Comment thread src/ops/datalog.c Outdated
/* Missing/error inputs: return an owned reference to the other side
* (retained so callers can release uniformly). */
if (!a || RAY_IS_ERR(a)) {
if (b && !RAY_IS_ERR(b)) ray_retain(b);
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In table_union(), when a is NULL or an error, the function returns b but only retains it if b is not an error. Callers typically release their original b after union; if b is a RAY_ERROR this can become a use-after-free (merged points to freed memory). Fix by retaining b whenever returning it (even if it is an error), or by documenting/adjusting ownership so callers don’t release the input they pass through unchanged.

Suggested change
if (b && !RAY_IS_ERR(b)) ray_retain(b);
if (b) ray_retain(b);

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/ops/datalog.c Outdated
* of src. If target==RAY_F64 and src is RAY_I64, promote. Returns new owned column. */
static ray_t* dl_col_as_f64(ray_t* src, int64_t nrows) {
ray_t* out = ray_vec_new(RAY_F64, nrows);
if (!out || RAY_IS_ERR(out)) return NULL;
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dl_col_as_f64() (and other helpers in this area) returns NULL when ray_vec_new() returns a RAY_ERROR. Because ray_release() is a no-op for RAY_ERROR, this leaks the error object. Prefer returning the RAY_ERROR to the caller (so dl_compile_rule/dl_eval can surface/free it) or calling ray_error_free(out) before returning NULL.

Suggested change
if (!out || RAY_IS_ERR(out)) return NULL;
if (!out) return NULL;
if (RAY_IS_ERR(out)) {
ray_error_free(out);
return NULL;
}

Copilot uses AI. Check for mistakes.
Comment thread src/ops/datalog.c
Comment on lines +1052 to +1072
ray_t* dst = (src->type == RAY_SYM)
? ray_sym_vec_new(src->attrs & RAY_SYM_W_MASK, nrows)
: ray_vec_new(src->type, nrows);
if (!dst) {
ray_release(out);
return ray_error("memory", "dl_project: vec_new");
}
if (RAY_IS_ERR(dst)) {
ray_error_free(dst);
ray_release(out);
return ray_error("memory", "dl_project: vec_new");
}
dst->len = nrows;
uint8_t esz = ray_sym_elem_size(src->type, src->attrs);
if (esz == 0) {
ray_release(dst);
ray_release(out);
return ray_error("type", "dl_project: unsupported column type");
}
memcpy(ray_data(dst), ray_data(src), (size_t)nrows * (size_t)esz);
ray_t* next = ray_table_add_col(out, head_rel->col_names[c], dst);
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dl_project() memcpy-copies RAY_STR columns into a freshly allocated vector but doesn't propagate the source column's str_pool. For pooled strings (>12 bytes), dst elements' pool_off will point into a NULL/empty pool, causing invalid reads later. After allocating dst for RAY_STR, propagate the pool (e.g., col_propagate_str_pool(dst, src)) or use a string-aware copy path like other ops do.

Copilot uses AI. Check for mistakes.
Comment thread src/core/runtime.c
Comment on lines +223 to +246
if (sym_path) {
/* Pre-flight size check: reject files that would blow past the
* memory budget before ever touching ray_col_load. */
struct stat st;
if (stat(sym_path, &st) == 0) {
/* Allow the sym file itself plus some working headroom (2x).
* A well-formed sym file is a list of interned strings; the
* in-memory footprint is bounded by file size within a small
* constant factor. */
if (st.st_size > 0 &&
(int64_t)st.st_size > rt->mem_budget / 2) {
if (out_sym_err) *out_sym_err = RAY_ERR_OOM;
/* Continue startup with empty sym table; caller decides
* whether to treat this as fatal. */
} else {
ray_err_t sym_err = ray_sym_load(sym_path);
if (out_sym_err) *out_sym_err = sym_err;
/* RAY_ERR_CORRUPT and I/O errors are non-fatal here:
* caller inspects out_sym_err to decide recovery. */
}
}
/* ENOENT and other stat failures: leave out_sym_err = RAY_OK;
* an absent sym file is the normal first-run case. */
}
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

runtime_create_impl(): when stat(sym_path, …) fails, the code currently leaves out_sym_err = RAY_OK and skips ray_sym_load entirely. That hides real I/O errors (EACCES, ENOTDIR, etc.) and contradicts the header comment that the _err variant surfaces I/O failures. Consider: (1) treat errno==ENOENT as OK/first-run, but for other errno set *out_sym_err = RAY_ERR_IO; and/or (2) call ray_sym_load regardless and only use stat for the size pre-flight when it succeeds.

Copilot uses AI. Check for mistakes.
Comment thread src/core/runtime.h
Comment on lines +106 to +113
/* Persistent-consumer lifecycle: load the sym table from `sym_path` (if
* present) before builtins register, so user-interned IDs keep the same
* slots across process restarts. The _err variant surfaces the load
* result via `out_sym_err` (RAY_OK / RAY_ERR_CORRUPT / I/O errors) so
* callers can decide recovery policy; the plain variant discards it. */
ray_runtime_t* ray_runtime_create_with_sym(const char* sym_path);
ray_runtime_t* ray_runtime_create_with_sym_err(const char* sym_path,
ray_err_t* out_sym_err);
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New public APIs ray_runtime_create_with_sym / ray_runtime_create_with_sym_err change init ordering and error semantics, but there are still no tests exercising the key cases (load-before-builtins stability, corrupt file -> RAY_ERR_CORRUPT, oversized file -> RAY_ERR_OOM, and non-ENOENT I/O failure -> RAY_ERR_IO). Adding targeted runtime/sym integration tests would help prevent regressions in persistent-consumer scenarios.

Copilot uses AI. Check for mistakes.
Comment thread test/test_datalog.c
@singaraiona
Copy link
Copy Markdown
Contributor

@copilot resolve the merge conflicts in this pull request

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (1)

src/core/runtime.c:193

  • runtime_create_impl initializes global subsystems (ray_heap_init/ray_sym_init) and then has multiple allocation failure exits that return NULL without tearing those subsystems down or resetting globals. This can leave the process in a partially-initialized state and make a subsequent runtime_create call behave unpredictably. Add a single cleanup path that calls ray_sym_destroy/ray_heap_destroy and clears __VM/__RUNTIME on failure.
    /* Init subsystems */
    ray_heap_init();
    ray_sym_init();

    /* Allocate runtime and set __VM + mem_budget BEFORE any file I/O so
     * that ray_error() has a live VM to record diagnostics against and
     * allocations are bounded by the budget. */
    ray_runtime_t* rt = (ray_runtime_t*)ray_sys_alloc(sizeof(ray_runtime_t));
    if (!rt) return NULL;
    memset(rt, 0, sizeof(*rt));

    /* Create main VM (id=0) */
    rt->n_vms = 1;
    rt->vms = (ray_vm_t**)ray_sys_alloc(sizeof(ray_vm_t*));
    if (!rt->vms) { ray_sys_free(rt); return NULL; }
    rt->vms[0] = (ray_vm_t*)ray_sys_alloc(sizeof(ray_vm_t));
    if (!rt->vms[0]) { ray_sys_free(rt->vms); ray_sys_free(rt); return NULL; }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread test/test_runtime.c Outdated
ray_err_t err = RAY_OK;
ray_runtime_t* rt = ray_runtime_create_with_sym_err(path, &err);
munit_assert_ptr_not_null(rt);
munit_assert_int((int)err, !=, (int)RAY_OK);
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_create_with_sym_io_error_surfaces currently asserts only err != RAY_OK. Since runtime_create_impl explicitly maps non-ENOENT stat failures to RAY_ERR_IO, this test should assert the specific expected code (RAY_ERR_IO) so it will catch accidental changes in the error mapping.

Suggested change
munit_assert_int((int)err, !=, (int)RAY_OK);
munit_assert_int((int)err, ==, (int)RAY_ERR_IO);

Copilot uses AI. Check for mistakes.
Comment thread test/test_runtime.c
Comment on lines +39 to +57
/* Absent sym file: stat fails with ENOENT, which is the "first run"
* normal case. out_sym_err must stay RAY_OK and runtime must come up. */
static MunitResult test_create_with_sym_absent_is_ok(const void* params, void* fixture) {
(void)params; (void)fixture;
char* dir = make_tmpdir();
munit_assert_ptr_not_null(dir);
char path[256];
snprintf(path, sizeof(path), "%s/missing.sym", dir);

ray_err_t err = RAY_ERR_OOM; /* poison — should be overwritten */
ray_runtime_t* rt = ray_runtime_create_with_sym_err(path, &err);
munit_assert_ptr_not_null(rt);
munit_assert_int((int)err, ==, (int)RAY_OK);

ray_runtime_destroy(rt);
rmdir(dir);
free(dir);
return MUNIT_OK;
}
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The runtime-with-sym tests cover ENOENT and a non-ENOENT stat failure, but they still don't exercise several key behaviors called out in the PR description / prior review feedback: (1) corrupt sym file surfaces RAY_ERR_CORRUPT via the _err variant, (2) oversized sym file triggers the pre-flight RAY_ERR_OOM path, and (3) load-before-builtins ordering (user IDs preserved, builtins appended). Adding targeted tests for these cases would prevent regressions in the persistence contract.

Copilot uses AI. Check for mistakes.
Comment thread src/ops/datalog.c
Comment on lines +1034 to +1048
if (empty_accum && head_rel && head_rel->table) {
ray_t* hcol = ray_table_get_col_idx(head_rel->table, c);
int8_t htype = hcol ? hcol->type : RAY_I64;
ray_t* ecol = ray_vec_new(htype, 0);
if (!ecol) {
ray_release(out);
return ray_error("memory", "dl_project: empty col");
}
if (RAY_IS_ERR(ecol)) {
ray_error_free(ecol);
ray_release(out);
return ray_error("memory", "dl_project: empty col");
}
ray_t* next = ray_table_add_col(out, head_rel->col_names[c], ecol);
ray_release(ecol);
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In dl_project's empty-accum fallback, when the head relation column type is RAY_SYM it creates an empty column via ray_vec_new(RAY_SYM, 0), which will default to W64 and can mismatch a head_rel SYM column that uses a narrower width. Since this path exists specifically to keep schemas compatible for downstream unions, it should preserve SYM attrs/width (e.g., by mirroring hcol->attrs via ray_sym_vec_new) rather than forcing W64.

Copilot uses AI. Check for mistakes.
Comment thread src/ops/datalog.c
Comment on lines +3074 to +3081
if (node->type == -RAY_STR) {
/* Quoted string literal in body: intern as sym so it compares
* equal to other sym-interned constants. Mirrors the head
* parser convention. */
int64_t sym = ray_sym_intern(ray_str_ptr(node), ray_str_len(node));
dl_body_set_const(rule, bidx, pos, sym);
return NULL;
}
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dl_set_body_pos now interns quoted string literals into SYM constants, which will cause the constant-filter path in dl_compile_rule (dl_filter_eq) to run against RAY_SYM columns. dl_filter_eq currently treats the filter column as int64_t* and compares raw 8-byte values, which is incorrect for adaptive-width SYM vectors (W8/W16/W32) and can miscompare/overread. The filter implementation should become type-aware (use ray_read_sym for RAY_SYM and element-size-aware copying for non-I64 types).

Suggested change
if (node->type == -RAY_STR) {
/* Quoted string literal in body: intern as sym so it compares
* equal to other sym-interned constants. Mirrors the head
* parser convention. */
int64_t sym = ray_sym_intern(ray_str_ptr(node), ray_str_len(node));
dl_body_set_const(rule, bidx, pos, sym);
return NULL;
}
/* Do not eagerly intern quoted string literals here.
*
* Interning body strings into SYM constants routes them into the
* constant-filter path, which must be type-aware for adaptive-width
* SYM vectors. Until that filter path is updated, fall through to the
* generic evaluator instead of manufacturing a SYM constant here.
*/

Copilot uses AI. Check for mistakes.
Comment thread src/ops/datalog.c Outdated
Comment on lines +1563 to +1564
if (!new_col || RAY_IS_ERR(new_col))
break;
Copy link

Copilot AI Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the scalar aggregate path, allocation failure for new_col is silently ignored (break), and if ray_vec_new returns a RAY_ERROR it is not freed. This can both leak the error object and allow evaluation to proceed with an unbound aggregate target var. Treat this as a hard failure: free any RAY_ERROR returned from ray_vec_new, set prog->eval_err, and return NULL so dl_eval returns -1.

Suggested change
if (!new_col || RAY_IS_ERR(new_col))
break;
if (!new_col) {
ray_release(accum);
prog->eval_err = true;
return NULL;
}
if (RAY_IS_ERR(new_col)) {
ray_release(new_col);
ray_release(accum);
prog->eval_err = true;
return NULL;
}

Copilot uses AI. Check for mistakes.
theaspirational and others added 26 commits April 23, 2026 21:39
…check

Three hardening changes to ray_runtime_create_with_sym:

1. Load order: set __VM and mem_budget BEFORE ray_sym_load so file I/O
   errors surface via ray_error() (previously silently dropped because
   ray_error() checks __VM && fmt and __VM was NULL) and allocations
   during load are bounded by mem_budget.

2. Error surfacing: new ray_runtime_create_with_sym_err(path, &out_err)
   returns the sym_load result so callers can distinguish "file absent"
   (RAY_OK), "corrupt/incompatible" (RAY_ERR_CORRUPT), and "I/O failure"
   (RAY_ERR_IO) and decide recovery policy.

3. Pre-flight size check: stat() the sym file before ray_col_load and
   reject if it exceeds mem_budget / 2, preventing a malicious or
   corrupted file from OOMing the process before the allocator's budget
   guards engage.

The original ray_runtime_create_with_sym wraps the _err variant with
NULL out-param for callers that don't need the diagnostic.

693/693 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…e index

dl_add_edb returns the newly-allocated relation index on success (0, 1, 2, …)
or -1 on failure. The auto-register branch in ray_query_fn compared the
return to != 0, so every env-bound EDB beyond the primary EAV (which always
occupies idx 0) was misclassified as a failure and the whole query aborted
with "query: failed to register env-backed EDB table".

The new test_env_bound_edb_auto_register test, which is the first coverage
for this feature, triggers the bug on every run (all four CI jobs were red
on it). Switching to < 0 matches the return contract; 693/693 tests pass
under both debug (ASan+UBSan) and release builds.
…lars

Three correctness fixes in src/ops/datalog.c flagged by the Copilot review on
PR RayforceDB#7, each with a matching regression test:

* dl_project (line ~1001): preserve SYM index width on the projected column.
  ray_vec_new(RAY_SYM, …) always returns a W64 vec, so memcpy'ing with the
  source's narrower element size left the upper bytes of each W64 slot
  uninitialized and produced bogus sym IDs. Route SYM columns through
  ray_sym_vec_new(src->attrs & RAY_SYM_W_MASK, …) so dst's stride matches src.

* Grouped aggregate compile (line ~1269): bounds-check each group_key_col
  and agg_value_col against src_rel->arity before indexing col_names[].
  An out-of-range key would OOB-read; an out-of-range value was silently
  clamped to column 0, producing correct-looking but wrong results. Also
  ensure ray_graph_free(gg) runs on every error return after gg is allocated
  (previously leaked on the out-of-range path).

* Scalar SUM/MIN/MAX/AVG (line ~1347): accept RAY_F64 value columns. The
  prior code treated any non-RAY_I64 as "return 0", so float-valued sources
  silently produced wrong results. New behaviour: i64 in → i64 out, f64 in
  → f64 out (AVG still always emits f64); non-numeric sources are rejected
  rather than zero'd.

Tests added:
* /datalog/agg_scalar_f64            — sum/avg over an f64 value column
* /datalog/agg_grouped_key_col_oor   — OOR group-key col rejected cleanly
* /datalog/project_narrow_sym        — rule passes a W8 SYM column through

make test (ASan+UBSan) and make release both green at 696/696.
Follow-up to the scalar-aggregate F64 support commit: `is_float` was only
flipped inside the `src_nrows > 0` branch, so an empty-source SUM over a
RAY_F64 column skipped the promotion and allocated a RAY_I64 result column
holding 0 instead of a RAY_F64 column holding 0.0.

Inspect the value column's type once up front, before the row-count split,
so the empty identity is emitted in the correct type. Also reject non-numeric
source columns at the same point so an empty, unsupported-type source can't
silently fall through to the i64 path either.

Regression test /datalog/agg_scalar_f64_sum_empty: SUM over an empty RAY_F64
column must produce a 1-row RAY_F64 result with value 0.0. 697/697 pass on
both debug (ASan+UBSan) and release.
…ount split

The grouped path bounds-checked agg_value_col, but the scalar path's check
lived inside the `src_nrows > 0` branch — so an empty source plus an
out-of-range value col silently emitted the SUM identity (0 / 0.0) against
an invalid slot instead of rejecting the rule.

Move the value-column bounds check (for SUM/MIN/MAX/AVG) above the empty-
source early return so it fires regardless of row count. COUNT still skips
this check since it doesn't read any value column.

Regression test /datalog/agg_scalar_value_col_oor_empty: empty i64 source
+ value_col=42 against an arity-1 relation must yield 0 rows, not 1 row
holding SUM=0. 698/698 pass on debug and release.
Two follow-up fixes from the latest Copilot review round on PR RayforceDB#7:

* dl_project (src/ops/datalog.c): allocation failures used to `continue` past
  the failed slot, leaking the RAY_ERROR object and producing a derived table
  with silently-missing columns.  Each error path now releases any in-flight
  allocations and returns an error ray_t so the caller can abort rule eval
  cleanly.  The existing success path is unchanged.

* Aggregate parser + env auto-register: the parser hardcoded pred_arity=1
  when `prog` was NULL (surface syntax, globals), so env-bound source tables
  with any other arity were silently skipped during query-time auto-register.
  Use 0 as an "unknown at parse time" sentinel and let the auto-register
  resolve arity from the env-bound table's column count.  Compile already
  looks arity up via dl_find_rel, so no compile-side change is needed.

Regression test /datalog/env_bound_agg_auto_register exercises
(sum ?s salaries 1) over a 2-column env-bound table via the (rules …) clause;
the pre-fix code would reject the auto-register on arity mismatch.  699/699
pass on debug (ASan+UBSan) and release.
…allowing them

dl_eval previously returned 0 on every call, so any rule that failed to
compile (dl_project OOM, aggregate bounds rejection, non-numeric value
column) or produced a ray_execute error was silently dropped from the
fixpoint — callers like ray_query_fn saw a 0 return and shipped an
incomplete IDB as a successful query result.

Introduce a program-scoped `eval_err` flag and flip it at every failure
site inside dl_compile_rule and dl_eval's runtime pipeline:

* dl_project returning a ray_error* from dl_compile_rule (OOM, type errors,
  add_col errors) — the error object is now released and the flag is set.
* Grouped-agg group_key_col / value_col out-of-range checks.
* Scalar-agg value_col out-of-range and non-numeric value column rejects.
* ray_execute / table_rename_cols / table_union / table_distinct errors
  inside both Phase A (initial eval) and Phase B (semi-naive iteration).

dl_eval now returns -1 iff the flag is set. NULL returns that represent
legitimate "rule produced no rows" (empty accum after body atoms, etc.)
still yield a 0 return — they're not errors.

Also taught dl_project to fall back to the IDB's existing column types
when accum collapses to 0 rows but has lost its schema (e.g. antijoin on
a fully-filtered relation).  This preserves schema agreement with the
IDB for downstream table_union without re-introducing the silent
missing-column bug.

Tests:
* /datalog/eval_surfaces_compile_failure (new): (query …) over a rule
  whose (sum ?s broken 99) is out of range must return a RAY_ERR, not an
  empty table.
* /datalog/agg_scalar_value_col_oor_empty, /datalog/agg_grouped_key_col_oor
  updated to expect dl_eval == -1 (previously asserted 0, which itself
  was the silent-failure bug).

700/700 pass on debug (ASan+UBSan) and release.
…ecute

Phase B of the semi-naive loop still had several `continue`-without-flag
silent drops that bypassed the new eval_err propagation:

* ray_graph_new() NULL in Phase A and Phase B (OOM) — now sets eval_err.
* table_union() accumulator in the new-tuples-per-head aggregation — NULL
  or RAY_ERR returns used to overwrite new_tuples_per_rel[head_idx] or
  silently continue; now mark eval_err and clear the slot cleanly.
* table_distinct / table_antijoin runtime errors in the per-IDB merge pass
  — now mark eval_err before continuing.
* table_union of delta back into rel->table — a failure here desyncs the
  fixpoint (delta_tables was already stored) yet the original code swallowed
  it; now mark eval_err on failure.

Phase A's `ray_graph_new` gained the same treatment for symmetry.
NULL returns from dl_compile_rule still pass through untouched — the
compiler itself sets eval_err on genuine failures; a bare NULL is
"no rows this iteration" and must not fault the whole program.

No new test is needed: /datalog/eval_surfaces_compile_failure already
exercises the full query -> dl_eval -> error-surfacing chain, and the
existing recursive tests (/lang/datalog/fixpoint, etc.) guard the Phase B
happy path.  700/700 pass on debug (ASan+UBSan) and release.
…rtial tables

table_union used to silently skip columns whose concat failed, producing a
table with missing columns that callers mistook for a successful union —
the caller then stored it as the IDB's new state or as merge output, so the
partial result became persistent.

Rewrite each helper to surface failure:

* table_union: RAY_IS_ERR on any missing column, concat failure, or
  ray_table_add_col failure — release partial `out` and return a typed
  ray_error.  Also retain `a`/`b` when we return one of them as-is so
  callers can release uniformly (previous code double-released when one
  input was NULL/error).

* table_distinct: canonicalize/graph_new failures now return ray_error
  instead of falling back to the raw input table (which may not be
  distinct at all — a silent wrong-result).

* table_antijoin: same treatment — canonicalize/graph_new failures return
  ray_error instead of treating `left` as the antijoin result.

dl_eval already checks RAY_IS_ERR on these helpers' returns and raises
eval_err, so the new error objects flow into the "evaluation failed"
surface exposed through ray_query_fn.  700/700 tests pass on debug
(ASan+UBSan) and release.
Even after the previous pass, table_union still silently narrowed to the
common prefix when the two inputs had different column counts: it computed
ncols = min(ncols_a, ncols_b) and produced a union table with fewer columns
than either side.  Callers (dl_eval merge paths) then stored that partial
table as the IDB's new state — a silent schema-corruption bug.

Reject the mismatch with a typed "schema" error instead of narrowing.  The
caller already picks this up via RAY_IS_ERR and sets prog->eval_err, so the
failure surfaces to ray_query_fn.  700/700 tests pass on debug (ASan+UBSan)
and release.
The schema-mismatch check only fired when both inputs had rows — the
empty-side short-circuits (`nrows(a) == 0` returns b, `nrows(b) == 0`
returns a) still bypassed it.  An antijoin that collapsed to (0 rows,
0 cols) then entered the union from the left side and silently emitted
`b` (wider schema) as the result; the caller stored it as the IDB's
state and the arity mismatch surfaced only much later.

Move the ncols_a != ncols_b check above the empty-rows bypass so every
input path exits with either a schema-matching result or a ray_error.
dl_eval already promotes the error to prog->eval_err.  700/700 tests
pass on debug (ASan+UBSan) and release.
Four comments from the 2026-04-23 Copilot review round:

* `dl_idb_align_head_const_types` (src/ops/datalog.c:~204): split every
  `if (!x || RAY_IS_ERR(x))` error branch so the RAY_ERROR object is
  released before return.  Repeated aligns were leaking an error block per
  failed call.  The `ray_table_add_col` failure path also now releases
  both `prev` and the returned error, instead of just `prev`.

* `ray_runtime_create_with_sym` / `_with_sym_err` (src/core/runtime.h):
  the two persistent-consumer entrypoints were defined in runtime.c but
  never declared — tests duplicated the prototype by hand.  Publish them
  in src/core/runtime.h (alongside the existing `ray_runtime_create` /
  `ray_runtime_destroy`) so consumers don't drift.  Kept a minimal
  forward-decl in test_datalog.c because including core/runtime.h from a
  TU that also includes lang/eval.h collides on `ray_vm_t` (pre-existing
  duplication out of scope for this PR) — the comment flags that.

* env auto-register (src/ops/datalog.c:~3620): a NULL from
  `ray_table_get_col_idx(env_val, c)` used to `continue`, producing a
  `clean` table with fewer than pred_arity columns that `dl_add_edb`
  still registered as the EDB.  Replaced the skip with a hard
  ray_error("schema", "env-backed EDB table missing expected column")
  so only fully-formed tables ever register.

* `table_union` pass-through (src/ops/datalog.c:~1819): the `a` is
  NULL/error branch previously only retained `b` when it was non-error.
  If callers passed a RAY_ERROR `b` and later released it, the returned
  pointer aliased freed memory.  Retain `b` whenever it's non-NULL —
  even on error — so the pass-through contract is uniform.

Also adopted the setup/teardown cleanup from the review: datalog_rf_setup
now returns the created runtime as the fixture, teardown receives it
back, and failure to create aborts loudly instead of silently skipping
tests.

700/700 pass on debug (ASan+UBSan) and release.
The prior round's error-lifetime fixes were no-ops.  ray_release(),
ray_retain(), and even ray_free() each short-circuit on RAY_IS_ERR, so
every `if (RAY_IS_ERR(x)) ray_release(x);` site quietly leaked the error
block — repeated failures (e.g. an aggregate evaluated in a tight REPL
loop) would slowly bleed memory until heap teardown.

Introduce ray_error_free(ray_t*) as the escape hatch:

  void ray_error_free(ray_t* err) {
      if (!err || !RAY_IS_ERR(err)) return;
      err->type = -RAY_I64;  /* retype to a leaf atom so ray_free and
                                ray_release_owned_refs don't skip it */
      ray_free(err);
  }

Retyping to an atom with no owned children is the safest way to route
an error block through the standard free path without requiring a new
allocator primitive.  The helper is published in include/rayforce.h
alongside ray_error so consumers can use it directly.

Wire the helper through every site where a RAY_ERROR object was being
"released" but in fact leaking:

* dl_idb_align_head_const_types: ray_table_new / ray_vec_new /
  ray_table_add_col error returns.
* dl_project: ecol, dst, add_col error returns (split the combined
  `!x || RAY_IS_ERR(x)` branches so the NULL vs ERROR cases can take
  different cleanup paths).
* dl_compile_rule dl_project propagation.
* dl_eval Phase A merge pipeline: raw_tuples, new_tuples, merged,
  deduped.
* dl_eval Phase B semi-naive: raw_result, result, u (union accum),
  new_tuples, deduped, delta, merged.
* ray_query_fn env auto-register: i64col, next_clean.

Regression test /datalog/error_free_reclaims loops 256 create/free
cycles and asserts via ray_mem_stats() that free_count advances
alongside alloc_count and bytes_allocated stays flat.  Pre-fix, this
test flagged the leak (free_count stayed at 0 for the 256 errors).
701/701 pass on debug (ASan+UBSan) and release.
Four comments from the 2026-04-23T11:13 Copilot pass:

* dl_col_as_f64 / dl_eval_expr helpers (src/ops/datalog.c): five more
  ray_vec_new sites used `if (!out || RAY_IS_ERR(out)) return NULL`
  which leaks the RAY_ERROR block (ray_release is a no-op on errors).
  Split each check and route the RAY_IS_ERR branch through
  ray_error_free() so the block is actually reclaimed.

* dl_project RAY_STR pool propagation (src/ops/datalog.c:~1072): the
  memcpy path copied 16-byte ray_str_t handles but left dst->str_pool
  NULL, so reads of strings >12 bytes would dereference pool_off into
  a NULL pool.  Call col_propagate_str_pool(dst, src) after the memcpy
  when src is RAY_STR (matches the pattern used by ops/filter and
  ops/pivot).

* runtime_create_impl stat() errno handling (src/core/runtime.c:~246):
  previously any stat() failure silently stayed RAY_OK.  Now ENOENT
  remains the "first-run" OK case, but every other errno (EACCES,
  ENOTDIR, EIO, …) sets *out_sym_err = RAY_ERR_IO so persistent
  consumers can see the underlying I/O failure instead of coming up
  with a silent empty sym table.

* test/test_runtime.c (new): targeted tests for the persistent-
  consumer surface.  /runtime/create_with_sym_absent_is_ok verifies
  ENOENT stays RAY_OK.  /runtime/create_with_sym_io_error_surfaces
  constructs an ENOTDIR path (creates a regular file then asks for
  a subpath under it) and asserts out_sym_err becomes non-RAY_OK.
  /runtime/create_with_sym_plain_variant_absent exercises the
  non-_err variant with a missing sym file.  test_main.c wires the
  new suite into the root.

704/704 pass on debug (ASan+UBSan) and release.
Five comments from 2026-04-23T12:12:

* dl_filter_eq SYM correctness (src/ops/datalog.c:~934): the constant-
  filter path read key columns as int64_t*, which miscompares and
  overreads against adaptive-width RAY_SYM vectors (W8/W16/W32).  New
  dl_col_eq_row helper dispatches on type and uses ray_read_sym for
  RAY_SYM; the output-column copy is element-size-aware (memcpy of esz
  bytes per surviving row, allocator picks sym_vec_new for SYM so the
  narrow width is preserved).  Non-numeric/non-sym key columns fall
  through unfiltered — matches the existing pass-through refcount
  convention.  Also propagates str_pool when the filtered column is
  RAY_STR.

* Scalar aggregate allocation failure (src/ops/datalog.c:~1594): a NULL
  or RAY_ERROR from ray_vec_new previously did `break`, silently leaving
  agg_target_var unbound and letting evaluation continue with a
  half-constructed rule (and leaking the error block).  Now release
  accum, free the error via ray_error_free, set prog->eval_err, and
  return NULL so dl_eval reports failure.

* dl_project empty-accum SYM width (src/ops/datalog.c:~1037): the
  empty-accum fallback built ecol via ray_vec_new(RAY_SYM, 0) which
  always lands at W64 — mismatching a head relation that uses narrower
  SYM attrs.  Mirror hcol->attrs width via ray_sym_vec_new(..., 0) when
  the head-rel column is RAY_SYM so downstream table_union sees a
  matching schema instead of tripping the column-count guard.

* /runtime/create_with_sym_io_error_surfaces now pins the exact code
  (RAY_ERR_IO) — drifting the errno->code mapping will fail loudly
  instead of sliding by on a "not RAY_OK" check.

* Two new targeted runtime tests:
  /runtime/create_with_sym_corrupt_file writes garbage bytes to the
  sym path and asserts out_sym_err goes non-RAY_OK (ray_sym_load
  flags RAY_ERR_CORRUPT on header validation).
  /runtime/create_with_sym_load_preserves_user_ids exercises the
  whole persistence promise end-to-end: intern "rayforce-user-marker",
  ray_sym_save, destroy, reload via ray_runtime_create_with_sym_err,
  re-intern the same string and assert the ID is stable.  (Oversized-
  file / RAY_ERR_OOM isn't tested because mem_budget is a large
  fraction of RAM and can't be overridden from a test without a
  dependency-injection hook — noted in the commit but not blocking.)

706/706 pass on debug (ASan+UBSan) and release.
dl_filter_eq's pass-through branches (NULL input, RAY_IS_ERR input,
empty-rows input, missing key column, non-I64/non-SYM key column) used
to return the input pointer without bumping its refcount.  Combined
with the caller idiom

    ray_retain(body_tbl);
    ray_t* filtered = dl_filter_eq(body_tbl, c, const);
    ray_release(body_tbl);
    body_tbl = filtered;

this meant the pass-through paths produced a `filtered` whose refcount
was *one less* than the caller assumed.  Later `ray_release(body_tbl)`
calls (in dl_compile_rule's join paths and the DL_NEG antijoin) could
then drop the underlying rel->table to rc=0 and free it out from under
the program's own EDB storage.

Tighten the contract so every exit from dl_filter_eq returns an owned
reference:

* empty-rows short-circuit: retain before return.
* missing-column short-circuit: retain before return.
* non-I64/non-SYM key pass-through: retain before return.
* RAY_IS_ERR input: retain iff non-NULL (caller will see RAY_IS_ERR
  and free via ray_error_free).

The count==nrows branch already retained.  Callers are unchanged; the
fix simply matches their existing expectations.  706/706 pass on debug
(ASan+UBSan) and release.
…warnings

Two datalog paths were writing diagnostic messages to stderr from
non-debug builds instead of surfacing the failure through the normal
error channel:

* dl_idb_align_head_const_types (src/ops/datalog.c): head-const type
  conflict between rules used to `fprintf(stderr, ...)` and return
  without flagging the program.  Removed the stderr write and set
  prog->eval_err = true instead.  dl_eval now short-circuits when
  eval_err is already set at entry (sticky flag), so the conflict
  detected at rule-add time reaches dl_eval's return without being
  reset.  Callers like ray_query_fn turn that -1 into a proper
  "query: evaluation failed" ray_error.

* dl_compile_rule grouped-aggregate + positive body atoms (nyi path):
  same treatment — remove stderr warning, set prog->eval_err.

Regression test /datalog/rule_head_const_type_conflict builds a
program with two rules whose head-const types collide on the same
slot and asserts dl_eval returns -1.  707/707 pass on debug
(ASan+UBSan) and release.
Seven distinct comments across the 13:35 / 13:54 review rounds:

* Aggregate-only rule fallback (datalog.c:~1276): one_val / accum /
  add_col allocation failures previously returned NULL without
  setting eval_err or freeing RAY_ERROR blocks.  All paths now set
  prog->eval_err = true and route errors through ray_error_free().

* Grouped-aggregate compile (datalog.c:~1414..1468): added NULL
  checks around ray_graph_new, ray_sym_str, every ray_scan, ray_group,
  and ray_execute.  On any failure, free the sub-graph, release
  accum, set eval_err, and return NULL.  Execute errors that were
  RAY_ERROR are now freed via ray_error_free instead of leaking.

* DL_CMP type awareness (datalog.c:~1746): comparison loop used to
  read both sides as int64_t*, miscomparing RAY_F64 columns produced
  by dl_eval_expr's float path.  Now promotes to f64 iff either side
  is f64 and rejects non-numeric sources with eval_err.  Filter-copy
  loop switched to element-size-aware memcpy so f64/narrow-SYM
  columns survive the mask unchanged, with str_pool propagation for
  RAY_STR.

* table_union concat (datalog.c:~2055): propagate the original
  ray_vec_concat error (e.g. "type" for schema mismatch) instead of
  always rewriting it to a generic "memory" error.

* dl_filter_eq (datalog.c:~979): added proper error checks for
  ray_table_new, ray_vec_new, and ray_table_add_col.  Previously a
  failure in any of those silently `continue`-d past the column,
  yielding a partial table.

* Corrupt-file test (test_runtime.c): the fake-garbage payload was
  tripping ray_col_load's header parser (RAY_ERR_NYI) before the
  sym-table-specific corrupt path.  Now writes STR_LIST_MAGIC +
  inflated string count so col_load_str_list hits its truncated-
  body check, which returns ray_error("corrupt", ...) — mapping
  back to the exact RAY_ERR_CORRUPT we assert.

* Oversized-file OOM test (test_runtime.c): new
  /runtime/create_with_sym_oversized_file uses ftruncate to produce
  a 10 EB sparse sym file and asserts out_sym_err == RAY_ERR_OOM.
  Filesystems that reject the giant ftruncate (tmpfs) return
  MUNIT_SKIP rather than a spurious failure.

* test_datalog.c: added <stdio.h>/<stdlib.h> so fprintf/abort() in
  datalog_rf_setup have explicit declarations under -Werror (they
  were arriving transitively before but the coverage was fragile).

708/708 pass on debug (ASan+UBSan) and release.
Round-6 made dl_filter_eq surface allocation/add_col failures as
RAY_ERROR returns, but the two call sites in dl_compile_rule were
still blindly assigning the return to body_tbl / neg_tbl and
continuing.  If the filter failed, the subsequent join path would
then operate on a RAY_ERROR, or hit a NULL — either way turning a
proper failure into undefined behavior.

Both call sites (positive DL_POS body filter at ~1199 and DL_NEG
filter at ~1353) now treat a NULL or RAY_ERROR return as a hard
failure: release the surviving accum, free any error block via
ray_error_free, set prog->eval_err = true, and return NULL from
dl_compile_rule so dl_eval reports -1.

708/708 pass on debug (ASan+UBSan) and release.
…l_err

Four comments from 2026-04-23T14:21:

* dl_filter_eq / table_union / dl_project: ray_table_add_col failures
  used to return without releasing the partially-built output table.
  ray_table_add_col doesn't free its input on error, so this leaked
  the table plus every previously-added column.  All four add_col
  sites (dl_filter_eq, table_union, dl_project — var path and both
  const/empty paths) now release `out` before returning the error
  (or return the RAY_ERROR directly when that's the most informative
  code).

* dl_project constant-SYM broadcast width: dl_broadcast_const_col
  used ray_vec_new(RAY_SYM, …) which always produced W64.  When a
  prior rule had already aligned the head relation's SYM column to
  a narrower width, the W64 broadcast would then fail ray_vec_concat
  inside table_union.  Added a sym_width_hint parameter routed
  through ray_sym_vec_new, and in dl_project the caller now inspects
  the existing head_rel column's attrs and passes the matching width.

* dl_idb_align_head_const_types: OOM / add_col failures while
  rebuilding the IDB's empty schema used to silently `return`,
  leaving the relation with its original column types and eventually
  causing a ray_vec_concat mismatch during evaluation.  Every failure
  path now sets prog->eval_err = true (and still frees any RAY_ERROR
  via ray_error_free), so dl_eval reliably returns -1 when alignment
  can't complete.

708/708 pass on debug (ASan+UBSan) and release.
Round-8's add_col-leak fix covered the var-slot path (src_idx >= 0) and
the empty-accum fallback but missed the third add_col site — the
constant-head-slot path right below them.  If ray_table_add_col fails
there (NULL or RAY_ERR) the partially-built `out` was still being
returned-without-release, leaking the table and every column already
added.  Applied the same release-before-return treatment.

708/708 pass on debug (ASan+UBSan) and release.
Six comments from 2026-04-23T15:11:

* Aggregate-only rule fallback (datalog.c:~1374): ray_table_add_col
  failure left `accum` leaked.  Release it on both error exits.

* DL_CMP dl_eval_expr failure (datalog.c:~1855): previously a NULL or
  RAY_ERROR from dl_eval_expr did `break`, silently skipping the
  comparison filter and letting the rule produce wrong rows.  Treat
  eval failure as unrecoverable: free any RAY_ERROR, release accum,
  set prog->eval_err, and return NULL from dl_compile_rule.  Same for
  the missing-variable-column paths.

* DL_CMP filtered-table build (datalog.c:~1982): `continue`-ing past
  a failed column allocation / add_col produced a table with fewer
  columns than accum — silent schema corruption.  Now validates
  ray_table_new, ray_vec_new, ray_sym_vec_new, and ray_table_add_col
  at every step; any failure releases the partial output and raises
  prog->eval_err.

* Public API backward compat (datalog.h:252): dl_rule_head_const's
  signature was extended from (rule, pos, val) to (rule, pos, val,
  type) in this PR, breaking external callers.  Restored the 3-arg
  form as an I64 wrapper and added dl_rule_head_const_typed for the
  typed variant.  Internal call sites and tests updated to the new
  name.

* Plan doc absolute paths (docs/plans/...md): replaced three
  /Users/aspirational/... paths with $WORKSPACE-relative references
  so the doc is portable across dev environments / CI.

* Oversized-file OOM test (test_runtime.c:~208): 32-bit off_t shift
  was undefined.  Skip when sizeof(off_t) < 8 and build the size via
  int64_t before casting to off_t.

708/708 pass on debug (ASan+UBSan) and release.
Round-9 covered DL_CMP's silent-break paths but missed DL_ASSIGN, which
had the same pattern:

    if (!new_col || RAY_IS_ERR(new_col)) break;

A failed assignment silently leaves assign_var unbound; subsequent body
literals keep compiling with stale bindings, and dl_eval returns 0 with
a wrong result set.  Now treat every failure mode as unrecoverable: free
any RAY_ERROR via ray_error_free, release accum, set prog->eval_err, and
return NULL.  Also extended the check to the dl_table_add_computed_col
return — previously unchecked.

708/708 pass on debug (ASan+UBSan) and release.
Three comments from 2026-04-23T15:50:

* table_union propagates error operands (datalog.c:~2203): when `b`
  was a RAY_ERROR the function used to return `a` instead, silently
  masking the real failure.  Now both error operands are returned
  (retained) so the caller sees the true diagnostic, and NULL operands
  still fall back to the other side.  ray_retain is a no-op on errors,
  so the "owned return" contract is preserved.

* dl_broadcast_const_col width sentinel (datalog.c:~1054): the old
  uint8_t sym_width_hint used 0 as "default", but RAY_SYM_W8 is also
  0 — so an aligned IDB with a W8 head-const column would silently
  widen to W64, later tripping ray_vec_concat.  Replaced the hint
  with a `const ray_t* width_template` pointer: NULL means "default
  W64", otherwise copy the template column's adaptive width directly.
  Caller (dl_project) updated to pass the head_rel column pointer.

* test_runtime.c: replaced the manual runtime-API forward decls with
  `#include "core/runtime.h"` now that this TU doesn't pull in
  lang/eval.h — the ray_vm_t duplication concern from test_datalog.c
  doesn't apply here, so let the real header be the source of truth.

708/708 pass on debug (ASan+UBSan) and release.
@singaraiona
Copy link
Copy Markdown
Contributor

@copilot resolve the merge conflicts in this pull request

@singaraiona singaraiona force-pushed the feature/datalog-aggregates branch from 425ef1a to 23a0cb5 Compare April 24, 2026 09:10
@singaraiona singaraiona marked this pull request as ready for review April 24, 2026 09:12
@singaraiona singaraiona merged commit dda2b98 into RayforceDB:master Apr 24, 2026
4 checks passed
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants