add sample/issue922_benchmark.rb to compare table adapter approaches vs parquet#1121
Commits:
- …approach
  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- …vs parquet
  Three table function approaches are compared:
  - 1 row per execute call (baseline)
  - batch (2048 rows per call) to reduce Ruby<->C crossings
  - batch + pre-extracted column arrays to avoid repeated Polars FFI calls

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
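The batching idea described above (2048 rows per call instead of one) can be sketched independently of the adapter API. This is an illustrative helper only — `BATCH_SIZE` and `each_batch` are not part of ruby-duckdb — showing how slicing the source into fixed-size chunks reduces the number of Ruby<->C crossings to one per chunk:

```ruby
# Illustrative batching helper: yields rows in fixed-size chunks so a
# consumer crosses the Ruby<->C boundary once per chunk, not once per row.
BATCH_SIZE = 2048

def each_batch(rows, batch_size = BATCH_SIZE)
  return enum_for(:each_batch, rows, batch_size) unless block_given?

  rows.each_slice(batch_size) { |chunk| yield chunk }
end

# A 5000-row source becomes three crossings: 2048 + 2048 + 904 rows.
sizes = each_batch((1..5000).to_a).map(&:size)
```

With a per-row adapter the same 5000-row source would cost 5000 crossings; the chunked version costs three.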
📝 Walkthrough

A new benchmark file implements three Polars DataFrame table-function adapters for DuckDB with different streaming strategies (single-row, batch-per-call, and optimized batch), along with a Parquet-based fallback approach and a benchmark harness to compare their performance.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@sample/issue922_benchmark.rb`:
- Around line 120-123: The query_via_parquet method currently interpolates name
and parquet_path directly into SQL (con.query("CREATE OR REPLACE TABLE #{name}
..." and read_parquet('#{parquet_path}')), which is unsafe and will break on
quotes; change it to use your DB driver's safe APIs: bind the parquet_path as a
parameter or use a quoted-literal helper for the read_parquet argument, and
quote or escape the table identifier using the connection's identifier-quoting
helper (or a prepared/parameterized CREATE/INSERT pattern) instead of string
interpolation; update the two con.query calls in query_via_parquet to construct
SQL with the connection's quote_identifier/quote_literal or parameter binding to
ensure safe identifiers and literals.
- Around line 134-163: Wrap the benchmark resource lifecycle in an ensure block
so the connection and database are always closed and the parquet file always
removed even on errors: acquire db and con via DuckDB::Database.open and
con.connect, run the benchmark calls (con.expose_as_table, queries, and
query_via_parquet) in the begin section, and move con.close, db.close and
parquet cleanup into ensure. Replace the fixed parquet_path with a unique
temporary file (use Tempfile or Dir::Tmpname to generate a temp pathname) and
ensure that tempfile is closed/unlinked in the ensure block after
query_via_parquet completes.
```ruby
def query_via_parquet(con, data_frame, name, parquet_path)
  data_frame.write_parquet(parquet_path)
  con.query("CREATE OR REPLACE TABLE #{name} AS SELECT * FROM read_parquet('#{parquet_path}')")
  con.query("SELECT * FROM #{name}").to_a
end
```
Harden SQL construction for identifier/literal safety.

`name` and `parquet_path` are interpolated directly into SQL. This can break on embedded quotes and is unsafe if the method is reused with non-constant input.
🔧 Proposed fix

```diff
 def query_via_parquet(con, data_frame, name, parquet_path)
+  quoted_name = %("#{name.to_s.gsub('"', '""')}")
+  quoted_path = "'#{parquet_path.to_s.gsub("'", "''")}'"
   data_frame.write_parquet(parquet_path)
-  con.query("CREATE OR REPLACE TABLE #{name} AS SELECT * FROM read_parquet('#{parquet_path}')")
-  con.query("SELECT * FROM #{name}").to_a
+  con.query("CREATE OR REPLACE TABLE #{quoted_name} AS SELECT * FROM read_parquet(#{quoted_path})")
+  con.query("SELECT * FROM #{quoted_name}").to_a
 end
```
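The quote-doubling in the proposed fix above can be exercised in isolation. These helper methods are illustrative (not a ruby-duckdb API); they mirror the fix's `gsub`-based escaping — doubling the quote character and wrapping the result:

```ruby
# SQL identifier: double any embedded double quotes, wrap in double quotes.
def quote_identifier(name)
  %("#{name.to_s.gsub('"', '""')}")
end

# SQL string literal: double any embedded single quotes, wrap in single quotes.
def quote_literal(value)
  "'#{value.to_s.gsub("'", "''")}'"
end
```

A table name like `odd"name` or a path containing an apostrophe then round-trips through the SQL text instead of breaking it.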
```ruby
db = DuckDB::Database.open
con = db.connect
con.query('SET threads=1')

DuckDB::TableFunction.add_table_adapter(Polars::DataFrame, PolarsDataFrameTableAdapter.new)
start_time = Process.clock_gettime(Process::CLOCK_MONOTONIC)
con.expose_as_table(df, 'polars_tf')
con.query('SELECT * FROM polars_tf()').to_a
end_time = Process.clock_gettime(Process::CLOCK_MONOTONIC)

DuckDB::TableFunction.add_table_adapter(Polars::DataFrame, PolarsDataFrameBatchTableAdapter.new)
start_time3 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
con.expose_as_table(df, 'polars_tf_batch')
con.query('SELECT * FROM polars_tf_batch()').to_a
end_time3 = Process.clock_gettime(Process::CLOCK_MONOTONIC)

DuckDB::TableFunction.add_table_adapter(Polars::DataFrame, PolarsDataFrameOptimizedTableAdapter.new)
start_time4 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
con.expose_as_table(df, 'polars_tf_opt')
con.query('SELECT * FROM polars_tf_opt()').to_a
end_time4 = Process.clock_gettime(Process::CLOCK_MONOTONIC)

parquet_path = File.join(Dir.tmpdir, 'issue922_benchmark.parquet')
start_time2 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
query_via_parquet(con, df, 'polars_pq', parquet_path)
end_time2 = Process.clock_gettime(Process::CLOCK_MONOTONIC)

con.close
db.close
File.delete(parquet_path)
```
Wrap the benchmark lifecycle in an `ensure` block and avoid a fixed tmp filename.

Connection/database close and parquet cleanup should run even if a query fails. A fixed tmp filename can also collide across concurrent runs.
🔧 Proposed fix

```diff
-parquet_path = File.join(Dir.tmpdir, 'issue922_benchmark.parquet')
-start_time2 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
-query_via_parquet(con, df, 'polars_pq', parquet_path)
-end_time2 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
-
-con.close
-db.close
-File.delete(parquet_path)
+parquet_path = File.join(Dir.tmpdir, "issue922_benchmark_#{Process.pid}_#{Time.now.to_i}.parquet")
+begin
+  start_time2 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
+  query_via_parquet(con, df, 'polars_pq', parquet_path)
+  end_time2 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
+ensure
+  con&.close
+  db&.close
+  File.delete(parquet_path) if File.exist?(parquet_path)
+end
```
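The unique-temp-file half of the suggestion can be sketched with the standard library's `Tempfile`, which generates a collision-free name and pairs naturally with `ensure`-based cleanup. The `with_temp_parquet` wrapper below is illustrative, not part of the benchmark:

```ruby
require 'tempfile'

# Yield a unique temp path, removing the file even if the block raises.
def with_temp_parquet(prefix = 'issue922_benchmark')
  file = Tempfile.new([prefix, '.parquet'])
  path = file.path
  file.close # release the handle; a parquet writer reopens by path
  yield path
ensure
  file&.unlink # unlink deletes the file from disk
end
```

Unlike a fixed `File.join(Dir.tmpdir, ...)` name, concurrent benchmark runs each get their own file, and a failing query no longer leaks it.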
Summary
Adds `sample/issue922_benchmark.rb` comparing three table function approaches against the parquet approach for exposing a `Polars::DataFrame` to DuckDB.

Approaches compared
- 1 row per execute call (baseline)
- batch (2048 rows per call) to reduce Ruby<->C crossings
- batch + pre-extracted column arrays via `to_a` to eliminate repeated Polars FFI calls
- parquet round-trip via `read_parquet()`

Typical results
Pre-extracting column arrays is the biggest Ruby-side win, bringing the table adapter to within ~1.5x of parquet.
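The benchmark's repeated `Process.clock_gettime` pairs can be factored into a single helper; a minimal sketch (the `measure` name is illustrative, not part of the sample):

```ruby
# Time a block with the monotonic clock, which is immune to wall-clock
# adjustments; returns the block's result and the elapsed seconds.
def measure
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - start]
end

rows, seconds = measure { Array.new(10_000) { |i| [i, i.to_s] } }
```

Each `start_timeN`/`end_timeN` pair in the benchmark then collapses to one `measure { ... }` call.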