Add PreparedInsert flow #443

theory · 2025-10-30T22:02:46Z

Add a new pattern for "prepared inserts". It works like this:

- Call `PrepareInsert` with an `INSERT` query that has optional columns and ends in `VALUES`. Do not include any values in the string.
- It returns a `PreparedInsert` object that has two methods:
  - `GetBlock()` returns a `Block` pre-configured with the columns declared in the `INSERT` statement.
  - `Execute()` inserts the data from the block, then clears it.
- When the `PreparedInsert` object goes out of scope, it signals the server that it's done sending data.
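The flow above can be sketched with a toy model. This is not the clickhouse-cpp API: `MockConnection`, the one-column `Block`, and the string log are hypothetical stand-ins for the socket, and `GetBlock()` follows the name used in the header diff. It assumes, as in clickhouse-cpp's existing insert path, that an empty data block marks the end of the data, so the wire sequence is one query, any number of data blocks, then one empty block.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical one-column block; a real clickhouse::Block holds typed columns.
struct Block {
    std::vector<int> values;
    std::size_t GetRowCount() const { return values.size(); }
    void Clear() { values.clear(); }
};

// Hypothetical stand-in for the connection: records what a real client would send.
struct MockConnection {
    std::vector<std::string> sent;
};

class PreparedInsert {
public:
    PreparedInsert(MockConnection& conn, const std::string& query) : conn_(conn) {
        conn_.sent.push_back("query:" + query);  // the INSERT ... VALUES is sent once
    }
    PreparedInsert(const PreparedInsert&) = delete;
    PreparedInsert& operator=(const PreparedInsert&) = delete;

    ~PreparedInsert() {
        conn_.sent.push_back("data:0");  // empty block: no more data is coming
    }

    Block* GetBlock() { return &block_; }

    // Sends the rows gathered so far, then clears the block for the next batch.
    void Execute() {
        if (block_.GetRowCount() == 0) return;
        conn_.sent.push_back("data:" + std::to_string(block_.GetRowCount()));
        block_.Clear();
    }

private:
    MockConnection& conn_;
    Block block_;
};

// Inserts six rows in three batches of two, all within one INSERT.
std::vector<std::string> RunBatches() {
    MockConnection conn;
    {
        PreparedInsert insert(conn, "INSERT INTO t (x) VALUES");
        for (int batch = 0; batch < 3; ++batch) {
            Block* block = insert.GetBlock();
            block->values.push_back(batch * 2);
            block->values.push_back(batch * 2 + 1);
            insert.Execute();
        }
    }  // the destructor runs here and ends the data stream
    return conn.sent;
}
```

Memory use stays bounded by a single batch, because `Execute()` clears the block after each send.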

This allows one to send smaller batches of blocks, thereby using less memory, but still in a single ClickHouse INSERT operation.

Expected to be useful in the Postgres foreign data wrapper (FDW) insert API, where multiple rows can be inserted at once but the API hands them over one at a time. It will also support the FDW COPY API, which can submit huge batches of data to insert.

theory · 2025-10-30T22:05:42Z

clickhouse/client.cpp

+            if (chtype->GetCode() == Type::LowCardinality) {
+                chtype = col->As<ColumnLowCardinality>()->GetNestedType();
+            }


I'm honestly not sure this is the right thing to do. Might one need Type::LowCardinality?

theory · 2025-10-30T22:06:23Z

clickhouse/client.cpp

+
+    void FinishInsert();
+
+    void SendData(const Block& block);


I had to move these to public so that PreparedInsert can call them. They're not in the header file, though, so that shouldn't matter.

theory · 2025-10-30T22:08:12Z

clickhouse/client.h

+    public:
+        Block * GetBlock();
+        void Execute();
+        // XXX This shouldn't be public.


I couldn't figure out how to make this private. Suggestions appreciated.

It would be nice if it worked when declared public in the .cpp file, but I think I could also use an Impl class, as Client does, to hide such things.
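One possible answer, sketched on a toy class rather than the real clickhouse-cpp `Client`: since C++11 a nested class has the same access rights as any other member of its enclosing class, so `PreparedInsert` can call a private `Client` method directly, and a private constructor plus `friend class Client;` stops callers from constructing one themselves. The names `SendData` and `sent_blocks` here are illustrative. In the real code the helpers live on `Client::Impl`, a separate class, so it would presumably need its own `friend` declaration for `PreparedInsert` (or keep them public, since `Impl` is not exposed in the header).

```cpp
#include <cassert>
#include <type_traits>

class Client {
public:
    class PreparedInsert {
    public:
        int Execute();

    private:
        // Only Client may construct one; callers must go through PrepareInsert().
        friend class Client;
        explicit PreparedInsert(Client* client) : client_(client) {}
        Client* client_;
    };

    PreparedInsert PrepareInsert() { return PreparedInsert(this); }
    int sent_blocks() const { return sent_blocks_; }

private:
    // Stays private: as a member, the nested PreparedInsert may still call it.
    void SendData() { ++sent_blocks_; }
    int sent_blocks_ = 0;
};

inline int Client::PreparedInsert::Execute() {
    client_->SendData();  // allowed: nested classes can access Client's private members
    return client_->sent_blocks();
}
```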

Copilot

Pull Request Overview

This PR introduces a PreparedInsert pattern for more memory-efficient bulk data insertion. Instead of accumulating all data before sending, users can now prepare an INSERT statement once and execute multiple smaller batches within a single ClickHouse operation.

Key Changes:

- Added `PreparedInsert` class with `GetBlock()`, `Execute()`, and `Finish()` methods for iterative data insertion
- Implemented `PrepareInsert()` methods in `Client` for initiating prepared inserts
- Added a comprehensive unit test demonstrating the prepared insert workflow

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| clickhouse/client.h | Declared `PreparedInsert` nested class and `PrepareInsert()` methods with detailed documentation |
| clickhouse/client.cpp | Implemented `PreparedInsert` class methods, `ReceivePreparePackets()`, and refactored insert finalization logic |
| clickhouse/block.h | Fixed spelling in comments ("Convinience" → "Convenience") |
| ut/client_ut.cpp | Added `PrepareInsert` test case and fixed spelling in existing comment ("Spontaneosly" → "Spontaneously") |

Add a new pattern for "prepared inserts". It works like this:

- Call `PrepareInsert` with an `INSERT` query that has optional columns and ends in `VALUES`. Do not include any values in the string.
- It returns a `PreparedInsert` object that has two methods for sending data:
  - `GetBlock()` returns a `Block` pre-configured with the columns declared in the `INSERT` statement.
  - `Execute()` inserts the data from the block, then clears it.
- Call `Finish()`, or just let the `PreparedInsert` object go out of scope, to send any remaining rows and signal the server that it's done.

This allows one to send smaller batches of blocks, thereby using less memory, but still in a single ClickHouse `INSERT` operation.

Expected to be useful in the Postgres foreign data wrapper (FDW) insert API, where multiple rows can be inserted at once but the API hands them over one at a time. It will also support the FDW COPY API, which can submit huge batches of data to insert.
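Supporting both an explicit `Finish()` and finishing on destruction usually means making `Finish()` idempotent and keeping the destructor from throwing. A minimal sketch, assuming a hypothetical `Inserter` that buffers rows and a string log standing in for the socket; it is not the code from this PR.

```cpp
#include <cassert>
#include <string>
#include <vector>

class Inserter {
public:
    explicit Inserter(std::vector<std::string>& wire) : wire_(wire) {}
    Inserter(const Inserter&) = delete;
    Inserter& operator=(const Inserter&) = delete;

    ~Inserter() {
        // Destructors must not throw, so errors from a late Finish() are swallowed.
        try {
            Finish();
        } catch (...) {
        }
    }

    void Append(int value) { pending_.push_back(value); }

    // Sends any buffered rows, then the end-of-data marker. Safe to call twice.
    void Finish() {
        if (finished_) return;
        finished_ = true;  // set first so a failed send is not retried by the destructor
        if (!pending_.empty()) {
            wire_.push_back("data:" + std::to_string(pending_.size()));
            pending_.clear();
        }
        wire_.push_back("end");
    }

private:
    std::vector<std::string>& wire_;
    std::vector<int> pending_;
    bool finished_ = false;
};
```

Marking the insert finished before sending is a design choice: if the connection fails mid-`Finish()`, the destructor does not try again on a broken connection.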

slabko

Thank you very much for contributing this feature. It has been on the list for quite some time, and I’m glad someone has started looking into it.

However, I have a few remarks.

In general, if you look at the codebase, there is no manual memory management: instead of using new and delete, we rely on std::unique_ptr and std::shared_ptr to manage heap-allocated resources. In fact, the delete keyword is never used anywhere in the project. Managing the PreparedInsert class's memory manually creates a dangerous situation in which PreparedInsert can be inadvertently copied. The compiler will automatically generate the copy constructor and the copy assignment operator, which could lead to shallow copies of pointers and, ultimately, a double-free error if users are not careful. This can easily happen by accident.
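The copy hazard is easy to see on a reduced class. In this sketch, `PreparedInsert` and `Block` are simplified stand-ins, not the PR's code: holding the block in a `std::unique_ptr` instead of a raw owning pointer makes the compiler delete the implicit copy operations, so an accidental copy fails to compile, while moving still transfers ownership.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

struct Block {
    int rows = 0;  // hypothetical placeholder for real column data
};

// No new/delete and no user-written destructor: unique_ptr frees the block,
// and its non-copyability makes PreparedInsert move-only.
class PreparedInsert {
public:
    PreparedInsert() : block_(std::make_unique<Block>()) {}
    Block* GetBlock() { return block_.get(); }

private:
    std::unique_ptr<Block> block_;
};

static_assert(!std::is_copy_constructible<PreparedInsert>::value, "copying must not compile");
static_assert(!std::is_copy_assignable<PreparedInsert>::value, "copy assignment must not compile");
static_assert(std::is_move_constructible<PreparedInsert>::value, "moving transfers ownership");
```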

My second remark is a bit tougher. I know you’ve put thought and care into this design, but I’ll have to ask for large changes. The PreparedInsert is not needed here, and the API is simpler without it. The insert operation should be simple and not require many visible moving parts. Ideally, I would approach it like this:

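```cpp
// Proposed API: BeginInsert() returns a detached Block whose columns match the table.
Block block = client.BeginInsert("INSERT INTO test_clickhouse_cpp_insert VALUES");
// id, name, and f are the block's typed columns (e.g. obtained via block[i]->As<...>()).
for (const auto& td : TEST_DATA) {
    id->Append(td.id);
    name->Append(td.name);
    f->Append(td.f);
}
client.SendData(block);
// ...
client.SendData(block);
// ...
client.SendData(block);
client.EndInsert();  // tells the server no more data is coming
```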

The main points here are:

- `BeginInsert` and `EndInsert` clearly form a pair and serve one another.
- It's unambiguous that no other insert or select statements should occur between them. The current `PreparedInsert` design leaves room for passing the `PreparedInsert` around, which risks losing track of the connection state and starting to use the client object for something else in the meantime. The proposed pattern enforces a clear principle: one operation → one connection → one client object. Need another parallel operation? Create another client.
- Here the `Block` object is detached, and ownership is passed to the user code. The user knows it's not an internal part of `PreparedInsert` and can freely modify it if needed.
- You can still preserve automatic `EndInsert` behavior when the client goes out of scope by tracking its state: if it's in insert mode, call `EndInsert` in the destructor.
- I would avoid using the word `Prepare...` here, because it suggests a somewhat different idea than what we are trying to achieve.
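The state tracking described in the points above can be sketched on a toy client rather than clickhouse-cpp. `BeginInsert`, `SendData`, and `EndInsert` follow the reviewer's proposal; `Select`, the `wire` log, and the one-column `Block` are hypothetical stand-ins. The client refuses other queries while an insert is open and ends an open insert in its destructor.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

struct Block {
    std::vector<int> rows;  // hypothetical single column
};

class Client {
public:
    explicit Client(std::vector<std::string>& wire) : wire_(wire) {}
    Client(const Client&) = delete;
    Client& operator=(const Client&) = delete;

    ~Client() {
        // Auto-finish an insert that is still open; destructors must not throw.
        if (inserting_) {
            try {
                EndInsert();
            } catch (...) {
            }
        }
    }

    // Sends the INSERT and hands back a detached Block owned by the caller.
    Block BeginInsert(const std::string& query) {
        RequireIdle();
        wire_.push_back("query:" + query);
        inserting_ = true;
        return Block{};
    }

    void SendData(const Block& block) {
        if (!inserting_) throw std::logic_error("SendData() called outside an insert");
        wire_.push_back("data:" + std::to_string(block.rows.size()));
    }

    void EndInsert() {
        if (!inserting_) throw std::logic_error("no insert in progress");
        inserting_ = false;
        wire_.push_back("data:0");  // empty block ends the data stream
    }

    // Any other query is refused while an insert is open on this client.
    void Select(const std::string& query) {
        RequireIdle();
        wire_.push_back("query:" + query);
    }

private:
    void RequireIdle() const {
        if (inserting_) throw std::logic_error("an insert is in progress on this client");
    }

    std::vector<std::string>& wire_;
    bool inserting_ = false;
};
```

Throwing on misuse keeps the "one operation per client" rule enforced at run time instead of relying on convention.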

Thank you again for your work. Please let me know if you’d like any help, I’d be happy to assist.

theory commented Oct 30, 2025

theory force-pushed the insert-block branch 5 times, most recently from 51d8216 to c93c844, October 31, 2025 20:50

serprex approved these changes Nov 3, 2025

mshustov requested review from Copilot and slabko November 4, 2025 08:25

Copilot AI reviewed Nov 4, 2025

theory force-pushed the insert-block branch from c93c844 to d2e84c7, November 4, 2025 18:02

slabko requested changes Nov 4, 2025
