You can read more about the purpose from README.md.
Install requirements:
$ git clone https://github.com/Microsoft/vcpkg.git
./vcpkg/bootstrap-vcpkg.sh
Ensure that git submodules have been checkout:
$ git submodule update --init --remote --recursive
Build the project using ninja and vckpg:
$ make release GEN=ninja VCPKG_TOOLCHAIN_PATH=$(pwd)/vcpkg/scripts/buildsystems/vcpkg.cmake
Don't remove the build directory everytime after changes. This will make it much more slower to iterate.
You're only allowed to delete the build if you think it's a "systemic issue with the development environment".
DuckDB core is a git subdmodule of the project in duckdb/ folder. If you need to search for examples don't do Web searches but instead check for code inside that folder.
IMPORTANT: After the build is finished you need to run: make test
This will run tests in the tests/ folder.
When you're fixing a issue from Github commit your changes after you finish. If the git commit hooks return any errors fix them before finishing with the task.
IMPORTANT: You're not allowed to use git commit --no-verify ever
Problem: libxml2's xmlSetGenericErrorFunc() sets a GLOBAL error handler affecting all threads, causing race conditions and heap corruption in multi-threaded execution.
Anti-Pattern (DO NOT DO THIS):
// WRONG - Global state modification causes race conditions
xmlSetGenericErrorFunc(nullptr, XMLSilentErrorHandler); // GLOBAL!
xmlXPathEvalExpression(...);
xmlSetGenericErrorFunc(nullptr, nullptr); // GLOBAL!Correct Pattern (DuckDB-Style):
// RIGHT - Use per-operation configuration
xmlParserCtxtPtr parser_ctx = xmlNewParserCtxt();
doc = xmlCtxtReadMemory(parser_ctx, xml_str.c_str(), xml_str.length(),
nullptr, nullptr,
XML_PARSE_RECOVER | XML_PARSE_NOERROR | XML_PARSE_NOWARNING);Key Principles:
- Avoid global state modification - Use per-operation or per-thread configuration
- No mutexes unless absolutely necessary - They're a last resort, not the DuckDB way
- Follow sitting_duck's pattern - Each thread gets its own resources (like TSParser instances)
- Parsing options > Global handlers - Configure at creation time, not globally
- Context-specific handlers only - Use
xmlSchemaSetValidErrors()for schema validation, not global handlers
Reference: The sitting_duck extension shows how to handle treesitter (another stateful C library) in DuckDB - each thread creates its own parser instance instead of sharing global state.
Run builds and tests via blq MCP tools, not via Bash directly:
mcp__blq_mcp__commands- list available commandsmcp__blq_mcp__run- run a registered command (e.g.,run(command="test"))mcp__blq_mcp__register_command- register new commandsmcp__blq_mcp__status- check current build/test statusmcp__blq_mcp__errors- view errors from runsmcp__blq_mcp__info- detailed run info (supports relative refs like+1,latest)mcp__blq_mcp__output- search/filter captured logs (grep, tail, head, lines)
Do NOT use shell pipes or redirects in commands (e.g., pytest | tail -20).
Instead: run the command, then use output(run_id=N, tail=20) to filter.