Ratchet

Ratchet is a prototype system for pipeline-based query suspension and resumption.

The Ratchet implementation is a modified version of DuckDB.

Prerequisite

Third-party

It is highly recommended to use third-party libraries whose entire source code is in a single header file. To add one:

Copy the header file of the third-party library to the third_party folder.
Add include_directories(third_party/xxx) after include_directories(src/include) in the CMakeLists.txt at the root directory. You may have to recompile the source code by running make in the root folder.
If you are working on the Python client, also update third_party_includes() in scripts/package_build.py. You may have to reinstall the Python client for the change to take effect, by running pip3 install . in /tools/pythonpkg.
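
For the Python client step, the change is likely to be one more entry in the list of include directories. The sketch below is a stand-alone imitation of third_party_includes(), not a copy of scripts/package_build.py: the return type, the existing_lib entry, and the xxx folder name are assumptions for illustration, so check the real function before editing it.

```python
import os


def third_party_includes():
    """Stand-alone imitation of third_party_includes() in scripts/package_build.py.

    Assumption: the real function returns a list of include directories
    relative to the repository root. Verify this against the actual file.
    """
    includes = []
    # Hypothetical entry standing in for whatever the real file already lists.
    includes += [os.path.join('third_party', 'existing_lib', 'include')]
    # New entry: 'xxx' is the placeholder folder name used in the steps above.
    includes += [os.path.join('third_party', 'xxx')]
    return includes


if __name__ == '__main__':
    for path in third_party_includes():
        print(path)
```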

JSON for Modern C++

We use nlohmann/json to serialize and deserialize JSON. GitHub: https://github.com/nlohmann/json

Python Client Modification

To add a new Python API or modify an existing one for DuckDB, especially when working in a virtual environment, you need to:

1. Install the mypy Python library in the virtual environment.
2. Modify the source code in tools/pythonpkg/src to reflect the API change.
3. Run scripts/regenerate_python_stubs.sh at the root directory of DuckDB, and make sure that <Ratchet-DuckDB>/tools/pythonpkg/duckdb-stubs/__init__.pyi reflects the API change.
4. Install the modified DuckDB again using python setup.py install in <Ratchet-DuckDB>/tools/pythonpkg.
5. If the change to the Python client APIs still does not take effect, repeat steps 3 and 4. Several rounds may be needed.
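
One way to confirm that the regenerated stub file reflects an API change is to parse it and look for the new method. The helper below is a minimal sketch using only the standard library. The stub excerpt and the method names suspend_query and resume_query are hypothetical and serve only as an illustration. The class name DuckDBPyConnection is an assumption about how the stub file is organized.

```python
import ast


def stub_declares(stub_source, class_name, method_name):
    """Return True if class_name declares method_name in the given .pyi source."""
    tree = ast.parse(stub_source)
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            for item in node.body:
                is_function = isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
                if is_function and item.name == method_name:
                    return True
    return False


# Hypothetical stub excerpt; 'suspend_query' is an illustrative name only.
EXAMPLE_STUB = '''
class DuckDBPyConnection:
    def execute(self, query: str, parameters: object = ...) -> DuckDBPyConnection: ...
    def suspend_query(self, path: str) -> None: ...
'''

if __name__ == '__main__':
    print(stub_declares(EXAMPLE_STUB, 'DuckDBPyConnection', 'suspend_query'))
    print(stub_declares(EXAMPLE_STUB, 'DuckDBPyConnection', 'resume_query'))
```

To check the real file, read the contents of tools/pythonpkg/duckdb-stubs/__init__.pyi and pass them as stub_source.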

Source Code Compilation

Main source code [C++]

The main codebase is written in C++ and is built with CMake. To compile the source code, run make in the root folder.

Python Client Installation

Install pybind11 using pip3 install pybind11 (system-wide or in a virtual environment).

pip3 show pybind11 tells you where pybind11 is installed, for example /home/{user_path}/{venv}/lib/python3.7/site-packages.

In that case, </path/to/pybind11> is /home/{user_path}/{venv}/lib/python3.7/site-packages/pybind11.

If you develop in the CLion IDE and want CLion to resolve all the source code, you may need to add -DBUILD_PYTHON_PKG=TRUE -DCMAKE_PREFIX_PATH=</path/to/pybind11> in Settings | Build, Execution, Deployment | CMake | CMake Options. This tells CLion where to find pybind11.
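
As an alternative to reading the pip3 show output by hand, the pybind11 package directory can be located from Python. The snippet below is a small sketch that uses only the standard library. The helper name find_pybind11_dir is made up for this example, and it must be run with the interpreter of the environment in which pybind11 was installed.

```python
import importlib.util
import os


def find_pybind11_dir():
    """Return the pybind11 package directory, or None if it is not installed."""
    spec = importlib.util.find_spec('pybind11')
    if spec is None or not spec.submodule_search_locations:
        return None
    return os.path.abspath(list(spec.submodule_search_locations)[0])


if __name__ == '__main__':
    path = find_pybind11_dir()
    if path is None:
        print('pybind11 is not installed in this environment')
    else:
        # Print the options in the form expected by the CMake Options field.
        print('-DBUILD_PYTHON_PKG=TRUE -DCMAKE_PREFIX_PATH=' + path)
```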

Ratchet-DuckDB can be used and tested through a Python client. It is recommended to install the Python client in a Python virtual environment.

source <path/to/python-virtual-environment/bin/activate>
cd <Ratchet>/tools/pythonpkg 
pip3 install . 
# or python setup.py install
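
After installation, a quick smoke test can confirm that the client imports and runs a query. This is a minimal sketch that relies only on the standard DuckDB Python calls connect(), execute() and fetchall(). It does not exercise any Ratchet-specific API, and the function name smoke_test is made up for this example.

```python
def smoke_test():
    """Run a trivial query through the installed client.

    Returns the result rows, or None if the duckdb module cannot be imported.
    """
    try:
        import duckdb
    except ImportError:
        return None
    con = duckdb.connect(':memory:')
    try:
        return con.execute('SELECT 42').fetchall()
    finally:
        con.close()


if __name__ == '__main__':
    print(smoke_test())
```

A return value of None means the client is not visible to the current interpreter, which usually indicates that the virtual environment is not activated.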

Source Code Modification

Sink(), Finalize(), and GetData() are the functions involved in query suspension and resumption. Usually, suspension happens in Finalize() and resumption happens in Sink(). The exact placement still varies case by case for implementation or performance reasons; for example, resumption for aggregation may happen in GetData().

Adding suspension and resumption APIs in pyconnection.cpp and pyconnection.hpp
Checking for finished pipelines during resumption in pipeline.cpp
Suspending and resuming ungrouped aggregation in physical_ungrouped_aggregate.cpp
Suspending and resuming in-memory hash join in physical_hash_join.cpp and perfect_hash_join_executor.cpp
Suspending and resuming external hash join in physical_hash_join.cpp
Suspending and resuming grouped aggregation in physical_hash_aggregate.cpp
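
The division of labour among Sink(), Finalize(), and GetData() can be illustrated with a toy operator. The sketch below is not Ratchet or DuckDB code: the class ToySumOperator, its JSON checkpoint layout, and the choice to ignore input on resumption are all assumptions made for illustration. It only shows state being written out when a pipeline finishes and read back on the first sink call after a restart.

```python
import json
import os
import tempfile


class ToySumOperator:
    """Toy pipeline sink that sums integers. Illustrative only."""

    def __init__(self, checkpoint_path, resume=False):
        self.checkpoint_path = checkpoint_path
        self.resume = resume
        self.total = 0
        self.restored = False

    def sink(self, chunk):
        # Resumption: on the first call, reload the state saved by finalize().
        # Input is ignored because it was already consumed before suspension.
        if self.resume:
            if not self.restored:
                with open(self.checkpoint_path) as f:
                    self.total = json.load(f)['total']
                self.restored = True
            return
        self.total += sum(chunk)

    def finalize(self, suspend=False):
        # Suspension: the pipeline has consumed all of its input, so the
        # operator state is complete and can be serialized as JSON.
        if suspend:
            with open(self.checkpoint_path, 'w') as f:
                json.dump({'total': self.total}, f)

    def get_data(self):
        return [self.total]


def run_demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'sum_state.json')

        # First run: consume the input, then suspend at the pipeline boundary.
        first = ToySumOperator(path)
        for chunk in ([1, 2, 3], [4, 5]):
            first.sink(chunk)
        first.finalize(suspend=True)

        # Second run: a fresh operator restores the state instead of recomputing it.
        second = ToySumOperator(path, resume=True)
        second.sink([1, 2, 3])
        second.finalize()
        return second.get_data()


if __name__ == '__main__':
    print(run_demo())
```

How Ratchet actually serializes operator state and detects finished pipelines is defined in the source files named in this section; the toy only mirrors the overall control flow.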

List of Modifications

tools/pythonpkg/src/pyconnection.cpp
tools/pythonpkg/include/duckdb_python/pyconnection/pyconnection.hpp
src/include/duckdb/common/constants.hpp
src/include/duckdb/common/types/data_chunk.hpp
src/include/duckdb/common/vector_operations/aggregate_executor.hpp
src/include/duckdb/execution/operator/join/perfect_hash_join_executor.hpp
src/include/duckdb/execution/executor.hpp
src/include/duckdb/main/client_config.hpp
src/include/duckdb/parallel/pipeline.hpp
src/common/constants.cpp
src/main/settings/settings.cpp
src/execution/operator/aggregate/physical_hash_aggregate.cpp
src/execution/operator/aggregate/physical_ungrouped_aggregate.cpp
src/execution/operator/join/perfect_hash_join_executor.cpp
src/execution/operator/join/physical_hash_join.cpp
src/execution/operator/join/physical_range_join.cpp
src/execution/operator/order/physical_order.cpp
src/execution/operator/scan/physical_table_scan.cpp
src/execution/join_hashtable.cpp
src/parallel/executor.cpp
src/parallel/pipeline.cpp
src/parallel/pipeline_executor.cpp

DuckDB

DuckDB is a high-performance analytical database system. It is designed to be fast, reliable and easy to use. DuckDB provides a rich SQL dialect, with support far beyond basic SQL. DuckDB supports arbitrary and nested correlated subqueries, window functions, collations, complex types (arrays, structs), and more. For more information on the goals of DuckDB, please refer to the Why DuckDB page on our website.

Installation

If you want to install and use DuckDB, please see our website for installation and usage instructions.

Data Import

For CSV files and Parquet files, data import is as simple as referencing the file in the FROM clause:

SELECT * FROM 'myfile.csv';
SELECT * FROM 'myfile.parquet';

Refer to our Data Import section for more information.

SQL Reference

The website contains a reference of functions and SQL constructs available in DuckDB.

Development

For development, DuckDB requires CMake, Python3 and a C++11 compliant compiler. Run make in the root directory to compile the sources. For development, use make debug to build a non-optimized debug version. You should run make unit and make allunit to verify that your version works properly after making changes. To test performance, you can run BUILD_BENCHMARK=1 BUILD_TPCH=1 make and then perform several standard benchmarks from the root directory by executing ./build/release/benchmark/benchmark_runner. The details of the benchmarks are in our Benchmark Guide.

Please also refer to our Contribution Guide.
