```shell
pip install git+https://github.com/ajfriend/pdx
```
Small ergonomic improvements to make it easy to run DuckDB queries on Pandas DataFrames.
- `pdx` monkey-patches `pandas.DataFrame` to provide a `df.sql(...)` method.
- Since `pdx` uses DuckDB, you can use DuckDB's convenient SQL dialect.
Query a Pandas DataFrame with `df.sql(...)`. Omit the `FROM` clause, because it is added implicitly:
```python
import pdx

iris = pdx.data.get_iris()  # returns pandas.DataFrame

# DuckDB accepts the trailing comma after `num` and the positional `group by 1`.
iris.sql("""
    select
        species,
        count(*)
            as num,
    group by
        1
""")
```
You can use short SQL (sub-)expressions because `FROM` and `SELECT *` are implied whenever they're omitted:
```python
iris.sql('where petal_length > 4.5')
iris.sql('limit 10')
iris.sql('order by petal_length')
iris.sql('')  # returns the DataFrame unmodified, i.e., 'select * from iris'
```
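To make the "implied `FROM` and `SELECT *`" rule concrete, here is a sketch of the string rewriting that could sit behind it. It assumes, without confirmation from the `pdx` source, that the library leans on DuckDB's FROM-first syntax, where `from t` on its own selects every column. The helper `build_query` and the default table name `df` are hypothetical.

```python
def build_query(fragment: str, table: str = "df") -> str:
    """Prefix a SQL fragment with a FROM clause (hypothetical helper).

    In DuckDB, "from t" alone is equivalent to "select * from t", and
    "from t <clause>" applies the clause to every row, so a single prefix
    makes both FROM and SELECT * implicit.
    """
    fragment = fragment.strip()
    return f"from {table} {fragment}" if fragment else f"from {table}"


for frag in ["where petal_length > 4.5", "limit 10", "order by petal_length", ""]:
    print(build_query(frag, table="iris"))
# from iris where petal_length > 4.5
# from iris limit 10
# from iris order by petal_length
# from iris
```

A full `select ... group by ...` fragment works the same way, since DuckDB also accepts `from iris select species, count(*) as num, group by 1`.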
For more, check out the example notebook folder.
Other helpers:

- `df.aslist()`
- `df.asdict()`
- `df.asitem()`
- `df.cols2dict()`
- save/load helpers for DuckDB database files
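To install DuckDB's Python package from source, in editable mode, into the environment at `../env`:

```shell
git clone https://github.com/duckdb/duckdb.git
cd duckdb
../env/bin/pip install -e tools/pythonpkg --verbose
```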