Use this extension to read data sets from SAS, Stata, and SPSS from DuckDB with ReadStat.
Installation is simple through the DuckDB Community Extension repository, just type
INSTALL read_stat FROM community;
LOAD read_stat;
in a DuckDB instance near you.
The extension adds a single DuckDB table function, read_stat
, which you use as follows:
-- Read a SAS `.sas7bdat` or `.xpt` file
FROM read_stat('sas_data.sas7bdat');
FROM read_stat('sas_data.xpt');
-- Read an SPSS `.sav`, `.zsav`, or `.por` file
FROM read_stat('spss_data.sav');
FROM read_stat('compressed_spss_data.zsav');
FROM read_stat('portable_spss_data.por');
-- Read a Stata .dta file
FROM read_stat('stata_data.dta');
If the file extension is not .sas7bdat
, .xpt
, .sav
, .zsav
, .por
, or .dta
,
use the read_stat
function for the right file type with the format
parameter:
FROM read_stat('sas_data.other_extension', format = 'sas7bdat');
FROM read_stat('sas_data.other_extension', format = 'xpt');
-- SPSS `.sav` and `.zsav` can both be read through the format `'sav'`
FROM read_stat(
'spss_data_possibly_compressed.other_extension',
format = 'sav'
);
FROM read_stat('portable_spss_data.other_extension', format = 'por');
FROM read_stat('stata_data.other_extension', format = 'dta');
Override the file character encoding
inferred from the file with an iconv
encoding name, see https://www.gnu.org/software/libiconv/:
FROM read_stat('latin1_encoded.sas7bdat', encoding = 'iso-8859-1');
If your files have the proper file extensions and you do not need to override their character encodings, a replacement scan is also available:
-- Read a SAS `.sas7bdat` or `.xpt` file
FROM 'sas_data.sas7bdat';
FROM 'sas_data.xpt';
-- Read an SPSS `.sav`, `.zsav`, or `.por` file
FROM 'spss_data.sav';
FROM 'compressed_spss_data.zsav';
FROM 'portable_spss_data.por';
-- Read a Stata .dta file
FROM 'stata_data.dta';
Clone the repo with submodules
git clone --recurse-submodules <repo>
In principle, compiling this template only requires a C/C++ toolchain. However, this template relies on some additional tooling to make life a little easier and to be able to share CI/CD infrastructure with extension templates for other languages:
- Python3
- Python3-venv
- Make
- CMake
- Git
- (Optional) Ninja + ccache
- vcpkg
Installing these dependencies will vary per platform:
- For Linux, these come generally pre-installed or are available through the distro-specific package manager.
- For MacOS, homebrew.
- For Windows, chocolatey.
After installing the dependencies, building is a two-step process. Firstly run:
make configure
This will ensure a Python venv is set up with DuckDB and DuckDB's test runner installed. Additionally, depending on configuration, DuckDB will be used to determine the correct platform for which you are compiling.
Then, to build the extension run:
make debug
This delegates the build process to cargo, which will produce a shared library in target/debug/<shared_lib_name>
. After this step,
a script is run to transform the shared library into a loadable extension by appending a binary footer. The resulting extension is written
to the build/debug
directory.
To create optimized release binaries, simply run make release
instead.
We recommend to install Ninja and Ccache for building as this can have a significant speed boost during development. After installing, ninja can be used by running:
make clean
GEN=ninja make debug
This extension uses the DuckDB Python client for testing. This should be automatically installed in the make configure
step.
The tests themselves are written in the SQLLogicTest format, just like most of DuckDB's tests. A sample test can be found in
test/sql/<extension_name>.test
. To run the tests using the debug build:
make test_debug
or for the release build:
make test_release
Testing with different DuckDB versions is really simple:
First, run
make clean_all
to ensure the previous make configure
step is deleted.
Then, run
DUCKDB_TEST_VERSION=v1.1.2 make configure
to select a different duckdb version to test with
Finally, build and test with
make debug
make test_debug
The DuckDB Extension C API has a stable part and an unstable part. By default, this template only allows usage of the stable part of the API. To switch it to allow using the unstable part, take the following steps:
Firstly, set your TARGET_DUCKDB_VERSION
to your desired in ./Makefile
. Then, run make update_duckdb_headers
to ensure
the headers in ./duckdb_capi
are set to the correct version. (FIXME: this is not yet working properly).
Finally, set USE_UNSTABLE_C_API
to 1 in ./Makefile
. That's all!