Releases · lance-format/lance

10 May 07:19

v0.4.8

9948c30

v0.4.8 Better support for nested fields and more supported predicates

Previously, predicates on nested (and deeply nested) fields were not properly supported. This release adds support for filtering on struct sub-fields, including deeply nested structs.

We also added support for more filter predicates and fixed a regression in NULL handling for string columns.
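Here is a minimal sketch in plain Python (not Lance's implementation) of what filtering on a struct sub-field means. A dotted path such as `meta.author.name` (hypothetical column names) is resolved one level at a time, and a NULL at any level makes the leaf NULL. `get_path`, `filter_rows`, `is_null`, and `is_not_null` are illustrative helpers, not Lance APIs, and the NULL behavior assumes SQL-style semantics.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def get_path(row: Row, path: str) -> Any:
    """Resolve a dotted path like "meta.author.name" through nested structs.

    Returns None (NULL) if any level along the path is missing or NULL.
    """
    value: Any = row
    for part in path.split("."):
        if not isinstance(value, dict):
            return None  # a NULL (or non-struct) parent makes the leaf NULL
        value = value.get(part)
    return value


def filter_rows(rows: List[Row], path: str, pred: Callable[[Any], bool]) -> List[Row]:
    """Keep rows whose nested value satisfies pred; NULL never matches."""
    return [r for r in rows if (v := get_path(r, path)) is not None and pred(v)]


def is_null(rows: List[Row], path: str) -> List[Row]:
    return [r for r in rows if get_path(r, path) is None]


def is_not_null(rows: List[Row], path: str) -> List[Row]:
    return [r for r in rows if get_path(r, path) is not None]


rows = [
    {"id": 1, "meta": {"author": {"name": "alice"}}},
    {"id": 2, "meta": {"author": None}},
    {"id": 3, "meta": {"author": {"name": "bob"}}},
]
print([r["id"] for r in filter_rows(rows, "meta.author.name", lambda v: v == "alice")])  # [1]
print([r["id"] for r in is_null(rows, "meta.author.name")])  # [2]
```

Under these assumed semantics, an inverted predicate such as `lambda v: not v == "alice"` matches only row 3. Row 2 matches neither the predicate nor its inversion because its value is NULL.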

What's Changed

Fix nested schema merge by @changhiskhan in #836
Fix nested field filtering by @changhiskhan in #837
Fix Projection for struct fields by @changhiskhan in #844
[Bug] Calculating the nulls from position slices in chunks. by @eddyxu in #846
Add tests for is_NULL, is_not_null, and invert in filter by @eddyxu in #847

Full Changelog: v0.4.7...v0.4.8

Contributors

eddyxu and changhiskhan


09 May 01:16

eddyxu

v0.4.7

a974588

v0.4.7 Random access improvements

In this version, we improve the random access over cloud storage by allowing a higher number of parallel I/Os.
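The notes don't describe the mechanism in detail. As a hedged illustration, the sketch below shows why more parallel I/Os help random access on cloud storage: each ranged read pays a round-trip latency, so issuing many of them concurrently hides most of that cost. `read_range`, `take_ranges`, and `max_parallel_io` are hypothetical names, and an in-memory `bytes` object with a sleep stands in for an object store.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Range = Tuple[int, int]  # (byte offset, length)


def read_range(blob: bytes, rng: Range) -> bytes:
    """Stand-in for one ranged GET against an object store (hypothetical)."""
    offset, length = rng
    time.sleep(0.005)  # simulate per-request network round-trip latency
    return blob[offset:offset + length]


def take_ranges(blob: bytes, ranges: List[Range], max_parallel_io: int = 16) -> List[bytes]:
    """Issue the range reads concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_parallel_io) as pool:
        return list(pool.map(lambda r: read_range(blob, r), ranges))


blob = bytes(range(256))
chunks = take_ranges(blob, [(i * 4, 4) for i in range(32)])
print(len(chunks))  # 32
```

`ThreadPoolExecutor.map` preserves input order, so callers get rows back in the order they requested them even though the reads complete out of order.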

What's Changed

[Rust] increase parallelism, reduce array build overhead by @eddyxu in #830
Recursively merge record batches by @changhiskhan in #833
Allow user to customize BlockSize on ObjectStore. by @eddyxu in #835

Full Changelog: v0.4.6...v0.4.7

Contributors

eddyxu and changhiskhan


05 May 05:20

changhiskhan

v0.4.6

729d7db

v0.4.6 Support FileFragment creation

Allows creating a distributed Lance dataset from scratch.

What's Changed

Allow create fragment on non-existed dataset. by @eddyxu in #825

Full Changelog: v0.4.5...v0.4.6

Contributors

eddyxu


04 May 18:28

changhiskhan

v0.4.5

2972ae2

v0.4.5 Preview private API for merging columns

Welcome @Mause as our newest contributor! Also, a big thank you for your work on the duckdb extension framework.

In this release, we added a preview of distributed column addition. This makes it possible to distribute Lance Fragments across nodes, add a new column to each Fragment, and then write a new Lance dataset version manifest with the updated schema and files.
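The workflow described above can be sketched as a fan-out followed by a single commit. This is a toy model under stated assumptions, not Lance's private API. `Fragment`, `Manifest`, `add_column_to_fragment`, `distributed_add_column`, and the file naming are all hypothetical, and threads stand in for separate nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Fragment:
    """Hypothetical stand-in for a Lance fragment: an id, its data files, its columns."""
    fragment_id: int
    files: List[str]
    columns: Dict[str, List[object]]


@dataclass
class Manifest:
    """Hypothetical stand-in for a dataset version manifest."""
    version: int
    schema: List[str]
    fragments: List[Fragment] = field(default_factory=list)


def add_column_to_fragment(frag: Fragment, name: str,
                           fn: Callable[[Dict[str, object]], object]) -> Fragment:
    """Worker step: compute the new column for one fragment and record a new data file."""
    num_rows = len(next(iter(frag.columns.values())))
    rows = [{col: vals[i] for col, vals in frag.columns.items()} for i in range(num_rows)]
    new_values = [fn(row) for row in rows]
    new_file = f"frag{frag.fragment_id}_{name}.lance"  # hypothetical file naming
    return Fragment(frag.fragment_id, frag.files + [new_file],
                    {**frag.columns, name: new_values})


def distributed_add_column(manifest: Manifest, name: str,
                           fn: Callable[[Dict[str, object]], object]) -> Manifest:
    # Fan out: fragments are independent, so each can go to a different node
    # (threads stand in for nodes here).
    with ThreadPoolExecutor() as pool:
        updated = list(pool.map(lambda f: add_column_to_fragment(f, name, fn),
                                manifest.fragments))
    # Commit once: a new manifest version with the extended schema and file lists.
    return Manifest(manifest.version + 1, manifest.schema + [name], updated)
```

Only the final step touches the manifest, so workers never coordinate with each other. The previous version is left unchanged until the new manifest is written.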

What's Changed

add support for aws profile by @Renkai in #807
Upgrade Arrow to 37 by @changhiskhan in #810
Schema intersection by @eddyxu in #814
Add a check to make sure field names don't contain periods by @changhiskhan in #816
fix(docs): correct link to docs.rs by @Mause in #819
update arrow version in duckdb extension by @changhiskhan in #817
Do not use lifetime on FileWriter by @eddyxu in #820
Setting field ID after merging the fields. by @eddyxu in #821
[Rust] Project schema by schema by @eddyxu in #822
Merge batches from multiple datafiles in the same Fragment by @eddyxu in #815
Update README.md by @jaichopra in #809
[Python] Provide a private / distributed add column api in Python by @eddyxu in #823

New Contributors

@Mause made their first contribution in #819

Full Changelog: v0.4.4...v0.4.5

Contributors

eddyxu, changhiskhan, and 3 other contributors


25 Apr 20:53

changhiskhan

v0.4.4

5c550e1

v0.4.4 Various bug fixes

#805 fixed an integer overflow bug in the plain decoder that resulted in high latency for Take (and consequently high latency for the vector search). We'll be adding continuous performance benchmarks soon to prevent issues like this from being released in the future.
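The notes don't say exactly which computation overflowed. As a generic illustration of this class of bug (not the decoder's actual code), the sketch shows how a byte offset computed as `row * item_size` in signed 32-bit arithmetic wraps once it passes 2^31 - 1. A wrapped offset can send a read to the wrong place. `wrap_i32`, `byte_offset_i32`, and `byte_offset_i64` are hypothetical names.

```python
def wrap_i32(x: int) -> int:
    """Emulate two's-complement wraparound of a signed 32-bit integer."""
    return ((x + 2**31) % 2**32) - 2**31


def byte_offset_i32(row: int, item_size: int) -> int:
    # Buggy pattern: the product is computed in 32 bits and silently wraps.
    return wrap_i32(row * item_size)


def byte_offset_i64(row: int, item_size: int) -> int:
    # Fixed pattern: widen to 64 bits before multiplying (Python ints never wrap).
    return row * item_size


print(byte_offset_i32(600_000_000, 4))  # -1894967296
print(byte_offset_i64(600_000_000, 4))  # 2400000000
```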

We also fixed cosine similarity for vectors that do not line up perfectly with the platform's SIMD strides.
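A common form of this problem is a vector length that is not a multiple of the SIMD width. Here is a minimal sketch, assuming 8 f32 lanes (as in one 256-bit AVX2 register), of the usual structure. Full SIMD-width chunks accumulate into per-lane sums, the lanes are reduced horizontally, and a scalar tail loop handles the leftover elements. Plain Python loops emulate the lanes here, so this shows the structure, not the performance. `LANES` and `cosine_similarity` are illustrative names, not Lance's code.

```python
import math
from typing import List, Sequence

LANES = 8  # assumed SIMD width: 8 f32 lanes, as in a 256-bit AVX2 register


def cosine_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    n = len(x)
    main = n - n % LANES  # prefix covered by full SIMD-width chunks
    # Per-lane accumulators, mimicking what a SIMD register would hold.
    dot: List[float] = [0.0] * LANES
    xx: List[float] = [0.0] * LANES
    yy: List[float] = [0.0] * LANES
    for start in range(0, main, LANES):
        for lane in range(LANES):
            a, b = x[start + lane], y[start + lane]
            dot[lane] += a * b
            xx[lane] += a * a
            yy[lane] += b * b
    # Horizontal reduction of the lanes.
    d, nx, ny = sum(dot), sum(xx), sum(yy)
    # Scalar tail: elements left over when n is not a multiple of LANES.
    # Skipping this loop is the kind of bug that unaligned lengths expose.
    for i in range(main, n):
        d += x[i] * y[i]
        nx += x[i] * x[i]
        ny += y[i] * y[i]
    if nx == 0.0 or ny == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return d / math.sqrt(nx * ny)
```

Without the tail loop, a 3-element vector would never be accumulated at all and the function would fail on a zero norm. An 11-element vector would silently ignore its last 3 elements.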

Work on DiskANN is continuing. The first milestone will be an in-memory version to support smaller datasets, with a compressed, disk-based version following soon after.

What's Changed

Fix L2 simd benchmark by @eddyxu in #793
bugfix for dataset overwrite method by @gsilvestrin in #794
[Rust] Minor SIMD benchmark fix set minimal CPU target for AVX2 by @eddyxu in #795
Persist simple diskann index by @eddyxu in #787
Fix offset overflow in plain decoder by @eddyxu in #805
Fix cosine similarity when missing simd alignment by @changhiskhan in #808

Full Changelog: v0.4.3...v0.4.4

Contributors

eddyxu, gsilvestrin, and changhiskhan


20 Apr 06:16

changhiskhan

v0.4.3

b5a7a68

v0.4.3 Bug fixes and code cleanup

What's Changed

[Rust] L2 distance on not aligned data by @eddyxu in #779
[Rust] Move L2 to linalg module by @eddyxu in #781
[Rust] Build DiskANN index by @eddyxu in #763
Refactor cosine distance into linalg module by @eddyxu in #786
google cloud storage fixes by @gsilvestrin in #782
Fix unaligned normalization bug on arm64 by @eddyxu in #789
Speed up vector index tests by reducing dataset size by @changhiskhan in #790

Full Changelog: v0.4.2...v0.4.3

Contributors

eddyxu, gsilvestrin, and changhiskhan


14 Apr 17:57

changhiskhan

v0.4.2

2adbb2f

v0.4.2 Polars, GCS, and distributed lances

A warm welcome to @hzhang86 as Lance's newest contributor. Thanks for adding TPCH benchmarks for Lance to establish a baseline. This is very helpful for focusing our performance optimization roadmap.

This release is packed with valuable features:

Polars can now scan Lance datasets directly, without pulling everything into memory.
We expose FileFragments so that distributed processing engines like Spark can easily access parts of a Lance dataset.
Last but not least, we've added support for reading Lance data directly from Google Cloud Storage (GCS) buckets.

What's Changed

[Rust] FileReader read range API by @eddyxu in #752
Support direct polars scan by @changhiskhan in #755
[Rust] Persist graph using lance file format. by @eddyxu in #756
Refactor PQ and OPQ training function to make it usable widely by @eddyxu in #758
Matrix::centroids method by @eddyxu in #759
[Python] Set minimal version of Polars for python tests by @eddyxu in #765
[Rust] Refactor RecordBatchStream trait by @eddyxu in #766
[Rust] Expose DataFragment as public dataset api. by @eddyxu in #769
Revert "[Python] Set minimal version of Polars for python tests (#765)" by @gsilvestrin in #770
add python script to compare lance performance vs parquet TPCH by @hzhang86 in #749
Expose index metadata by @changhiskhan in #768
Google Cloud Storage support. by @gsilvestrin in #773
[Python] Expose DataFragment via dataset by @eddyxu in #774
Get S3 credentials from_env by @changhiskhan in #775
Fix duckdb build by @eddyxu in #776
[Rust] A arrow kernel to compute hash value of the array. by @eddyxu in #777

New Contributors

@hzhang86 made their first contribution in #749

Full Changelog: v0.4.1...v0.4.2

Contributors

eddyxu, gsilvestrin, and 2 other contributors


05 Apr 21:30

changhiskhan

v0.4.1

ecc1d18

v0.4.1 Support Append in Vector Search

The vector search in Lance now supports live updates. Previously, when you added new vectors to the dataset, you had to rebuild the index. Now the index is "inherited", and vector search results combine an ANN search over the indexed data with a KNN search over the newly appended data. This adds a small amount of latency, and recall should be the same or better.

This provides a smooth performance curve until you have inserted enough new data that re-indexing is warranted.
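The combination described above can be sketched as follows, under stated assumptions. `ann_search` is a hypothetical callable standing in for the existing index. Rows appended after indexing are scanned exactly, and the two candidate lists are merged into a single top k. `l2`, `brute_force_knn`, and `search` are illustrative names, not Lance APIs.

```python
import heapq
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Hit = Tuple[float, int]  # (distance, row id); smaller distance is better


def l2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def brute_force_knn(query: Vector, rows: List[Tuple[int, Vector]], k: int) -> List[Hit]:
    """Exact KNN: scan every row and keep the k closest."""
    return heapq.nsmallest(k, ((l2(query, vec), rid) for rid, vec in rows))


def search(query: Vector, ann_search: Callable[[Vector, int], List[Hit]],
           unindexed: List[Tuple[int, Vector]], k: int) -> List[Hit]:
    """Combine approximate hits from the existing index with exact hits on appended rows."""
    ann_hits = ann_search(query, k)                  # rows covered by the inherited index
    knn_hits = brute_force_knn(query, unindexed, k)  # rows appended after indexing
    return heapq.nsmallest(k, ann_hits + knn_hits)   # merge and keep the global top k
```

Because the appended rows are searched exactly, recall on that portion cannot drop. The cost of the exhaustive scan grows with the number of unindexed rows, which is why re-indexing eventually pays off.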

What's Changed

Adding secret to publish task by @gsilvestrin in #742
[Rust] make distance function to take slice instead of Float32Array by @eddyxu in #748
Vector search should support appending new rows by @changhiskhan in #593
windows lapack support by @gsilvestrin in #743
Fix LanceDataset.to_batches by @changhiskhan in #751

Full Changelog: v0.4.0...v0.4.1

Contributors

eddyxu, gsilvestrin, and changhiskhan


30 Mar 22:22

changhiskhan

v0.4.0

2922f54

v0.4.0 Windows support

A warm welcome to @gsajko! Thanks for making our tutorial notebook easier to use and understand!

Note: OPQ is disabled for the vector index on Windows. This will be addressed once LAPACK support is added.

What's Changed

small fixes by @gsajko in #725
Windows support by @gsilvestrin in #724

New Contributors

@gsajko made their first contribution in #725

Full Changelog: v0.3.19...v0.4.0

Contributors

gsilvestrin and gsajko


27 Mar 17:58

changhiskhan

v0.3.19

8aa5345

v0.3.19 Bug fix for filter predicates on large-utf8 type

Also fixes publishing to crates.io.

What's Changed

Make contract clear for KNN nodes by @eddyxu in #729
Refactor Scan I/O plan by @eddyxu in #731
[Rust] Use forked sqlparser to unblock rust crate release by @eddyxu in #732
[Rust] Fix filter on large UTF8 columns by @eddyxu in #733

Full Changelog: v0.3.18...v0.3.19

Contributors

eddyxu
