@brycekbargar (Collaborator)
Contains a refactor of the download portion of the database code to allow using COPY FROM for inserting data into Postgres. There are a couple of other minor bugfixes and upgrades listed in the CHANGELOG.

…#40)

I want to use psycopg3 features (the sql module and the copy write_record) but don't want to break backwards compatibility just yet. This introduces psycopg3 while leaving psycopg2 in place, switching it to the binary version to make the library easier for users to install.

While messing with the connection methods I've isolated the connection object that is returned from ldlite from the one that ldlite uses internally. This makes sure the connections aren't broken or closed unexpectedly.
LDLite has been playing fast and loose with concatenating SQL strings. It's probably fine, but psycopg's sql module is intended to make this safe. While refactoring the existing code to use the sql module, I've abstracted it into a database class intended to be overridden. The first implementation still uses the DBType enum and a single implementation; this will change in the future into separate Postgres and DuckDB implementations without the DBType enum.
The drop-tables logic was always a little weird and required autocommit transaction hacks for Postgres. This refactors the existing drop-tables logic into the database abstraction, and every place that drops tables now shares the same logic where it was duplicated before. I don't know when the jtable was used, but it doesn't hurt to leave it in, I guess?

This also lets the abstraction manage its own connection, as intended. There is possibly a behavior change here for the case where a table in the catalog doesn't actually exist in the database: before, that wouldn't cause an error, but now it will. I'm OK with this edge case since the behavior wasn't really specified before.

Lastly, the on_progress function was doing multiple checks per call, which adds up over a million invocations. Maybe Python optimizes these code paths at runtime, but it doesn't hurt to do a little less work by default.
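The idea can be sketched like this (illustrative names, not ldlite's actual code): resolve the options once, outside the hot loop, so the per-record path carries no conditionals.

```python
# Sketch: pick the callback once instead of re-checking options per record.
def make_progress(on_progress=None, quiet=False):
    if quiet or on_progress is None:
        return lambda n: None  # no-op chosen a single time
    return on_progress

seen = []
report = make_progress(on_progress=seen.append, quiet=False)
for i in range(3):
    report(i)  # hot loop: a bare call, no 'if' checks per iteration
```

Whether CPython would have specialized the repeated checks anyway, hoisting them is free and makes the intent obvious.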
Using COPY FROM is an order of magnitude faster for bulk insertions into Postgres. This is the last low-hanging fruit for download performance until async/await is supported and optimized for in a future release. DuckDB can kind of sort of do bulk operations, but it isn't a priority right now, especially with SQLite still supported in this release.

Getting bytes to go directly into the database was a bit of a struggle because Postgres expects a "version" byte at the beginning of each jsonb value in the binary COPY format. By doing the byte munging ourselves we skip a whole lot of conversion in ldlite, and conversion/logic in psycopg, which would have added points of failure and slowness. I'm going to test that it works across 5C FOLIO data before releasing the change.
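For a rough picture of the framing involved, here is a sketch (assuming a single jsonb column; not ldlite's actual code) of how a binary COPY stream is built: a fixed header, then each tuple as an int16 field count plus int32 length per field, with every jsonb payload led by the version byte `\x01`, and an int16 -1 trailer.

```python
import struct

# Fixed 19-byte header: 11-byte signature, int32 flags, int32 extension length.
PGCOPY_HEADER = b"PGCOPY\n\xff\r\n\x00" + struct.pack("!ii", 0, 0)
PGCOPY_TRAILER = struct.pack("!h", -1)

def jsonb_copy_row(json_bytes: bytes) -> bytes:
    # One tuple with one column: int16 field count, int32 payload length,
    # then the jsonb payload, which must start with version byte \x01.
    payload = b"\x01" + json_bytes
    return struct.pack("!hi", 1, len(payload)) + payload

stream = PGCOPY_HEADER + jsonb_copy_row(b'{"id": 1}') + PGCOPY_TRAILER
```

Handing pre-framed bytes like this to COPY avoids per-value adaptation work in the driver, which is where the speedup on millions of records comes from.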
I wasn't super happy with the weird type shenanigans in the last MR (#43), but I thought that doing a loads/dumps round trip on all the source records to keep things consistent would be too slow. Then I discovered orjson.Fragment, which means we can just use dumps and treat both SRS and non-SRS records as bytes. This simplified the signatures and type handling quite a bit.

Streaming support for SRS was rushed because non-streaming became even more unstable under Ramsons. There was a chance (though probably small) that someone was loading source-storage endpoints other than the source-records one, and those would have broken. This adds more consistency around which endpoints we support with streaming and fixes any accidental breaks.
@brycekbargar brycekbargar marked this pull request as draft September 19, 2025 16:50
@brycekbargar brycekbargar marked this pull request as ready for review September 19, 2025 16:51
@brycekbargar brycekbargar merged commit 26fcfa1 into main Sep 22, 2025
6 of 9 checks passed