Skip to content

chore: Ensure that pyproject.toml is compatible with poetry >= 2.0 #28

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 12, 2025

Conversation

paleolimbot
Copy link
Contributor

@paleolimbot paleolimbot commented Jan 31, 2025

The current pyproject.toml fails to build with a naive install of pip install poetry. Poetry 2.0 introduced several changes, notably introducing wider compatibility with the project key (instead of tool.poetry). Our namespace packaging was also broken (includes the change from #26, since this also requires a change in the pyproject.toml).

With newer Python, we also loose the ability to install because dependencies are pinned to major versions and not all projects publish wheels for newer Python versions (notably: pyarrow). I updated the dependencies to be minimum versions and added a CI job to enforce compatibility with newer versions of packages. Because python and pandas were used for very specific result types, I made them optional to help users resolve their environments (but I can revert this change if for some reason that is causing an issue).

It may be that there are downstream projects that were depending on some of these version specifiers, but unless we have a good reason to believe that behaviour changed (or will change) in a subsequent version, this type of versioning is likely to cause issues when installed alongside almost any other component the modern data stack.

@paleolimbot paleolimbot marked this pull request as draft January 31, 2025 22:39
@paleolimbot paleolimbot marked this pull request as ready for review January 31, 2025 22:54
peterfoldes
peterfoldes previously approved these changes Jan 31, 2025
Copy link
Contributor

@peterfoldes peterfoldes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thank you!

@rbavery
Copy link
Member

rbavery commented Jan 31, 2025

not all projects publish wheels for newer Python versions (notably: pyarrow)

this is why uv's default behavior is to not lock with upper bounds when adding a new dependency and it differs from poetry's default behavior. switching to use uv here will make it easier to not accidentally lock to major version in the toml in the future.

@paleolimbot
Copy link
Contributor Author

I agree that poetry is maybe not the way to go here (just trying to minimize the diff for now!). We probably also want to use project rather than tool.poetry (but can't because poetry 2.0 is not available for Python 3.8 😮‍💨 )

@paleolimbot
Copy link
Contributor Author

@peterfoldes Apologies for the noise here...I'm new to poetry but I think this is now correct.

peterfoldes
peterfoldes previously approved these changes Feb 5, 2025
Copy link
Contributor

@peterfoldes peterfoldes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, assuming we do actually want to remove the lock file here. Maybe get a +1 from @mpetazzoni here as well.

Copy link
Member

@mpetazzoni mpetazzoni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, it's best practice to keep your PR with a clean commit history so it can be merged without seeing the iterative review/feedback/fix history, and while preserving a set of meaningful, independent commits in the branch being merged in.

See https://about.gitlab.com/blog/2020/11/23/keep-git-history-clean-with-interactive-rebase/ for how to do that.

@@ -178,6 +178,8 @@ def _handle_results(self, execution_id: str, results: Dict[str, Any]) -> Any:
if result_format == ResultsFormat.JSON:
return json.loads(result_bytes.decode("utf-8"))
elif result_format == ResultsFormat.ARROW:
import pyarrow
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand it's nice(r) for this import to be done only if we're actually dealing with Arrow results, but I think we should make (or at least try/verify) this import at the time of the request in __request_results() when the requested results format in Arrow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the lazy import! Battle for another day 🙂

@@ -146,6 +144,8 @@ def __listen(self) -> None:
query.state = ExecutionState.COMPLETED
query.handler(self._handle_results(execution_id, results))
elif query.state == ExecutionState.CANCELLED:
import pandas
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do this import lazily? I'd rather the module fail to load/import at the beginning, than introduce a subtle failure for the application developer on the first time they happen to cancel a query.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Snowflake's connector I believe uses the pandas extra to install pyarrow/pandas to enable use on lambda; however, that's out of scope and requires a separate discussion so I just removed it from this PR.

@mpetazzoni
Copy link
Member

@paleolimbot @peterfoldes It's okay to remove the poetry.lock I think, but then we should add it to .gitignore.

fix the specification for dependencies

whoops

undo lazy imports
@paleolimbot
Copy link
Contributor Author

Note, it's best practice to keep your PR with a clean commit history so it can be merged without seeing the iterative review/feedback/fix history, and while preserving a set of meaningful, independent commits in the branch being merged in.

I find it difficult to maintain velocity here (and personally find it difficult to keep up with reviews for contributors to my projects that do this, because the context for comments/"view changes since last review" feature is typically borked by history-changing things like squashes). I prefer to use PRs as the unit for the commit and squash them because all the other projects I work on prefer or require this (but also happy to attempt new things if that's a hard sticking point).

Copy link
Contributor

@peterfoldes peterfoldes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mpetazzoni
Copy link
Member

Note, it's best practice to keep your PR with a clean commit history so it can be merged without seeing the iterative review/feedback/fix history, and while preserving a set of meaningful, independent commits in the branch being merged in.

I find it difficult to maintain velocity here (and personally find it difficult to keep up with reviews for contributors to my projects that do this, because the context for comments/"view changes since last review" feature is typically borked by history-changing things like squashes). I prefer to use PRs as the unit for the commit and squash them because all the other projects I work on prefer or require this (but also happy to attempt new things if that's a hard sticking point).

@paleolimbot I agree, and I'd be okay with doing this history clean up as a final step before approval + merge.

@paleolimbot
Copy link
Contributor Author

Thanks you for the review!

@paleolimbot
Copy link
Contributor Author

(I don't have permissions to merge!)

@mpetazzoni mpetazzoni merged commit 61e54e6 into wherobots:main Feb 12, 2025
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants