Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define a minimal Python version of the package containing vars_rename #31

Conversation

jeancochrane
Copy link
Contributor

@jeancochrane jeancochrane commented Nov 20, 2024

This PR defines a minimal Python ccao package in this repo with one function ported over from R, vars_rename. The package lives in the python/ subdir and uses the following tools:

  • uv for packaging
  • ruff for linting/formatting
  • pytest for unit tests
  • sphinx for docs

We add a new pytest-coverage GitHub workflow to ensure that tests run on PRs and commits to the main branch. We also rename the pkgdown workflow to docs and update it so that it deploys Python docs alongside R docs in the relative path /ccao/python/.

Closes #29.

⚠️ Note that for some reason, CI thinks that the build-pkgdown-site check needs to complete before merging, so it's not reporting the CI checks as being successful. I don't understand why this is happening, given that the build-pkgdown-site step no longer exists. Maybe there's a repo-level setting that is requiring this stage? If not, I figure we can just force merge this when it's ready and hope it's a caching bug that resolves itself.

@jeancochrane jeancochrane linked an issue Nov 20, 2024 that may be closed by this pull request
5 tasks
Copy link

codecov bot commented Nov 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (969dae7) to head (bbbaf68).
Report is 4 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##            master       #31   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            6         6           
  Lines          382       382           
=========================================
  Hits           382       382           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


🚨 Try these New Features:

run: pkgdown::build_site_github_pages(new_process = FALSE, install = FALSE)
shell: Rscript {0}

- name: Build Python docs
run: sphinx-build python/docs/source docs/python
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Building these docs into the docs/python subdirectory makes them show up under the relative path /python in the deployed docs site.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (non-blocking): IMO it would be nice to have the R docs at /r for consistency, with a redirect/301 at the root to maintain old links.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't actually think redirects are possible on GitHub Pages, at least if this feature request is still up to date. We could implement redirects if we switched to a different static hosting provider like Netlify, but that's a bigger discussion. Let me know if you can think of any other way to accomplish something similar and I'm happy to take a crack at it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might be able to do client-side redirects with http-equiv.

@@ -0,0 +1,71 @@
"pin","year","class","char_yrblt","char_bldg_sf","char_land_sf","char_beds","char_rooms","char_fbath","char_hbath","char_frpl","char_type_resd","char_cnst_qlty","char_apts","char_tp_dsgn","char_attic_fnsh","char_gar1_att","char_gar1_area","char_gar1_size","char_gar1_cnst","char_attic_type","char_bsmt","char_ext_wall","char_heat","char_repair_cnd","char_bsmt_fin","char_roof_cnst","char_use","char_age","char_site","char_ncu","char_renovation","char_porch","char_air","char_tp_plan"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish we could just share these fixtures with the R package, but the R package needs the fixtures to be saved as RData objects, and I feel like it would be kind of weird to bundle that in the Python package. Down to give it a try if you have a strong preference, though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: IMO it's worth it to create a shared directory for these fixtures that the R scripts use to instantiate the .rda files and the Python package uses for fixtures. The dir would contain this data, saved as CSV and created via SQL (from the existing R data-raw scripts).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice idea! Done in e66eff3. Until we refactor our data management in #30, it seemed like the simplest path forward was just to keep data-raw/ as the shared source of CSV data for now, but let me know if you have another preference.

@jeancochrane jeancochrane marked this pull request as ready for review November 21, 2024 23:33
@jeancochrane jeancochrane requested a review from a team as a code owner November 21, 2024 23:33
@jeancochrane jeancochrane requested a review from dfsnow November 21, 2024 23:41
Copy link
Member

@dfsnow dfsnow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeancochrane This is looking great! Nice first pass, no major blockers. Looking forward to the remaining package.

You can install the released version of `ccao` directly from GitHub:

```bash
pip install "git+https://github.com/ccao-data/ccao.git#egg=ccao&subdirectory=python"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There doesn't seem to be a ccao package on PyPI. Why not push it there?

Comment on lines 8 to 9
{name = "Dan Snow", email="[email protected]"},
{name = "Jean Cochrane", email="[email protected]"},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (non-blocking): I didn't make any of this package! At least put yourself first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind of you! Done in cd5bc3e.

]

[project.optional-dependencies]
test = [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (non-blocking): I've taken to naming this dev instead of test since ruff and mypy aren't specific to testing. I don't feel super strongly about it though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, done in aaebf7c!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question (non-blocking): Why have a .python-version file if it's not used in the GH workflows? Just to set the default version for uv?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, removed in 1b6bbef.

run: pkgdown::build_site_github_pages(new_process = FALSE, install = FALSE)
shell: Rscript {0}

- name: Build Python docs
run: sphinx-build python/docs/source docs/python
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (non-blocking): IMO it would be nice to have the R docs at /r for consistency, with a redirect/301 at the root to maintain old links.

vars_dict = pd.read_csv(str(_data_path / "vars_dict.csv"))

# Prefix we use to identify variable name columns in the variable dictionary
var_name_prefix = "var_name"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: Shouldn't this be an all caps constant?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, done in c8f4e49.

data: list[str] | pd.DataFrame,
names_from: str,
names_to: str,
output_type: OutputType | str = OutputType.INPLACE,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: I'm kind of a dumdum when it comes to this kind of stuff, but what's the advantage of enumerating the possible type values like this? Why not just check them in the function body?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking was that a DRYer approach would reduce maintenance burden in case we have to change the options someday or define logic that might best be owned by the param abstraction. But I think it's a bit overengineered for the current use-case so I stripped it out in 81d8c20 in favor of strings that we check in the body of the function.

Comment on lines 115 to 117
# Validate names arguments
if not isinstance(names_from, str) or not isinstance(names_to, str):
raise ValueError("names_from and names_to must be strings")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: (non-blocking): Since the function is typed, isn't this check redundant? Same goes for the other isinstance checks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that it might be helpful to double-check types in case users don't have type-checking enabled in their project, but since this package is pretty specific to us and we always enable type checking, I doubt it's worth the extra code. I removed the duplicative type checking in b837865.

@@ -0,0 +1,71 @@
"pin","year","class","char_yrblt","char_bldg_sf","char_land_sf","char_beds","char_rooms","char_fbath","char_hbath","char_frpl","char_type_resd","char_cnst_qlty","char_apts","char_tp_dsgn","char_attic_fnsh","char_gar1_att","char_gar1_area","char_gar1_size","char_gar1_cnst","char_attic_type","char_bsmt","char_ext_wall","char_heat","char_repair_cnd","char_bsmt_fin","char_roof_cnst","char_use","char_age","char_site","char_ncu","char_renovation","char_porch","char_air","char_tp_plan"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: IMO it's worth it to create a shared directory for these fixtures that the R scripts use to instantiate the .rda files and the Python package uses for fixtures. The dir would contain this data, saved as CSV and created via SQL (from the existing R data-raw scripts).

python/uv.lock Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question (non-blocking): Why include this lock file if the the dependencies are already captured in pyproject.toml?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, removed in 5960235.

@jeancochrane jeancochrane force-pushed the jeancochrane/29-create-and-deploy-a-minimal-python-version-of-this-package branch from c1c40d9 to b837865 Compare November 25, 2024 19:37
@jeancochrane jeancochrane requested a review from dfsnow November 25, 2024 20:04
@jeancochrane
Copy link
Contributor Author

I made some major changes based on your comments @dfsnow, so it would be good to get a second look before I merge!

Copy link
Member

@dfsnow dfsnow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great @jeancochrane. Nice fixups!

if output_type == "inplace":
assert list(result.columns) == expected
else:
class TestVarsRename:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦 How did I not notice this wasn't in a test class. Nice catch.

@jeancochrane jeancochrane merged commit 5be78a6 into master Nov 26, 2024
14 checks passed
@jeancochrane jeancochrane deleted the jeancochrane/29-create-and-deploy-a-minimal-python-version-of-this-package branch November 26, 2024 17:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create and deploy a minimal Python version of this package
2 participants