-
-
Notifications
You must be signed in to change notification settings - Fork 378
chore!: replace direct download of nyc_taxi with hvsampledata #1462
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
hoxbro
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should remove pyct + setuptools from pyproject.toml + pixi.toml
| "import hvsampledata as hvs\n", | ||
| "\n", | ||
| "df = pd.read_csv('../data/nyc_taxi.csv', usecols=['dropoff_x', 'dropoff_y'])\n", | ||
| "df = hvs.nyc_taxi(\"pandas\", engine_kwargs={\"columns\": ['dropoff_x', 'dropoff_y']})\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| "df = hvs.nyc_taxi(\"pandas\", engine_kwargs={\"columns\": ['dropoff_x', 'dropoff_y']})\n", | |
| "df = hvs.nyc_taxi_remote(\"pandas\", engine_kwargs={\"columns\": ['dropoff_x', 'dropoff_y']})\n", |
Also, mention that this is the first time this cell is run.
pixi.toml
Outdated
| setuptools = "*" # distutils for pyct | ||
| toolz = "*" | ||
| xarray = "*" | ||
| fastparquet = "*" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check that this has not indirectly pinned anything like pandas.
CodSpeed Performance ReportMerging #1462 will improve performances by 49.12%Comparing Summary
Benchmarks breakdown
|
maximlt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be first deprecated before being removed https://holoviz.org/about/heps/hep2.html.
Are you okay with not having pyct as a required dependency? but still having the entrypoint and explaining that:
This is also a little bit weird, because we don't break running code, but "only" the CLI. |
|
The CLI is part of public interface so I think it deserves the same treatment as the API. I don't see an urgent reason to immediately remove pyct from the dependencies. I'm in favor of simply adding deprecation warnings, and removing the whole thing in a few releases. |
datashader/__init__.py
Outdated
| from . import transfer_functions as tf # noqa (API import) | ||
| from . import data_libraries # noqa (API import) | ||
|
|
||
| with suppress(ImportError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not the right place it should be in main.py.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK!
| 'numpy', | ||
| 'pandas', | ||
| 'param', | ||
| 'pyct', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think pyct should be removed yet from the dependencies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not removed totally. Just moved to optional dependencies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With these changes if I install datashader and run datashader example it breaks. So that's a breaking change. The goal is to deprecate it so people get a chance to stop using datashader example. Then pyct can be removed entirely from the dependencies.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1462 +/- ##
==========================================
- Coverage 88.34% 88.30% -0.04%
==========================================
Files 96 96
Lines 18932 18946 +14
==========================================
+ Hits 16725 16730 +5
- Misses 2207 2216 +9 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
I added a temporary patch to allow the CI to pass pending a new hvsampledata release but otherwise I think it's ready for review now. |
pyproject.toml
Outdated
| # 2025-09 | ||
| "ignore:Signature .* for <class 'numpy.longdouble'>.*:UserWarning", | ||
| # 2025-10 | ||
| "ignore:The 'pyct' package bundled as a datashader dependency is deprecated since version 0.19 and will be removed in version 0.20.*" # https://github.com/holoviz/datashader/pull/1462 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should remove this filtering. Running datashader in Python (and not CLI) should never show this warning.
| "import hvsampledata as hvs\n", | ||
| "import hvsampledata as hvs, pandas as pd\n", | ||
| "\n", | ||
| "df = hvs.nyc_taxi_remote(\"pandas\", engine_kwargs={\"columns\": ['dropoff_x', 'dropoff_y']})\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let us keep this.
pixi.toml
Outdated
| setuptools = "*" # distutils for pyct | ||
| toolz = "*" | ||
| xarray = "*" | ||
| pyarrow = "*" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be added back to where it previously was.
| with suppress(ImportError): | ||
| import pyct.cmd | ||
| from datashader import _warn_pyct_deprecated | ||
|
|
||
| _warn_pyct_deprecated(stacklevel=1) | ||
| pyct.cmd.fetch_data(name="data", path="examples", datasets="datasets.yml") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The scripts directory is for internal use only; please remove this.
datashader/__init__.py
Outdated
|
|
||
| def _warn_pyct_deprecated(stacklevel=2): | ||
| warnings.warn( | ||
| "The 'pyct' package is deprecated since version 0.19 " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a user saw this warning, do you think they would understand it?
I will also move warnings import into the function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a user saw this warning, do you think they would understand it?
Reads simple enough for me, yeah.
I will also move warnings import into the function.
OK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me rephrase it: what does the pyct package do, and why is it affecting the deprecated functions? Without looking at the code, it seems very confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, OK I get it now. A user wouldn't have used pyct directly and would have been confused seeing the mention of it in the message. Will rephrase to mention the actual functions being called.
datashader/__init__.py
Outdated
| """Wrapper to add deprecation warning to pyct functions.""" | ||
| @wraps(func) | ||
| def wrapper(*args, **kwargs): | ||
| _warn_pyct_deprecated(stacklevel=2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if these stack levels are set correctly, but they are likely still emitted because the category is a FutureWarning.
You can test this out by setting the CategoryWarning to DeprecationWarning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. Thanks. The correct stacklevel is 3 in this instance. I tested it via
import datashader as ds
ds.fetch_data("datashader")
This PR replaces the direct download of the large
nyc_taxidata with the version from hvsampledata.Testing with an editable install of hvsampledata