Skip to content

Conversation

@Azaya89
Copy link
Contributor

@Azaya89 Azaya89 commented Sep 26, 2025

This PR replaces the direct download of the large nyc_taxi data with the version from hvsampledata.

Testing with an editable install of hvsampledata

import hvsampledata as hvs
import datashader as ds
import pandas as pd

path = hvs.download("nyc_taxi_remote")
df = pd.read_parquet(path)
print(df.head())

agg = ds.Canvas().points(df, 'dropoff_x', 'dropoff_y')
print(agg.shape)
   tpep_pickup_datetime tpep_dropoff_datetime  passenger_count  ...  tip_amount  dropoff_hour  pickup_hour
0  2015-01-15 19:05:39   2015-01-15 19:23:42                1  ...        3.25            19           19
1  2015-01-10 20:33:38   2015-01-10 20:53:28                1  ...        2.00            20           20
2  2015-01-10 20:33:38   2015-01-10 20:43:41                1  ...        0.00            20           20
3  2015-01-10 20:33:39   2015-01-10 20:35:31                1  ...        0.00            20           20
4  2015-01-10 20:33:39   2015-01-10 20:52:58                1  ...        0.00            20           20

[5 rows x 12 columns]
(600, 600)

@Azaya89 Azaya89 requested a review from Copilot September 26, 2025 13:26

This comment was marked as outdated.

@Azaya89 Azaya89 self-assigned this Sep 26, 2025
Copy link
Member

@hoxbro hoxbro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should remove pyct + setuptools from pyproject.toml + pixi.toml

"import hvsampledata as hvs\n",
"\n",
"df = pd.read_csv('../data/nyc_taxi.csv', usecols=['dropoff_x', 'dropoff_y'])\n",
"df = hvs.nyc_taxi(\"pandas\", engine_kwargs={\"columns\": ['dropoff_x', 'dropoff_y']})\n",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"df = hvs.nyc_taxi(\"pandas\", engine_kwargs={\"columns\": ['dropoff_x', 'dropoff_y']})\n",
"df = hvs.nyc_taxi_remote(\"pandas\", engine_kwargs={\"columns\": ['dropoff_x', 'dropoff_y']})\n",

Also, mention that this is the first time this cell is run.

pixi.toml Outdated
setuptools = "*" # distutils for pyct
toolz = "*"
xarray = "*"
fastparquet = "*"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check that this has not indirectly pinned anything like pandas.

@codspeed-hq
Copy link

codspeed-hq bot commented Sep 26, 2025

CodSpeed Performance Report

Merging #1462 will improve performances by 49.12%

Comparing hvsampledata (78ac90f) with main (a5ae17c)

Summary

⚡ 4 improvements
✅ 39 untouched

Benchmarks breakdown

Mode Benchmark BASE HEAD Change
Simulation test_layout[forceatlas2_layout] 63.9 ms 56 ms +14.2%
Simulation test_quadmesh_raster[1024] 20.5 ms 18.1 ms +13.11%
Simulation test_quadmesh_raster[512] 16.9 ms 15.2 ms +11.4%
Simulation test_quadmesh_rectilinear[1024] 137 ms 91.9 ms +49.12%

@hoxbro hoxbro changed the title chore: replace direct download of nyc_taxi with hvsampledata chore!: replace direct download of nyc_taxi with hvsampledata Sep 27, 2025
Copy link
Member

@maximlt maximlt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be first deprecated before being removed https://holoviz.org/about/heps/hep2.html.

@hoxbro
Copy link
Member

hoxbro commented Oct 9, 2025

I think this should be first deprecated before being removed holoviz.org/about/heps/hep2.html.

Are you okay with not having pyct as a required dependency? but still having the entrypoint and explaining that:

  1. the entrypoint is deprecated, but users need to install pyct for this to work and,
  2. maybe guide people to use hvsampledata if they want to download data?

This is also a little bit weird, because we don't break running code, but "only" the CLI.

@maximlt
Copy link
Member

maximlt commented Oct 10, 2025

The CLI is part of public interface so I think it deserves the same treatment as the API. I don't see an urgent reason to immediately remove pyct from the dependencies. I'm in favor of simply adding deprecation warnings, and removing the whole thing in a few releases.

from . import transfer_functions as tf # noqa (API import)
from . import data_libraries # noqa (API import)

with suppress(ImportError):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the right place it should be in main.py.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK!

'numpy',
'pandas',
'param',
'pyct',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think pyct should be removed yet from the dependencies.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not removed totally. Just moved to optional dependencies.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With these changes if I install datashader and run datashader example it breaks. So that's a breaking change. The goal is to deprecate it so people get a chance to stop using datashader example. Then pyct can be removed entirely from the dependencies.

@codecov
Copy link

codecov bot commented Oct 15, 2025

Codecov Report

❌ Patch coverage is 50.00000% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.30%. Comparing base (a5ae17c) to head (78ac90f).

Files with missing lines Patch % Lines
datashader/__main__.py 0.00% 5 Missing ⚠️
datashader/__init__.py 69.23% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1462      +/-   ##
==========================================
- Coverage   88.34%   88.30%   -0.04%     
==========================================
  Files          96       96              
  Lines       18932    18946      +14     
==========================================
+ Hits        16725    16730       +5     
- Misses       2207     2216       +9     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Azaya89 Azaya89 marked this pull request as ready for review October 15, 2025 17:32
@Azaya89
Copy link
Contributor Author

Azaya89 commented Oct 15, 2025

I added a temporary patch to allow the CI to pass pending a new hvsampledata release but otherwise I think it's ready for review now.

@Azaya89 Azaya89 requested review from Copilot, hoxbro and maximlt October 15, 2025 17:37

This comment was marked as spam.

@holoviz holoviz deleted a comment from Copilot AI Oct 15, 2025
pyproject.toml Outdated
# 2025-09
"ignore:Signature .* for <class 'numpy.longdouble'>.*:UserWarning",
# 2025-10
"ignore:The 'pyct' package bundled as a datashader dependency is deprecated since version 0.19 and will be removed in version 0.20.*" # https://github.com/holoviz/datashader/pull/1462
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should remove this filtering. Running datashader in Python (and not CLI) should never show this warning.

@hoxbro hoxbro added this to the v0.19.0 milestone Nov 13, 2025
"import hvsampledata as hvs\n",
"import hvsampledata as hvs, pandas as pd\n",
"\n",
"df = hvs.nyc_taxi_remote(\"pandas\", engine_kwargs={\"columns\": ['dropoff_x', 'dropoff_y']})\n",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let us keep this.

@Azaya89 Azaya89 requested a review from hoxbro November 26, 2025 14:43
pixi.toml Outdated
setuptools = "*" # distutils for pyct
toolz = "*"
xarray = "*"
pyarrow = "*"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be added back to where it previously was.

Comment on lines 4 to 9
with suppress(ImportError):
import pyct.cmd
from datashader import _warn_pyct_deprecated

_warn_pyct_deprecated(stacklevel=1)
pyct.cmd.fetch_data(name="data", path="examples", datasets="datasets.yml")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scripts directory is for internal use only; please remove this.


def _warn_pyct_deprecated(stacklevel=2):
warnings.warn(
"The 'pyct' package is deprecated since version 0.19 "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a user saw this warning, do you think they would understand it?

I will also move warnings import into the function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a user saw this warning, do you think they would understand it?

Reads simple enough for me, yeah.

I will also move warnings import into the function.

OK.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me rephrase it: what does the pyct package do, and why is it affecting the deprecated functions? Without looking at the code, it seems very confusing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, OK I get it now. A user wouldn't have used pyct directly and would have been confused seeing the mention of it in the message. Will rephrase to mention the actual functions being called.

"""Wrapper to add deprecation warning to pyct functions."""
@wraps(func)
def wrapper(*args, **kwargs):
_warn_pyct_deprecated(stacklevel=2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if these stack levels are set correctly, but they are likely still emitted because the category is a FutureWarning.

You can test this out by setting the CategoryWarning to DeprecationWarning.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. Thanks. The correct stacklevel is 3 in this instance. I tested it via

import datashader as ds

ds.fetch_data("datashader")

@Azaya89 Azaya89 requested a review from hoxbro November 27, 2025 16:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants