Merged
46 commits
db9e8a8 deprecate add_nested (dougbrn, Sep 16, 2025)
7da947f nestedseries and accessor (dougbrn, Sep 16, 2025)
4e4da86 nesteddtype api changes (dougbrn, Sep 16, 2025)
aed3fb1 remove deprecated func use in tests and internals (dougbrn, Sep 16, 2025)
f70aa67 replace deprecated functions in docs (dougbrn, Sep 16, 2025)
e7178a0 use Deprecated package, replace add methods with set (dougbrn, Sep 17, 2025)
0c4c465 Merge pull request #360 from lincc-frameworks/v0_6_renaming (dougbrn, Sep 17, 2025)
bff6e00 add->set in accessor toc (dougbrn, Sep 17, 2025)
6fd9a7b Merge pull request #364 from lincc-frameworks/fix_toc_names (dougbrn, Sep 17, 2025)
3d71d29 add mypy depdency (dougbrn, Sep 17, 2025)
5ceb47c add min version for Deprecated (dougbrn, Sep 17, 2025)
982215c min wrapt version (dougbrn, Sep 17, 2025)
8d87bbe undo wrapt (dougbrn, Sep 17, 2025)
3902bab initial map_rows implementation (dougbrn, Sep 19, 2025)
3140995 pre-commit fixes (dougbrn, Sep 19, 2025)
a6a72a6 reworked main reduce test; args->columns (dougbrn, Sep 19, 2025)
41e44ca convert more tests (dougbrn, Sep 19, 2025)
e1b67ee scrub test suite of reduce (dougbrn, Sep 19, 2025)
b94480a reduce->map_rows in docs (dougbrn, Sep 19, 2025)
1c3f838 fix straggler reduce call (dougbrn, Sep 19, 2025)
15d5a3a add both container examples (dougbrn, Sep 19, 2025)
5727657 extra comment for dict (dougbrn, Sep 19, 2025)
3b4170b add args example to docstring (dougbrn, Sep 22, 2025)
fd25634 Apply suggestions from code review (dougbrn, Sep 23, 2025)
69a4f65 add more examples; new private helper funcs; more doc text (dougbrn, Sep 23, 2025)
2fb336d fix docstring formatting (dougbrn, Sep 23, 2025)
878f4c8 Merge pull request #366 from lincc-frameworks/map_rows (dougbrn, Sep 23, 2025)
3cd6faa fix f-string for 3.10 (dougbrn, Sep 23, 2025)
d73d568 properly deprecate fields (dougbrn, Sep 23, 2025)
a6f4b5d test coverage (dougbrn, Sep 23, 2025)
c9f0476 some ci tweaks (dougbrn, Sep 23, 2025)
d305c8e remove nest_lists deprecation behavior (dougbrn, Sep 23, 2025)
3829027 remove unneccesary import (dougbrn, Sep 23, 2025)
edf94c2 test mypy fix (dougbrn, Sep 23, 2025)
7cb9ce9 wrapt version (dougbrn, Sep 23, 2025)
e903683 improve programmatic example (dougbrn, Sep 25, 2025)
6eb13cd make get_subcolumns public (dougbrn, Sep 29, 2025)
d993845 make base_columns property public (dougbrn, Sep 29, 2025)
dd891ae add visibility to new properties (dougbrn, Sep 29, 2025)
1e4b8a3 add more init paths for NestedDtype (dougbrn, Sep 30, 2025)
accc804 Merge pull request #374 from lincc-frameworks/dtype_init (dougbrn, Sep 30, 2025)
f74416e update readme example (dougbrn, Oct 1, 2025)
bda5d05 better readme example (dougbrn, Oct 1, 2025)
2eb9f00 improve get_flat_series replacement (dougbrn, Oct 2, 2025)
d2e9880 cache a single to_lists call (dougbrn, Oct 2, 2025)
715e793 Merge branch 'main' into v0_6 (dougbrn, Oct 2, 2025)
7 changes: 6 additions & 1 deletion README.md
@@ -46,7 +46,12 @@ Allowing powerful and straightforward operations, like:
 ```python
 # Compute the mean flux for each row of "object_nf"
 import numpy as np
-object_nf.reduce(np.mean, "nested_sources.flux")
+
+def mean_flux(row):
+    """Calculates the mean flux for each object"""
+    return np.mean(row["nested_sources.flux"])
+
+object_nf.map_rows(mean_flux, output_names="mean_flux")
 ```
 
 <p align="center">
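The README's new example hands each row of `object_nf` to `mean_flux`, which reads the nested flux values through the hierarchical key `"nested_sources.flux"`. The sketch below reproduces that per-row computation with plain pandas and NumPy so it runs without nested-pandas installed; the flat table, its values, and the per-row dict are hypothetical stand-ins for what `map_rows` is described as passing, not its actual implementation.

```python
import numpy as np
import pandas as pd

# Hypothetical flat source table: one row per source, index = parent object id
flat_sources = pd.DataFrame(
    {"flux": [1.0, 3.0, 2.0, 4.0, 6.0]},
    index=[0, 0, 1, 1, 1],
)


def mean_flux(row):
    """Calculates the mean flux for each object"""
    return np.mean(row["nested_sources.flux"])


# Emulate one dict per object, keyed by the hierarchical column name
per_object = {
    obj_id: mean_flux({"nested_sources.flux": group["flux"].to_numpy()})
    for obj_id, group in flat_sources.groupby(level=0)
}

# The same per-object reduction written as a pandas groupby on the flat table
via_groupby = flat_sources.groupby(level=0)["flux"].mean()
```

Both forms yield one mean per object (2.0 and 4.0 for this made-up data), which is the shape of result the README example is after.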
38 changes: 30 additions & 8 deletions docs/gettingstarted/quickstart.ipynb
@@ -282,9 +282,9 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Reduce Function\n",
+"## The `map_rows` Function\n",
 "\n",
-"Finally, we'll end with the flexible `reduce` function. `reduce` functions similarly to pandas' `apply` but flattens (reduces) the inputs from nested layers into array inputs to the given apply function. For example, let's find the mean flux for each dataframe in \"nested\":"
+"Finally, we'll end with the flexible `map_rows` function. `map_rows` functions similarly to pandas' `apply` but applies row by row and flattens the inputs from nested layers into array inputs to the given apply function. For example, let's find the mean flux for each dataframe in \"nested\":"
 ]
 },
 {
@@ -297,7 +297,8 @@
 "\n",
 "# use hierarchical column names to access the flux column\n",
 "# passed as an array to np.mean\n",
-"nf.reduce(np.mean, \"lightcurve.brightness\")"
+"# row_container signals how to pass the data to the function, in this case as direct arguments\n",
+"nf.map_rows(np.mean, \"lightcurve.brightness\", row_container=\"args\")"
 ]
 },
 {
@@ -313,15 +314,15 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"def show_inputs(*args):\n",
-"    return args"
+"def show_inputs(row):\n",
+"    return row"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Applying some inputs via reduce, we see how it sends inputs to a given function. The output frame `nf_inputs` consists of two columns containing the output of the “ra” column and the “lightcurve.time” column."
+"Applying some inputs via `map_rows`, we see how it sends inputs to a given function. The output frame `nf_inputs` consists of two columns containing the output of the “ra” column and the “lightcurve.time” column."
 ]
 },
 {
@@ -330,8 +331,12 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"nf_inputs = nf.reduce(show_inputs, \"ra\", \"lightcurve.time\")\n",
-"nf_inputs"
+"# row_container=\"dict\" passes the data as a dictionary to the function\n",
+"nf_inputs = nf.map_rows(show_inputs, columns=[\"ra\", \"lightcurve.time\"], row_container=\"dict\")\n",
+"nf_inputs\n",
+"\n",
+"# map_rows returns a dataframe view of the dicts, but the two columns can be accessed with show_inputs as\n",
+"# row[\"ra\"] and row[\"lightcurve.time\"]"
 ]
 },
 {
@@ -343,6 +348,23 @@
 "nf_inputs.loc[0]"
 ]
 },
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"# row_container=\"args\" passes the data as arguments to the function\n",
+"\n",
+"\n",
+"def show_inputs(*args):\n",
+"    return args\n",
+"\n",
+"\n",
+"nf_inputs = nf.map_rows(show_inputs, columns=[\"ra\", \"lightcurve.time\"], row_container=\"args\")\n",
+"nf_inputs"
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},
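The quickstart changes contrast two `row_container` modes: `"dict"` hands the function one dictionary keyed by column name, while `"args"` hands it one positional argument per requested column. The helper below is a hypothetical, pure-Python emulation of that calling convention for a single row, written only to make the difference concrete; `call_with_row` is not nested-pandas code and the column values are made up.

```python
import numpy as np


def call_with_row(func, columns, values, row_container="dict"):
    """Hypothetical emulation of how one row's values could be passed to func."""
    if row_container == "dict":
        # A single argument: a dict keyed by (possibly hierarchical) column name
        return func(dict(zip(columns, values)))
    if row_container == "args":
        # One positional argument per requested column, in the requested order
        return func(*values)
    raise ValueError(f"unknown row_container: {row_container!r}")


columns = ["ra", "lightcurve.time"]
values = [17.5, np.array([1.0, 2.0, 3.0])]

# Mirrors show_inputs(row): the function receives the whole dict
as_dict = call_with_row(lambda row: row, columns, values, row_container="dict")

# Mirrors show_inputs(*args): the function receives each column separately
as_args = call_with_row(lambda *args: args, columns, values, row_container="args")

# With a single column and "args", np.mean receives the array directly
brightness_mean = call_with_row(
    np.mean, ["lightcurve.brightness"], [np.array([2.0, 4.0, 6.0])], row_container="args"
)
```

The `"args"` form suits functions such as `np.mean` that already take arrays positionally, while the `"dict"` form keeps column names available inside the function.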
4 changes: 2 additions & 2 deletions docs/pre_executed/nested_spectra.ipynb
@@ -280,7 +280,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 8,
+"execution_count": null,
 "metadata": {},
 "outputs": [
 {
@@ -452,7 +452,7 @@
 }
 ],
 "source": [
-"spec_ndf = xid_ndf.add_nested(flat_spec, \"coadd_spectrum\").set_index(\"objid\")\n",
+"spec_ndf = xid_ndf.join_nested(flat_spec, \"coadd_spectrum\").set_index(\"objid\")\n",
 "spec_ndf"
 ]
 },
2 changes: 1 addition & 1 deletion docs/pre_executed/performance.ipynb
@@ -98,7 +98,7 @@
 "# Read in parquet data\n",
 "# nesting sources into objects\n",
 "nf = npd.read_parquet(\"objects.parquet\")\n",
-"nf = nf.add_nested(npd.read_parquet(\"ztf_sources.parquet\"), \"ztf_sources\")\n",
+"nf = nf.join_nested(npd.read_parquet(\"ztf_sources.parquet\"), \"ztf_sources\")\n",
 "\n",
 "# Filter on object\n",
 "nf = nf.query(\"ra > 10.0\")\n",
15 changes: 6 additions & 9 deletions docs/reference/accessor.rst
@@ -18,12 +18,9 @@ Functions
 NestSeriesAccessor.to_lists
 NestSeriesAccessor.to_flat
 NestSeriesAccessor.to_flatten_inner
-NestSeriesAccessor.with_field
-NestSeriesAccessor.with_flat_field
-NestSeriesAccessor.with_list_field
-NestSeriesAccessor.with_filled_field
-NestSeriesAccessor.without_field
-NestSeriesAccessor.query_flat
-NestSeriesAccessor.get_flat_index
-NestSeriesAccessor.get_flat_series
-NestSeriesAccessor.get_list_series
+NestSeriesAccessor.set_column
+NestSeriesAccessor.set_flat_column
+NestSeriesAccessor.set_list_column
+NestSeriesAccessor.set_filled_column
+NestSeriesAccessor.drop
+NestSeriesAccessor.query
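Most renames in this PR are one-to-one method-name changes, so existing call sites can be updated mechanically. The hypothetical `migrate_call_names` helper below rewrites `.old_name(` to `.new_name(` in a source string. The pairs come from the notebook and reference changes in this diff; `with_flat_field` to `set_flat_column` is inferred from the naming pattern (an assumption). `query_flat` and the `get_*` accessors are left out because their replacements are not evident from this diff alone, and `reduce` to `map_rows` is left out because its call signature changed too (`columns`, `row_container`).

```python
import re

# Old -> new names, taken from the renames visible in this diff.
# with_flat_field -> set_flat_column is inferred from the naming pattern (assumption).
RENAMES = {
    "add_nested": "join_nested",
    "from_fields": "from_columns",
    "with_field": "set_column",
    "with_flat_field": "set_flat_column",
    "with_list_field": "set_list_column",
    "with_filled_field": "set_filled_column",
    "without_field": "drop",
}

# Match ".old_name(" so that only attribute calls are rewritten
_CALL_PATTERN = re.compile(r"\.(" + "|".join(map(re.escape, RENAMES)) + r")\(")


def migrate_call_names(source: str) -> str:
    """Hypothetical helper: rewrite `.old_name(` call sites to `.new_name(`."""
    return _CALL_PATTERN.sub(lambda m: "." + RENAMES[m.group(1)] + "(", source)


old_code = 's = s.nest.with_field("lsst_band", b).nest.without_field("band")'
new_code = migrate_call_names(old_code)
```

A text rewrite like this does not parse Python, so the result still needs review.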
2 changes: 1 addition & 1 deletion docs/reference/nesteddtype.rst
@@ -17,6 +17,6 @@ Functions
 
 NestedDtype.construct_array_type
 NestedDtype.construct_from_string
-NestedDtype.from_fields
+NestedDtype.from_columns
 NestedDtype.from_pandas_arrow_dtype
 NestedDtype.to_pandas_arrow_dtype
17 changes: 14 additions & 3 deletions docs/reference/nestedframe.rst
@@ -10,12 +10,21 @@ Constructor
 
 NestedFrame
 
+Helpful Properties
+~~~~~~~~~~~~~~~~~~
+.. autosummary::
+:toctree: api/
+
+NestedFrame.nested_columns
+NestedFrame.base_columns
+NestedFrame.all_columns
+
 Nesting
 ~~~~~~~~~
 .. autosummary::
 :toctree: api/
 
-NestedFrame.add_nested
+NestedFrame.join_nested
 NestedFrame.nest_lists
 NestedFrame.from_flat
 NestedFrame.from_lists
@@ -25,19 +34,21 @@ Extended Pandas.DataFrame Interface
 
 .. note::
 The NestedFrame extends the Pandas.DataFrame interface, so all methods
-of Pandas.DataFrame are available. The following methods are extended
+of Pandas.DataFrame are available. The following methods are a mix of
+newly added methods and extended methods from Pandas DataFrame
 to support NestedFrame functionality. Please reference the Pandas
 documentation for more information.
 https://pandas.pydata.org/docs/reference/frame.html
 
 .. autosummary::
 :toctree: api/
 
+NestedFrame.get_subcolumns
 NestedFrame.eval
 NestedFrame.query
 NestedFrame.dropna
 NestedFrame.sort_values
-NestedFrame.reduce
+NestedFrame.map_rows
 NestedFrame.drop
 NestedFrame.min
 NestedFrame.max
2 changes: 1 addition & 1 deletion docs/reference/nestedseries.rst
@@ -16,4 +16,4 @@ Functions
 :toctree: api/
 
 NestedSeries.to_lists
-NestedSeries.to_flat
+NestedSeries.explode
8 changes: 4 additions & 4 deletions docs/tutorials/data_loading_notebook.ipynb
@@ -141,7 +141,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"We can then create an additional pandas dataframes for the nested columns and pack them into our `NestedFrame` with `NestedFrame.add_nested()` function. `add_nested` will align the nest based on the index by default (a column may be selected instead via the `on` kwarg), as we see the `nested` `DataFrame` has a repeated index corresponding to the `nf` `NestedFrame`."
+"We can then create an additional pandas dataframes for the nested columns and pack them into our `NestedFrame` with `NestedFrame.join_nested()` function. `join_nested` will align the nest based on the index by default (a column may be selected instead via the `on` kwarg), as we see the `nested` `DataFrame` has a repeated index corresponding to the `nf` `NestedFrame`."
 ]
 },
 {
@@ -158,7 +158,7 @@
 "    index=[0, 0, 0, 1, 1, 1, 2, 2, 2, 2],\n",
 ")\n",
 "\n",
-"nf = nf.add_nested(nested, \"nested\")\n",
+"nf = nf.join_nested(nested, \"nested\")\n",
 "nf"
 ]
 },
@@ -182,7 +182,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"We could add other nested columns by creating new sub-tables and adding them with `add_nested()`. Note that while the tables added with each `add_nested()` must be rectangular, they do not need to have the same dimensions between calls. We could add another nested row with a different number of observations."
+"We could add other nested columns by creating new sub-tables and adding them with `join_nested()`. Note that while the tables added with each `join_nested()` must be rectangular, they do not need to have the same dimensions between calls. We could add another nested row with a different number of observations."
 ]
 },
 {
@@ -199,7 +199,7 @@
 "    index=[0, 0, 1, 1, 1, 2],\n",
 ")\n",
 "\n",
-"nf = nf.add_nested(nested, \"nested2\")\n",
+"nf = nf.join_nested(nested, \"nested2\")\n",
 "nf"
 ]
 },
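The tutorial text says `join_nested` aligns the flat table to the frame by index, which is why the `nested` table repeats index values, and that successive tables may have different numbers of rows. As a rough picture of that alignment, the sketch below packs a flat table with a repeated index into one list per parent row using only pandas. `pack_by_index` and the sample data are hypothetical, and nested-pandas stores the result in its own `NestedDtype` columns rather than in Python lists.

```python
import pandas as pd

# Hypothetical base frame: one row per object
base = pd.DataFrame({"a": [1, 2, 3]}, index=[0, 1, 2])

# Hypothetical flat table: repeated index values point at rows of `base`
nested = pd.DataFrame(
    {"t": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6], "flux": [10, 11, 12, 13, 14, 15]},
    index=[0, 0, 1, 1, 1, 2],
)


def pack_by_index(base, flat):
    """Hypothetical illustration: gather each column of `flat` into per-row lists."""
    packed = flat.groupby(level=0).agg(list)
    # Align to the base frame; base rows without matches become NaN
    return packed.reindex(base.index)


packed = pack_by_index(base, nested)

# A second flat table may have a different number of rows per object
packed2 = pack_by_index(base, pd.DataFrame({"mag": [20.1, 19.8, 21.0]}, index=[0, 2, 2]))
```

In this sketch an object with no matching rows gets a missing value; the nested-pandas documentation is the place to check how `join_nested` itself represents that case.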
15 changes: 4 additions & 11 deletions docs/tutorials/data_manipulation.ipynb
@@ -105,13 +105,6 @@
 "## Adding or Replacing Nested Columns"
 ]
 },
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"> *A Note on Performance: These operations involve full reconstruction of the nested columns so expect impacted performance when doing this at scale. It may be appropriate to do these operations within reduce functions directly (e.g. subtracting a value from a column) if performance is key.*"
-]
-},
 {
 "cell_type": "markdown",
 "metadata": {},
@@ -210,7 +203,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"This is functionally equivalent to using `add_nested`:"
+"This is functionally equivalent to using `join_nested`:"
 ]
 },
 {
@@ -224,7 +217,7 @@
 },
 "outputs": [],
 "source": [
-"ndf.add_nested(ndf[\"nested.band\"].to_frame(), \"bands_from_add_nested\")"
+"ndf.join_nested(ndf[\"nested.band\"].to_frame(), \"bands_from_add_nested\")"
 ]
 },
 {
@@ -254,7 +247,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"The above again being shorthand for the following `add_nested` call:"
+"The above again being shorthand for the following `join_nested` call:"
 ]
 },
 {
@@ -263,7 +256,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"ndf.add_nested(flat_df, \"example_from_add_nested\")"
+"ndf.join_nested(flat_df, \"example_from_add_nested\")"
 ]
 },
 {
22 changes: 11 additions & 11 deletions docs/tutorials/low_level.ipynb
@@ -111,7 +111,7 @@
 "id": "33d8caacf0bf042e",
 "metadata": {},
 "source": [
-"You can also get a list of fields with `.fields` attribute:"
+"You can also get a list of columns with the `.columns` attribute:"
 ]
 },
 {
@@ -126,7 +126,7 @@
 },
 "outputs": [],
 "source": [
-"nested_series.nest.fields"
+"nested_series.nest.columns"
 ]
 },
 {
@@ -205,23 +205,23 @@
 "new_series.nest[\"flux\"] = new_series.nest[\"flux\"] - new_series.nest[\"flux\"].mean()\n",
 "\n",
 "# Create a new series with a new column\n",
-"new_series = new_series.nest.with_field(\"lsst_band\", \"lsst_\" + new_series.nest[\"band\"])\n",
+"new_series = new_series.nest.set_column(\"lsst_band\", \"lsst_\" + new_series.nest[\"band\"])\n",
 "\n",
 "# Create a new series with a column removed, you can also pass a list of columns to remove\n",
-"new_series = new_series.nest.without_field(\"band\")\n",
+"new_series = new_series.nest.drop(\"band\")\n",
 "\n",
 "# Add a new column with a python list instead of a Series\n",
-"new_series = new_series.nest.with_field(\n",
+"new_series = new_series.nest.set_column(\n",
 "    \"new_column\",\n",
 "    [1, 2] * (new_series.nest.flat_length // 2),\n",
 ")\n",
 "\n",
 "# Add a new column repeating values for each nested element\n",
 "# It can be useful when you want to move some metadata to the nested data\n",
-"new_series = new_series.nest.with_filled_field(\"index_mult_100\", new_series.index * 100)\n",
+"new_series = new_series.nest.set_filled_column(\"index_mult_100\", new_series.index * 100)\n",
 "\n",
 "# Create a new series, with a column dtype changed\n",
-"new_series = new_series.nest.with_field(\"t\", new_series.nest[\"t\"].astype(np.int8))\n",
+"new_series = new_series.nest.set_column(\"t\", new_series.nest[\"t\"].astype(np.int8))\n",
 "\n",
 "new_series.nest.to_flat()"
 ]
@@ -293,7 +293,7 @@
 "source": [
 "# Adjust each time to be relative to the first observation\n",
 "dt = new_series.nest.to_lists()[\"t\"].apply(lambda t: t - t.min())\n",
-"new_series = new_series.nest.with_list_field(\"dt\", dt)\n",
+"new_series = new_series.nest.set_list_column(\"dt\", dt)\n",
 "new_series.nest.to_flat()"
 ]
 },
@@ -367,7 +367,7 @@
 "We have already seen how `.nest` accessor could be used to get different views on the nested data: \"flat\" dataframe, and list-array dataframe with columns of `pd.ArrowDtype`.\n",
 "\n",
 "This section is about converting nested Series to and from other data types.\n",
-"If you just need to add a nested column to a `NestedFrame`, you can do it with `.add_nested()` method."
+"If you just need to add a nested column to a `NestedFrame`, you can do it with `.join_nested()` method."
 ]
 },
 {
@@ -542,7 +542,7 @@
 "        {\"t\": [4, 5], \"flux\": [0.4, 0.5]},\n",
 "        None,\n",
 "    ],\n",
-"    dtype=NestedDtype.from_fields({\"t\": pa.float64(), \"flux\": pa.float32()}),\n",
+"    dtype=NestedDtype.from_columns({\"t\": pa.float64(), \"flux\": pa.float32()}),\n",
 ")\n",
 "series_from_pack"
 ]
@@ -588,7 +588,7 @@
 "        pd.DataFrame({\"t\": [1, 2, 3], \"band\": [\"g\", \"r\", \"r\"]}),\n",
 "        {\"t\": np.array([4, 5]), \"band\": [None, \"r\"]},\n",
 "    ],\n",
-"    dtype=NestedDtype.from_fields({\"t\": pa.float64(), \"band\": pa.string()}),\n",
+"    dtype=NestedDtype.from_columns({\"t\": pa.float64(), \"band\": pa.string()}),\n",
 ")\n",
 "series_from_dtype"
 ]
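Two of the updated low-level examples have simple flat-table readings: `set_filled_column` repeats one value per top-level row across all of that row's nested elements, and the `dt` example subtracts each row's minimum time. The sketch below computes both on a hypothetical flat view (index repeated once per nested element) with plain pandas; it illustrates the intended values only and is not how the `.nest` accessor implements them.

```python
import pandas as pd

# Hypothetical flat view: one row per nested element, index = top-level row
flat = pd.DataFrame(
    {"t": [5.0, 7.0, 6.0, 9.0, 10.0], "band": ["g", "r", "g", "r", "r"]},
    index=[0, 0, 1, 1, 1],
)

# "Filled" column: one value per top-level row, repeated for each nested element
per_row_value = pd.Series([0, 100], index=[0, 1])  # e.g. index * 100
flat["index_mult_100"] = per_row_value.reindex(flat.index).to_numpy()

# Time relative to the earliest observation within each top-level row
row_min_t = flat.groupby(level=0)["t"].transform("min")
flat["dt"] = (flat["t"] - row_min_t).to_numpy()
```

`reindex` against the repeated flat index is what broadcasts each per-row value, and `transform("min")` keeps the result the same length as the flat view.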
5 changes: 4 additions & 1 deletion pyproject.toml
@@ -22,7 +22,9 @@ dependencies = [
 # We use internal pd._libs.missing and experimental ArrowExtensionArray
 "pandas>=2.2.3,<2.4",
 "pyarrow>=16", # remove struct_field_names() and struct_fields() when upgraded to 18+
-
+"Deprecated>=1.2.0",
+"wrapt>=1.12.1",
+
 # NOTE: package PINNED at <0.3.0, see https://github.com/astronomy-commons/lsdb/issues/1047
 "universal_pathlib>=0.2,<0.3.0",
 ]
@@ -43,6 +45,7 @@ dev = [
 "aiohttp",
 "requests",
 "s3fs",
+"types-Deprecated", # Needed for mypy type checking of Deprecated package
 ]
 
 [build-system]
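The PR adds the Deprecated package (plus `types-Deprecated` for mypy) so that old names such as `add_nested` can presumably keep working while pointing users to their replacements. The sketch below shows the same idea with only the standard library's `warnings` module, so it is a substitute technique rather than the PR's code; `deprecated_alias`, `Frame`, and the version string are hypothetical.

```python
import functools
import warnings


def deprecated_alias(new_name, version):
    """Hypothetical decorator: forward a renamed method and warn the caller."""

    def decorator(old_method):
        @functools.wraps(old_method)
        def wrapper(self, *args, **kwargs):
            warnings.warn(
                f"{old_method.__name__} is deprecated since {version}; "
                f"use {new_name} instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            # Forward to the new method so old call sites keep working
            return getattr(self, new_name)(*args, **kwargs)

        return wrapper

    return decorator


class Frame:
    """Hypothetical stand-in for a class whose method was renamed."""

    def join_nested(self, other, name):
        return f"joined as {name}"

    @deprecated_alias("join_nested", version="0.6")
    def add_nested(self, other, name):
        """Old name; calls are forwarded to join_nested."""


# Calling the old name returns the new method's result and records a warning
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = Frame().add_nested(None, "nested")
```

Using `stacklevel=2` attributes the warning to the caller's line rather than to the wrapper, which makes stale call sites easier to find.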