Commit cf288d7

Merge pull request #363 from lincc-frameworks/v0_6
Nested-Pandas V0.6
2 parents bd8393e + 715e793 commit cf288d7

28 files changed, +1389 -497 lines changed

README.md

Lines changed: 6 additions & 1 deletion
@@ -46,7 +46,12 @@ Allowing powerful and straightforward operations, like:
 ```python
 # Compute the mean flux for each row of "object_nf"
 import numpy as np
-object_nf.reduce(np.mean, "nested_sources.flux")
+
+def mean_flux(row):
+    """Calculates the mean flux for each object"""
+    return np.mean(row["nested_sources.flux"])
+
+object_nf.map_rows(mean_flux, output_names="mean_flux")
 ```
 
 <p align="center">
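
Review note: the README change replaces a direct `reduce(np.mean, ...)` call with a user function that receives one row at a time and looks up the nested column by its hierarchical name. The sketch below mimics that calling pattern in plain Python so the shape of the data is visible. `map_rows_sketch` and the sample `objects` list are hypothetical stand-ins, not nested-pandas code, and the dict-style row is an assumption based on how `mean_flux` indexes `row` in the diff.

```python
from statistics import fmean

# Hypothetical stand-in data: one dict per object, with the nested
# "flux" values for that object stored as a plain list.
objects = [
    {"ra": 10.0, "nested_sources.flux": [1.0, 2.0, 3.0]},
    {"ra": 20.0, "nested_sources.flux": [4.0, 6.0]},
]


def mean_flux(row):
    """Calculates the mean flux for one object."""
    return fmean(row["nested_sources.flux"])


def map_rows_sketch(rows, func, output_name):
    """Illustrative only: call func once per row and collect one output column."""
    return [{output_name: func(row)} for row in rows]


result = map_rows_sketch(objects, mean_flux, "mean_flux")
print(result)  # [{'mean_flux': 2.0}, {'mean_flux': 5.0}]
```

The point of the pattern is that the user function sees a single object's nested values as an array-like, so any per-object statistic can be swapped in for the mean.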

docs/gettingstarted/quickstart.ipynb

Lines changed: 30 additions & 8 deletions
@@ -282,9 +282,9 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Reduce Function\n",
+"## The `map_rows` Function\n",
 "\n",
-"Finally, we'll end with the flexible `reduce` function. `reduce` functions similarly to pandas' `apply` but flattens (reduces) the inputs from nested layers into array inputs to the given apply function. For example, let's find the mean flux for each dataframe in \"nested\":"
+"Finally, we'll end with the flexible `map_rows` function. `map_rows` functions similarly to pandas' `apply` but applies row by row and flattens the inputs from nested layers into array inputs to the given apply function. For example, let's find the mean flux for each dataframe in \"nested\":"
 ]
 },
 {
@@ -297,7 +297,8 @@
 "\n",
 "# use hierarchical column names to access the flux column\n",
 "# passed as an array to np.mean\n",
-"nf.reduce(np.mean, \"lightcurve.brightness\")"
+"# row_container signals how to pass the data to the function, in this case as direct arguments\n",
+"nf.map_rows(np.mean, \"lightcurve.brightness\", row_container=\"args\")"
 ]
 },
 {
@@ -313,15 +314,15 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"def show_inputs(*args):\n",
-"    return args"
+"def show_inputs(row):\n",
+"    return row"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Applying some inputs via reduce, we see how it sends inputs to a given function. The output frame `nf_inputs` consists of two columns containing the output of the “ra” column and the “lightcurve.time” column."
+"Applying some inputs via `map_rows`, we see how it sends inputs to a given function. The output frame `nf_inputs` consists of two columns containing the output of the “ra” column and the “lightcurve.time” column."
 ]
 },
 {
@@ -330,8 +331,12 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"nf_inputs = nf.reduce(show_inputs, \"ra\", \"lightcurve.time\")\n",
-"nf_inputs"
+"# row_container=\"dict\" passes the data as a dictionary to the function\n",
+"nf_inputs = nf.map_rows(show_inputs, columns=[\"ra\", \"lightcurve.time\"], row_container=\"dict\")\n",
+"nf_inputs\n",
+"\n",
+"# map_rows returns a dataframe view of the dicts, but the two columns can be accessed with show_inputs as\n",
+"# row[\"ra\"] and row[\"lightcurve.time\"]"
 ]
 },
 {
@@ -343,6 +348,23 @@
 "nf_inputs.loc[0]"
 ]
 },
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {},
+"outputs": [],
+"source": [
+"# row_container=\"args\" passes the data as arguments to the function\n",
+"\n",
+"\n",
+"def show_inputs(*args):\n",
+"    return args\n",
+"\n",
+"\n",
+"nf_inputs = nf.map_rows(show_inputs, columns=[\"ra\", \"lightcurve.time\"], row_container=\"args\")\n",
+"nf_inputs"
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},
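
Review note: the quickstart now shows two values for `row_container`, `"dict"` and `"args"`. The difference is only in how the selected columns reach the user function. A minimal plain-Python sketch of that distinction follows. `call_with_container` is a hypothetical dispatcher written for illustration, the sample row is made up, and nothing here claims to match how nested-pandas implements `map_rows` internally.

```python
def call_with_container(func, row, columns, row_container):
    """Illustrative dispatcher (not nested-pandas code).

    "dict": func receives one mapping keyed by column name.
    "args": func receives each selected column as a positional argument.
    """
    if row_container == "dict":
        return func({name: row[name] for name in columns})
    if row_container == "args":
        return func(*(row[name] for name in columns))
    raise ValueError(f"unknown row_container: {row_container!r}")


def show_inputs_dict(row):
    return row


def show_inputs_args(*args):
    return args


# Made-up row: a base column plus one nested column held as a list.
row = {"ra": 10.5, "dec": -3.2, "lightcurve.time": [1.0, 2.0, 3.0]}
columns = ["ra", "lightcurve.time"]

as_dict = call_with_container(show_inputs_dict, row, columns, "dict")
as_args = call_with_container(show_inputs_args, row, columns, "args")

print(as_dict)  # {'ra': 10.5, 'lightcurve.time': [1.0, 2.0, 3.0]}
print(as_args)  # (10.5, [1.0, 2.0, 3.0])
```

The `"args"` form suits existing functions such as `np.mean` that take arrays directly, while the `"dict"` form lets a function refer to columns by name, as in the notebook's `row["ra"]` comment.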

docs/pre_executed/nested_spectra.ipynb

Lines changed: 2 additions & 2 deletions
@@ -280,7 +280,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 8,
+"execution_count": null,
 "metadata": {},
 "outputs": [
 {
@@ -452,7 +452,7 @@
 }
 ],
 "source": [
-"spec_ndf = xid_ndf.add_nested(flat_spec, \"coadd_spectrum\").set_index(\"objid\")\n",
+"spec_ndf = xid_ndf.join_nested(flat_spec, \"coadd_spectrum\").set_index(\"objid\")\n",
 "spec_ndf"
 ]
 },

docs/pre_executed/performance.ipynb

Lines changed: 1 addition & 1 deletion
@@ -98,7 +98,7 @@
 "# Read in parquet data\n",
 "# nesting sources into objects\n",
 "nf = npd.read_parquet(\"objects.parquet\")\n",
-"nf = nf.add_nested(npd.read_parquet(\"ztf_sources.parquet\"), \"ztf_sources\")\n",
+"nf = nf.join_nested(npd.read_parquet(\"ztf_sources.parquet\"), \"ztf_sources\")\n",
 "\n",
 "# Filter on object\n",
 "nf = nf.query(\"ra > 10.0\")\n",

docs/reference/accessor.rst

Lines changed: 6 additions & 9 deletions
@@ -18,12 +18,9 @@ Functions
 NestSeriesAccessor.to_lists
 NestSeriesAccessor.to_flat
 NestSeriesAccessor.to_flatten_inner
-NestSeriesAccessor.with_field
-NestSeriesAccessor.with_flat_field
-NestSeriesAccessor.with_list_field
-NestSeriesAccessor.with_filled_field
-NestSeriesAccessor.without_field
-NestSeriesAccessor.query_flat
-NestSeriesAccessor.get_flat_index
-NestSeriesAccessor.get_flat_series
-NestSeriesAccessor.get_list_series
+NestSeriesAccessor.set_column
+NestSeriesAccessor.set_flat_column
+NestSeriesAccessor.set_list_column
+NestSeriesAccessor.set_filled_column
+NestSeriesAccessor.drop
+NestSeriesAccessor.query

docs/reference/nesteddtype.rst

Lines changed: 1 addition & 1 deletion
@@ -17,6 +17,6 @@ Functions
 
 NestedDtype.construct_array_type
 NestedDtype.construct_from_string
-NestedDtype.from_fields
+NestedDtype.from_columns
 NestedDtype.from_pandas_arrow_dtype
 NestedDtype.to_pandas_arrow_dtype

docs/reference/nestedframe.rst

Lines changed: 14 additions & 3 deletions
@@ -10,12 +10,21 @@ Constructor
 
 NestedFrame
 
+Helpful Properties
+~~~~~~~~~~~~~~~~~~
+.. autosummary::
+:toctree: api/
+
+NestedFrame.nested_columns
+NestedFrame.base_columns
+NestedFrame.all_columns
+
 Nesting
 ~~~~~~~~~
 .. autosummary::
 :toctree: api/
 
-NestedFrame.add_nested
+NestedFrame.join_nested
 NestedFrame.nest_lists
 NestedFrame.from_flat
 NestedFrame.from_lists
@@ -25,19 +34,21 @@ Extended Pandas.DataFrame Interface
 
 .. note::
 The NestedFrame extends the Pandas.DataFrame interface, so all methods
-of Pandas.DataFrame are available. The following methods are extended
+of Pandas.DataFrame are available. The following methods are a mix of
+newly added methods and extended methods from Pandas DataFrame
 to support NestedFrame functionality. Please reference the Pandas
 documentation for more information.
 https://pandas.pydata.org/docs/reference/frame.html
 
 .. autosummary::
 :toctree: api/
 
+NestedFrame.get_subcolumns
 NestedFrame.eval
 NestedFrame.query
 NestedFrame.dropna
 NestedFrame.sort_values
-NestedFrame.reduce
+NestedFrame.map_rows
 NestedFrame.drop
 NestedFrame.min
 NestedFrame.max

docs/reference/nestedseries.rst

Lines changed: 1 addition & 1 deletion
@@ -16,4 +16,4 @@ Functions
 :toctree: api/
 
 NestedSeries.to_lists
-NestedSeries.to_flat
+NestedSeries.explode
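
Review note: the reference-doc hunks above swap a number of API names. The table below collects the pairings as they appear in the diff, as a rough aid for updating downstream code. This is an assumption-heavy reading: the pairings are inferred from position and naming in the removed and added lines, the commit itself does not state that each new name is a drop-in replacement, and the notebook hunks show that at least `reduce` to `map_rows` also changes the call signature. `suggest_replacement` is a hypothetical helper written for this note.

```python
# Pairings inferred from the removed/added lines in the docs/reference hunks.
# Treat these as hints, not a confirmed migration guide.
INFERRED_RENAMES = {
    "NestedFrame.add_nested": "NestedFrame.join_nested",
    "NestedFrame.reduce": "NestedFrame.map_rows",
    "NestedDtype.from_fields": "NestedDtype.from_columns",
    "NestedSeries.to_flat": "NestedSeries.explode",
    "NestSeriesAccessor.with_field": "NestSeriesAccessor.set_column",
    "NestSeriesAccessor.with_flat_field": "NestSeriesAccessor.set_flat_column",
    "NestSeriesAccessor.with_list_field": "NestSeriesAccessor.set_list_column",
    "NestSeriesAccessor.with_filled_field": "NestSeriesAccessor.set_filled_column",
    "NestSeriesAccessor.without_field": "NestSeriesAccessor.drop",
    "NestSeriesAccessor.query_flat": "NestSeriesAccessor.query",
}

# Removed from accessor.rst with no matching added line in this diff.
REMOVED_WITHOUT_LISTED_REPLACEMENT = {
    "NestSeriesAccessor.get_flat_index",
    "NestSeriesAccessor.get_flat_series",
    "NestSeriesAccessor.get_list_series",
}


def suggest_replacement(old_name):
    """Return the inferred new name, or None when the diff lists no counterpart."""
    return INFERRED_RENAMES.get(old_name)


print(suggest_replacement("NestedFrame.add_nested"))  # NestedFrame.join_nested
print(suggest_replacement("NestSeriesAccessor.get_flat_index"))  # None
```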

docs/tutorials/data_loading_notebook.ipynb

Lines changed: 4 additions & 4 deletions
@@ -141,7 +141,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"We can then create an additional pandas dataframes for the nested columns and pack them into our `NestedFrame` with `NestedFrame.add_nested()` function. `add_nested` will align the nest based on the index by default (a column may be selected instead via the `on` kwarg), as we see the `nested` `DataFrame` has a repeated index corresponding to the `nf` `NestedFrame`."
+"We can then create an additional pandas dataframes for the nested columns and pack them into our `NestedFrame` with `NestedFrame.join_nested()` function. `join_nested` will align the nest based on the index by default (a column may be selected instead via the `on` kwarg), as we see the `nested` `DataFrame` has a repeated index corresponding to the `nf` `NestedFrame`."
 ]
 },
 {
@@ -158,7 +158,7 @@
 "    index=[0, 0, 0, 1, 1, 1, 2, 2, 2, 2],\n",
 ")\n",
 "\n",
-"nf = nf.add_nested(nested, \"nested\")\n",
+"nf = nf.join_nested(nested, \"nested\")\n",
 "nf"
 ]
 },
@@ -182,7 +182,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"We could add other nested columns by creating new sub-tables and adding them with `add_nested()`. Note that while the tables added with each `add_nested()` must be rectangular, they do not need to have the same dimensions between calls. We could add another nested row with a different number of observations."
+"We could add other nested columns by creating new sub-tables and adding them with `join_nested()`. Note that while the tables added with each `join_nested()` must be rectangular, they do not need to have the same dimensions between calls. We could add another nested row with a different number of observations."
 ]
 },
 {
@@ -199,7 +199,7 @@
 "    index=[0, 0, 1, 1, 1, 2],\n",
 ")\n",
 "\n",
-"nf = nf.join_nested(nested, \"nested2\")\n",
+"nf = nf.join_nested(nested, \"nested2\")\n",
 "nf"
 ]
 },
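
Review note: the tutorial text says `join_nested` aligns the nested table to the base frame by index, with the flat table carrying a repeated index. The idea can be sketched in plain Python as a group-by on that repeated index. `nest_by_index` and the sample rows are hypothetical, and giving unmatched base rows an empty list is a simplifying assumption made here, not a statement about how nested-pandas represents missing nests.

```python
from collections import defaultdict


def nest_by_index(base_index, flat_index, flat_rows):
    """Group flat rows under the base row that shares their index value."""
    grouped = defaultdict(list)
    for idx, row in zip(flat_index, flat_rows):
        grouped[idx].append(row)
    # Simplification: base rows with no matching flat rows get an empty list.
    return {idx: grouped.get(idx, []) for idx in base_index}


base_index = [0, 1, 2]
# Repeated index, as in the tutorial's second sub-table.
flat_index = [0, 0, 1, 1, 1, 2]
flat_rows = [
    {"t": 1.0, "flux": 0.5},
    {"t": 2.0, "flux": 0.7},
    {"t": 1.0, "flux": 1.1},
    {"t": 2.0, "flux": 1.3},
    {"t": 3.0, "flux": 1.2},
    {"t": 1.0, "flux": 2.4},
]

nested = nest_by_index(base_index, flat_index, flat_rows)
counts = {idx: len(rows) for idx, rows in nested.items()}
print(counts)  # {0: 2, 1: 3, 2: 1}
```

Each base row ends up with its own small table of a different length, which is why separate `join_nested` calls can add nests with different numbers of observations.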

docs/tutorials/data_manipulation.ipynb

Lines changed: 4 additions & 11 deletions
@@ -105,13 +105,6 @@
 "## Adding or Replacing Nested Columns"
 ]
 },
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"> *A Note on Performance: These operations involve full reconstruction of the nested columns so expect impacted performance when doing this at scale. It may be appropriate to do these operations within reduce functions directly (e.g. subtracting a value from a column) if performance is key.*"
-]
-},
 {
 "cell_type": "markdown",
 "metadata": {},
@@ -210,7 +203,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"This is functionally equivalent to using `add_nested`:"
+"This is functionally equivalent to using `join_nested`:"
 ]
 },
 {
@@ -224,7 +217,7 @@
 },
 "outputs": [],
 "source": [
-"ndf.add_nested(ndf[\"nested.band\"].to_frame(), \"bands_from_add_nested\")"
+"ndf.join_nested(ndf[\"nested.band\"].to_frame(), \"bands_from_add_nested\")"
 ]
 },
 {
@@ -254,7 +247,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"The above again being shorthand for the following `add_nested` call:"
+"The above again being shorthand for the following `join_nested` call:"
 ]
 },
 {
@@ -263,7 +256,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"ndf.add_nested(flat_df, \"example_from_add_nested\")"
+"ndf.join_nested(flat_df, \"example_from_add_nested\")"
 ]
 },
 {
