Commit 878f4c8

Merge pull request #366 from lincc-frameworks/map_rows
map_rows implementation
2 parents 8d87bbe + 2fb336d commit 878f4c8

7 files changed, +387 -57 lines changed

docs/gettingstarted/quickstart.ipynb

Lines changed: 30 additions & 8 deletions
@@ -282,9 +282,9 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "## Reduce Function\n",
+ "## The `map_rows` Function\n",
  "\n",
- "Finally, we'll end with the flexible `reduce` function. `reduce` functions similarly to pandas' `apply` but flattens (reduces) the inputs from nested layers into array inputs to the given apply function. For example, let's find the mean flux for each dataframe in \"nested\":"
+ "Finally, we'll end with the flexible `map_rows` function. `map_rows` functions similarly to pandas' `apply` but applies row by row and flattens the inputs from nested layers into array inputs to the given apply function. For example, let's find the mean flux for each dataframe in \"nested\":"
  ]
  },
  {
@@ -297,7 +297,8 @@
  "\n",
  "# use hierarchical column names to access the flux column\n",
  "# passed as an array to np.mean\n",
- "nf.reduce(np.mean, \"lightcurve.brightness\")"
+ "# row_container signals how to pass the data to the function, in this case as direct arguments\n",
+ "nf.map_rows(np.mean, \"lightcurve.brightness\", row_container=\"args\")"
  ]
  },
  {
@@ -313,15 +314,15 @@
  "metadata": {},
  "outputs": [],
  "source": [
- "def show_inputs(*args):\n",
- "    return args"
+ "def show_inputs(row):\n",
+ "    return row"
  ]
  },
  {
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "Applying some inputs via reduce, we see how it sends inputs to a given function. The output frame `nf_inputs` consists of two columns containing the output of the “ra” column and the “lightcurve.time” column."
+ "Applying some inputs via `map_rows`, we see how it sends inputs to a given function. The output frame `nf_inputs` consists of two columns containing the output of the “ra” column and the “lightcurve.time” column."
  ]
  },
  {
@@ -330,8 +331,12 @@
  "metadata": {},
  "outputs": [],
  "source": [
- "nf_inputs = nf.reduce(show_inputs, \"ra\", \"lightcurve.time\")\n",
- "nf_inputs"
+ "# row_container=\"dict\" passes the data as a dictionary to the function\n",
+ "nf_inputs = nf.map_rows(show_inputs, columns=[\"ra\", \"lightcurve.time\"], row_container=\"dict\")\n",
+ "nf_inputs\n",
+ "\n",
+ "# map_rows returns a dataframe view of the dicts, but the two columns can be accessed with show_inputs as\n",
+ "# row[\"ra\"] and row[\"lightcurve.time\"]"
  ]
  },
  {
@@ -343,6 +348,23 @@
  "nf_inputs.loc[0]"
  ]
  },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# row_container=\"args\" passes the data as arguments to the function\n",
+ "\n",
+ "\n",
+ "def show_inputs(*args):\n",
+ "    return args\n",
+ "\n",
+ "\n",
+ "nf_inputs = nf.map_rows(show_inputs, columns=[\"ra\", \"lightcurve.time\"], row_container=\"args\")\n",
+ "nf_inputs"
+ ]
+ },
  {
  "cell_type": "markdown",
  "metadata": {},
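
Reviewer note: the notebook changes above hinge on the difference between `row_container="dict"` and `row_container="args"`. The sketch below illustrates that difference in plain Python, without nested-pandas. It is not the library's implementation: `map_rows_sketch`, the two `show_inputs_*` helpers, and the sample `rows` are hypothetical, and each row is modeled as a dictionary whose nested columns are already flattened into lists.

```python
from statistics import mean


def map_rows_sketch(rows, func, columns, row_container="dict"):
    """Apply func to each row, passing the selected columns per row_container.

    Hypothetical stand-in for illustration only; not the nested-pandas API.
    """
    results = []
    for row in rows:
        selected = {name: row[name] for name in columns}
        if row_container == "dict":
            # The function receives one dict keyed by column name.
            results.append(func(selected))
        elif row_container == "args":
            # The function receives each selected column as a positional argument.
            results.append(func(*selected.values()))
        else:
            raise ValueError(f"unknown row_container: {row_container!r}")
    return results


# Made-up sample data: one base column and two flattened nested columns per row.
rows = [
    {"ra": 10.0, "lightcurve.time": [1.0, 2.0], "lightcurve.brightness": [3.0, 5.0]},
    {"ra": 20.0, "lightcurve.time": [4.0], "lightcurve.brightness": [7.0]},
]


def show_inputs_dict(row):
    # With "dict", columns are read as row["ra"] and row["lightcurve.time"].
    return row


def show_inputs_args(*args):
    # With "args", columns arrive positionally in the order requested.
    return args


as_dicts = map_rows_sketch(rows, show_inputs_dict, ["ra", "lightcurve.time"], "dict")
as_args = map_rows_sketch(rows, show_inputs_args, ["ra", "lightcurve.time"], "args")
mean_flux = map_rows_sketch(rows, mean, ["lightcurve.brightness"], "args")
```

In this sketch, a function written for `"dict"` takes a single parameter and looks columns up by name, while a function written for `"args"` must accept one positional parameter per requested column. That mirrors why the diff rewrites `show_inputs(*args)` as `show_inputs(row)` for the dictionary case.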

docs/reference/nestedframe.rst

Lines changed: 1 addition & 1 deletion
@@ -37,7 +37,7 @@ Extended Pandas.DataFrame Interface
  NestedFrame.query
  NestedFrame.dropna
  NestedFrame.sort_values
- NestedFrame.reduce
+ NestedFrame.map_rows
  NestedFrame.drop
  NestedFrame.min
  NestedFrame.max

docs/tutorials/data_manipulation.ipynb

Lines changed: 0 additions & 7 deletions
@@ -105,13 +105,6 @@
  "## Adding or Replacing Nested Columns"
  ]
  },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "> *A Note on Performance: These operations involve full reconstruction of the nested columns so expect impacted performance when doing this at scale. It may be appropriate to do these operations within reduce functions directly (e.g. subtracting a value from a column) if performance is key.*"
- ]
- },
  {
  "cell_type": "markdown",
  "metadata": {},
