From faf11c4a1116a2174037e46d27181cb0ee90bafb Mon Sep 17 00:00:00 2001
From: jrycw
Date: Thu, 23 Jan 2025 15:52:10 +0800
Subject: [PATCH 1/6] Add blog post promoting the `mask=` parameter in `loc.body()`

---
 docs/blog/locbody-mask/index.qmd | 92 ++++++++++++++++++++++++++++++++
 1 file changed, 92 insertions(+)
 create mode 100644 docs/blog/locbody-mask/index.qmd

diff --git a/docs/blog/locbody-mask/index.qmd b/docs/blog/locbody-mask/index.qmd
new file mode 100644
index 000000000..4526c4ddd
--- /dev/null
+++ b/docs/blog/locbody-mask/index.qmd
@@ -0,0 +1,92 @@
+---
+title: "Style Table Body with `mask=` in `loc.body()`"
+html-table-processing: none
+author: Rich Iannone, Michael Chow and Jerry Wu
+date: 2025-01-23
+freeze: true
+jupyter: python3
+format:
+  html:
+    code-summary: "Show the Code"
+---
+
+In Great Tables `0.16.0`, we introduced the `mask=` parameter in `loc.body()`, enabling users to apply conditional formatting to rows on a per-column basis more efficiently when working with a Polars DataFrame. This post demonstrates three approaches to styling the table body, so you can compare methods and choose the one that best fits your needs:
+
+* **Using a for-loop:** Repeatedly call `GT.tab_style()` for each column.
+* **Utilizing the `locations=` parameter in `GT.tab_style()`:** Pass a list of `loc.body()` objects.
+* **Leveraging the `mask=` parameter in `loc.body()`:** Use Polars expressions for streamlined styling.
+
+Let’s dive in.
+
+### Preparations
+We'll use the built-in dataset `gtcars` to create a Polars DataFrame. Next, we'll select the columns `mfr`, `drivetrain`, `year`, and `hp` to create a small pivoted table named `df_mini`. Finally, we'll pass `df_mini` to the `GT` object to create a table named `gt`, using `drivetrain` as the `rowname_col` and `mfr` as the `groupname_col`, as shown below:
+```{python}
+# | code-fold: true
+import polars as pl
+from great_tables import GT, loc, style
+from great_tables.data import gtcars
+from polars import selectors as cs
+
+year_cols = ["2014.0", "2015.0", "2016.0", "2017.0"]
+df_mini = (
+    pl.from_pandas(gtcars)
+    .filter(pl.col("mfr").is_in(["Ferrari", "Lamborghini", "BMW"]))
+    .sort("drivetrain")
+    .pivot(on="year", index=["mfr", "drivetrain"], values="hp", aggregate_function="mean")
+    .select(["mfr", "drivetrain", *year_cols])
+)
+
+gt = GT(df_mini, rowname_col="drivetrain", groupname_col="mfr")
+gt
+```
+
+The numbers in the cells represent the average horsepower for each combination of `mfr` and `drivetrain` for a specific year.
+
+In the following section, we'll demonstrate three different ways to highlight the cell text in red if the average horsepower exceeds 650.
+
+### Using a for-loop: Repeatedly call `GT.tab_style()` for each column
+The most intuitive way is to call `GT.tab_style()` for each column. Here's how:
+```{python}
+gt1 = gt # <1>
+for col in year_cols:
+    gt1 = gt1.tab_style(
+        style=style.text(color="red"),
+        locations=loc.body(columns=col, rows=pl.col(col).gt(650))
+    )
+gt1
+```
+1. Since we want to keep `gt` intact for later use, we will modify `gt1` in this approach instead.
+
+
+### Utilizing the `locations=` parameter in `GT.tab_style()`: Pass a list of `loc.body()` objects
+A more concise method is to pass a list of `loc.body()` objects to the `locations=` parameter in `GT.tab_style()`, as shown below:
+```{python}
+(
+    gt.tab_style(
+        style=style.text(color="red"),
+        locations=[
+            loc.body(columns=col, rows=pl.col(col).gt(650))
+            for col in year_cols
+        ],
+    )
+)
+```
+
+
+### Leveraging the `mask=` parameter in `loc.body()`: Use Polars expressions for streamlined styling
+The most modern approach is to pass a Polars expression to the `mask=` parameter in `loc.body()`, as shown below:
+```{python}
+(
+    gt.tab_style(
+        style=style.text(color="red"),
+        locations=loc.body(mask=cs.numeric().gt(650))
+    )
+)
+```
+
+In this example, `loc.body()` is smart enough to automatically target the rows where the cell value exceeds 650 for each numerical column. In general, you can think of `mask=` as a syntactic sugar that Great Tables provides to save you from having to manually loop through the columns.
+
+### Wrapping up
+We extend our gratitude to [@igorcalabria](https://github.com/igorcalabria) for suggesting this feature in [#389](https://github.com/posit-dev/great-tables/issues/389) and providing an insightful explanation of its utility. A special thanks to [@henryharbeck](https://github.com/henryharbeck) for providing the second approach.
+
+We hope you enjoy this new functionality as much as we do! Have ideas to make Great Tables even better? Share them with us via [GitHub Issues](https://github.com/posit-dev/great-tables/issues). We're always amazed by the creativity of our users! See you, until the next great table.
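All three approaches in this first version of the post style the same cells, and the post describes `mask=` as syntactic sugar that spares you the manual loop over columns. The dependency-free sketch below models that idea under an explicit assumption: it says nothing about how Great Tables actually implements `mask=`. The hypothetical helper `expand_mask()` turns one per-cell predicate into a `(column, matching rows)` target per column, which is what the for-loop and the list of `loc.body()` objects write out by hand. The `toy` table and its numbers are invented, not taken from `gtcars`.

```python
# Hypothetical sketch: a per-cell condition "desugared" into one
# (column, matching rows) target per column. This mirrors what the for-loop
# and list-of-loc.body() approaches spell out by hand; it is NOT how
# Great Tables implements `mask=` internally.
from typing import Callable, Dict, List, Optional, Tuple

Table = Dict[str, List[Optional[float]]]


def expand_mask(
    table: Table,
    columns: List[str],
    predicate: Callable[[Optional[float]], bool],
) -> List[Tuple[str, List[int]]]:
    """Return (column, row indices where predicate holds) for each column with a match."""
    targets: List[Tuple[str, List[int]]] = []
    for col in columns:
        rows = [i for i, value in enumerate(table[col]) if predicate(value)]
        if rows:  # columns with no matching cell produce no target
            targets.append((col, rows))
    return targets


def exceeds_650(value: Optional[float]) -> bool:
    # A missing value never counts as "greater than 650".
    return value is not None and value > 650


# Toy stand-in for the pivoted `df_mini`; the numbers are invented.
toy = {
    "2014": [560.0, None, 700.0],
    "2015": [660.0, 610.0, None],
}

print(expand_mask(toy, ["2014", "2015"], exceeds_650))
# [('2014', [2]), ('2015', [0])]
```

Each returned pair corresponds to one `loc.body(columns=..., rows=...)` call in the manual approaches, so a single mask replaces as many explicit targets as there are columns.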
From acca7ef32827fe91778d263bd1321dc76f98f699 Mon Sep 17 00:00:00 2001
From: jrycw
Date: Fri, 24 Jan 2025 14:39:54 +0800
Subject: [PATCH 2/6] Re-organize the post based on feedback

---
 docs/blog/locbody-mask/index.qmd | 69 +++++++++++++++++++-------------
 1 file changed, 41 insertions(+), 28 deletions(-)

diff --git a/docs/blog/locbody-mask/index.qmd b/docs/blog/locbody-mask/index.qmd
index 4526c4ddd..d141c5c48 100644
--- a/docs/blog/locbody-mask/index.qmd
+++ b/docs/blog/locbody-mask/index.qmd
@@ -2,7 +2,7 @@
 title: "Style Table Body with `mask=` in `loc.body()`"
 html-table-processing: none
 author: Rich Iannone, Michael Chow and Jerry Wu
-date: 2025-01-23
+date: 2025-01-24
 freeze: true
 jupyter: python3
 format:
@@ -10,11 +10,10 @@ format:
     code-summary: "Show the Code"
 ---
 
-In Great Tables `0.16.0`, we introduced the `mask=` parameter in `loc.body()`, enabling users to apply conditional formatting to rows on a per-column basis more efficiently when working with a Polars DataFrame. This post demonstrates three approaches to styling the table body, so you can compare methods and choose the one that best fits your needs:
+In Great Tables `0.16.0`, we introduced the `mask=` parameter in `loc.body()`, enabling users to apply conditional formatting to rows on a per-column basis more efficiently when working with a Polars DataFrame. This post will demonstrate how it works and compare it with the "old-fashioned" approach:
 
-* **Using a for-loop:** Repeatedly call `GT.tab_style()` for each column.
-* **Utilizing the `locations=` parameter in `GT.tab_style()`:** Pass a list of `loc.body()` objects.
 * **Leveraging the `mask=` parameter in `loc.body()`:** Use Polars expressions for streamlined styling.
+* **Utilizing the `locations=` parameter in `GT.tab_style()`:** Pass a list of `loc.body()` objects.
 
 Let’s dive in.
 
@@ -36,55 +35,69 @@ df_mini = (
     .select(["mfr", "drivetrain", *year_cols])
 )
 
-gt = GT(df_mini, rowname_col="drivetrain", groupname_col="mfr")
+gt = GT(df_mini).tab_stub(rowname_col="drivetrain", groupname_col="mfr").opt_stylize(color="cyan")
 gt
 ```
 
 The numbers in the cells represent the average horsepower for each combination of `mfr` and `drivetrain` for a specific year.
 
-In the following section, we'll demonstrate three different ways to highlight the cell text in red if the average horsepower exceeds 650.
+### Leveraging the `mask=` parameter in `loc.body()`
+The `mask=` parameter in `loc.body()` accepts a Polars expression that evaluates to a boolean result for each cell.
 
-### Using a for-loop: Repeatedly call `GT.tab_style()` for each column
-The most intuitive way is to call `GT.tab_style()` for each column. Here's how:
-```{python}
-gt1 = gt # <1>
-for col in year_cols:
-    gt1 = gt1.tab_style(
-        style=style.text(color="red"),
-        locations=loc.body(columns=col, rows=pl.col(col).gt(650))
-    )
-gt1
-```
-1. Since we want to keep `gt` intact for later use, we will modify `gt1` in this approach instead.
+Here’s how we can use it to achieve the two goals:
+* Highlight the cell text in red if the column datatype is numerical and the cell value exceeds 650.
+* Fill the background color as black if the cell value is missing in the last two columns (`2016.0` and `2017.0`).
 
 
-### Utilizing the `locations=` parameter in `GT.tab_style()`: Pass a list of `loc.body()` objects
-A more concise method is to pass a list of `loc.body()` objects to the `locations=` parameter in `GT.tab_style()`, as shown below:
 ```{python}
 (
     gt.tab_style(
         style=style.text(color="red"),
-        locations=[
-            loc.body(columns=col, rows=pl.col(col).gt(650))
-            for col in year_cols
-        ],
+        locations=loc.body(mask=cs.numeric().gt(650))
+    ).tab_style(
+        style=style.fill(color="black"),
+        locations=loc.body(mask=pl.nth(-2, -1).is_null()),
     )
 )
 ```
 
+In this example:
 
-### Leveraging the `mask=` parameter in `loc.body()`: Use Polars expressions for streamlined styling
-The most modern approach is to pass a Polars expression to the `mask=` parameter in `loc.body()`, as shown below:
+* `cs.numeric()` targets numerical columns, and `.gt(650)` checks if the cell value is greater than 650.
+* `pl.nth(-2, -1)` targets the last two columns, and `.is_null()` identifies missing values.
+
+Did you notice that we can use Polars selectors and expressions to dynamically identify columns at runtime? This is definitely a killer feature when working with pivoted operations.
+
+The `mask=` parameter acts as a syntactic sugar, streamlining the process and removing the need to loop through columns manually.
+
+::: {.callout-warning collapse="false"}
+## Using `mask=` Independently
+`mask=` should not be used in combination with the `columns` or `rows` arguments. Attempting to do so will raise a `ValueError`.
+:::
+
+### Utilizing the `locations=` parameter in `GT.tab_style()`
+A more "old-fashioned" approach involves passing a list of `loc.body()` objects to the `locations=` parameter in `GT.tab_style()`:
 ```{python}
+# | eval: false
 (
     gt.tab_style(
         style=style.text(color="red"),
-        locations=loc.body(mask=cs.numeric().gt(650))
+        locations=[loc.body(columns=col, rows=pl.col(col).gt(650))
+                   for col in year_cols],
+    ).tab_style(
+        style=style.fill(color="black"),
+        locations=[loc.body(columns=col, rows=pl.col(col).is_null())
+                   for col in year_cols[-2:]],
     )
 )
 ```
 
-In this example, `loc.body()` is smart enough to automatically target the rows where the cell value exceeds 650 for each numerical column. In general, you can think of `mask=` as a syntactic sugar that Great Tables provides to save you from having to manually loop through the columns.
+This approach, though functional, demands additional effort:
+
+* Explicitly preparing the column names in advance.
+* Specifying the `columns=` and `rows=` arguments for each `loc.body()` in the loop.
+
+While effective, it is less efficient and more verbose compared to the first approach.
 
 ### Wrapping up
 We extend our gratitude to [@igorcalabria](https://github.com/igorcalabria) for suggesting this feature in [#389](https://github.com/posit-dev/great-tables/issues/389) and providing an insightful explanation of its utility. A special thanks to [@henryharbeck](https://github.com/henryharbeck) for providing the second approach.
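The reorganized post pairs a column selector with a per-cell condition in each mask. To make those two steps concrete, here is a hedged pure-Python sketch; it is not the Polars or Great Tables implementation. The hypothetical `numeric_columns()` approximates `cs.numeric()` by inspecting values (plain Python lists carry no dtype, so this is only a rough stand-in), `list(table)[-2:]` plays the role of `pl.nth(-2, -1)`, and the hypothetical `cells_where()` collects the `(row, column)` cells each style would apply to. The table values are invented.

```python
# Hypothetical pure-Python stand-ins for the two masks in the post:
#   cs.numeric().gt(650)      -> numeric columns, cells greater than 650
#   pl.nth(-2, -1).is_null()  -> last two columns, missing cells
# Illustration only; this is not the Polars or Great Tables implementation.
from typing import Callable, Dict, List, Set, Tuple

Cell = Tuple[int, str]  # (row index, column name)


def numeric_columns(table: Dict[str, list]) -> List[str]:
    """Rough cs.numeric(): columns whose non-missing values are all int/float."""
    result = []
    for name, values in table.items():
        present = [v for v in values if v is not None]
        if present and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
        ):
            result.append(name)
    return result


def cells_where(
    table: Dict[str, list], columns: List[str], predicate: Callable[[object], bool]
) -> Set[Cell]:
    """Every (row, column) cell in `columns` whose value satisfies `predicate`."""
    return {
        (i, col)
        for col in columns
        for i, value in enumerate(table[col])
        if predicate(value)
    }


# Toy table with invented values; "mfr" holds text, so it is not numeric.
table = {
    "mfr": ["BMW", "Ferrari"],
    "2016": [None, 700.0],
    "2017": [680.0, None],
}

# Step 1 selects columns, step 2 tests each cell in those columns.
red_text = cells_where(
    table, numeric_columns(table), lambda v: v is not None and v > 650
)
grey_fill = cells_where(table, list(table)[-2:], lambda v: v is None)
print(sorted(red_text))   # [(0, '2017'), (1, '2016')]
print(sorted(grey_fill))  # [(0, '2016'), (1, '2017')]
```

Because the columns are chosen by a rule rather than a hard-coded list, the same masks keep working if the pivot produces different year columns, which is the runtime column selection the post highlights.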
From a7807f96dc5e715579fa2c446a85ac22a2670c3d Mon Sep 17 00:00:00 2001
From: jrycw
Date: Sat, 25 Jan 2025 09:51:50 +0800
Subject: [PATCH 3/6] Update author, column names, and background fill color

---
 docs/blog/locbody-mask/index.qmd | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/blog/locbody-mask/index.qmd b/docs/blog/locbody-mask/index.qmd
index d141c5c48..7940439f3 100644
--- a/docs/blog/locbody-mask/index.qmd
+++ b/docs/blog/locbody-mask/index.qmd
@@ -1,7 +1,7 @@
 ---
 title: "Style Table Body with `mask=` in `loc.body()`"
 html-table-processing: none
-author: Rich Iannone, Michael Chow and Jerry Wu
+author: Jerry Wu
 date: 2025-01-24
 freeze: true
 jupyter: python3
@@ -26,7 +26,7 @@ from great_tables import GT, loc, style
 from great_tables.data import gtcars
 from polars import selectors as cs
 
-year_cols = ["2014.0", "2015.0", "2016.0", "2017.0"]
+year_cols = ["2014", "2015", "2016", "2017"]
 df_mini = (
     pl.from_pandas(gtcars)
     .filter(pl.col("mfr").is_in(["Ferrari", "Lamborghini", "BMW"]))
@@ -47,7 +47,7 @@ The `mask=` parameter in `loc.body()` accepts a Polars expression that evaluates
 Here’s how we can use it to achieve the two goals:
 * Highlight the cell text in red if the column datatype is numerical and the cell value exceeds 650.
-* Fill the background color as black if the cell value is missing in the last two columns (`2016.0` and `2017.0`).
+* Fill the background color as lightgrey if the cell value is missing in the last two columns (`2016` and `2017`).
 
 
 ```{python}
 (
@@ -55,7 +55,7 @@ Here’s how we can use it to achieve the two goals:
         style=style.text(color="red"),
         locations=loc.body(mask=cs.numeric().gt(650))
     ).tab_style(
-        style=style.fill(color="black"),
+        style=style.fill(color="lightgrey"),
         locations=loc.body(mask=pl.nth(-2, -1).is_null()),
     )
 )
@@ -85,7 +85,7 @@ A more "old-fashioned" approach involves passing a list of `loc.body()` objects
         locations=[loc.body(columns=col, rows=pl.col(col).gt(650))
                    for col in year_cols],
     ).tab_style(
-        style=style.fill(color="black"),
+        style=style.fill(color="lightgrey"),
         locations=[loc.body(columns=col, rows=pl.col(col).is_null())
                    for col in year_cols[-2:]],
     )

From c0341364c003f69406b0f7c674eb79f5be6a01bc Mon Sep 17 00:00:00 2001
From: jrycw
Date: Sat, 25 Jan 2025 11:57:47 +0800
Subject: [PATCH 4/6] Update Wrapping up

---
 docs/blog/locbody-mask/index.qmd | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/blog/locbody-mask/index.qmd b/docs/blog/locbody-mask/index.qmd
index 7940439f3..0b18ac93b 100644
--- a/docs/blog/locbody-mask/index.qmd
+++ b/docs/blog/locbody-mask/index.qmd
@@ -100,6 +100,8 @@ This approach, though functional, demands additional effort:
 While effective, it is less efficient and more verbose compared to the first approach.
 
 ### Wrapping up
+With the introduction of the `mask=` parameter in `loc.body()`, users can now style the table body in a more vectorized-like manner, akin to using `df.apply()` in Pandas, enhancing the overall user experience.
+
 We extend our gratitude to [@igorcalabria](https://github.com/igorcalabria) for suggesting this feature in [#389](https://github.com/posit-dev/great-tables/issues/389) and providing an insightful explanation of its utility. A special thanks to [@henryharbeck](https://github.com/henryharbeck) for providing the second approach.
 
 We hope you enjoy this new functionality as much as we do! Have ideas to make Great Tables even better? Share them with us via [GitHub Issues](https://github.com/posit-dev/great-tables/issues). We're always amazed by the creativity of our users! See you, until the next great table.

From 40d525a2e1c19abeb90590f3cde8446cd572cd22 Mon Sep 17 00:00:00 2001
From: jrycw
Date: Sun, 26 Jan 2025 10:11:29 +0800
Subject: [PATCH 5/6] Highlight parameters in the post

---
 docs/blog/locbody-mask/index.qmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/blog/locbody-mask/index.qmd b/docs/blog/locbody-mask/index.qmd
index 0b18ac93b..a59f9fa8b 100644
--- a/docs/blog/locbody-mask/index.qmd
+++ b/docs/blog/locbody-mask/index.qmd
@@ -18,7 +18,7 @@ In Great Tables `0.16.0`, we introduced the `mask=` parameter in `loc.body()`, e
 Let’s dive in.
 
 ### Preparations
-We'll use the built-in dataset `gtcars` to create a Polars DataFrame. Next, we'll select the columns `mfr`, `drivetrain`, `year`, and `hp` to create a small pivoted table named `df_mini`. Finally, we'll pass `df_mini` to the `GT` object to create a table named `gt`, using `drivetrain` as the `rowname_col` and `mfr` as the `groupname_col`, as shown below:
+We'll use the built-in dataset `gtcars` to create a Polars DataFrame. Next, we'll select the columns `mfr`, `drivetrain`, `year`, and `hp` to create a small pivoted table named `df_mini`. Finally, we'll pass `df_mini` to the `GT` object to create a table named `gt`, using `drivetrain` as the `rowname_col=` and `mfr` as the `groupname_col=`, as shown below:
 ```{python}
 # | code-fold: true
 import polars as pl

From f65ecffc101ee01774930264a21cf110cc830e93 Mon Sep 17 00:00:00 2001
From: jrycw
Date: Mon, 27 Jan 2025 20:29:21 +0800
Subject: [PATCH 6/6] Use "styling" to replace "formatting" to avoid ambiguity

---
 docs/blog/locbody-mask/index.qmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/blog/locbody-mask/index.qmd b/docs/blog/locbody-mask/index.qmd
index a59f9fa8b..88f7c4120 100644
--- a/docs/blog/locbody-mask/index.qmd
+++ b/docs/blog/locbody-mask/index.qmd
@@ -10,7 +10,7 @@ format:
     code-summary: "Show the Code"
 ---
 
-In Great Tables `0.16.0`, we introduced the `mask=` parameter in `loc.body()`, enabling users to apply conditional formatting to rows on a per-column basis more efficiently when working with a Polars DataFrame. This post will demonstrate how it works and compare it with the "old-fashioned" approach:
+In Great Tables `0.16.0`, we introduced the `mask=` parameter in `loc.body()`, enabling users to apply conditional styling to rows on a per-column basis more efficiently when working with a Polars DataFrame. This post will demonstrate how it works and compare it with the "old-fashioned" approach:
 
 * **Leveraging the `mask=` parameter in `loc.body()`:** Use Polars expressions for streamlined styling.
 * **Utilizing the `locations=` parameter in `GT.tab_style()`:** Pass a list of `loc.body()` objects.