blog: add blog post promoting the mask= parameter in loc.body() #584

Merged
merged 7 commits into from
Jan 27, 2025
105 changes: 105 additions & 0 deletions docs/blog/locbody-mask/index.qmd
@@ -0,0 +1,105 @@
---
title: "Style Table Body with `mask=` in `loc.body()`"
html-table-processing: none
author: Rich Iannone, Michael Chow and Jerry Wu
date: 2025-01-24
freeze: true
jupyter: python3
format:
  html:
    code-summary: "Show the Code"
---

In Great Tables `0.16.0`, we introduced the `mask=` parameter in `loc.body()`. When working with a Polars DataFrame, it lets you apply conditional styling to body cells across many columns with a single expression. This post demonstrates how it works by comparing two approaches:

* **Leveraging the `mask=` parameter in `loc.body()`:** Use Polars expressions for streamlined styling.
* **Utilizing the `locations=` parameter in `GT.tab_style()`:** Pass a list of `loc.body()` objects.

Let’s dive in.

### Preparations
We'll use the built-in `gtcars` dataset to create a Polars DataFrame. Next, we'll filter it to three manufacturers and pivot `hp` by `year`, keeping `mfr` and `drivetrain` as index columns, to create a small table named `df_mini`. Finally, we'll pass `df_mini` to `GT` to create a table named `gt`, using `drivetrain` as the `rowname_col` and `mfr` as the `groupname_col`, as shown below:
```{python}
# | code-fold: true
import polars as pl
from great_tables import GT, loc, style
from great_tables.data import gtcars
from polars import selectors as cs

# The pivot below turns each `year` value into a column name such as "2014.0".
year_cols = ["2014.0", "2015.0", "2016.0", "2017.0"]
df_mini = (
    pl.from_pandas(gtcars)
    .filter(pl.col("mfr").is_in(["Ferrari", "Lamborghini", "BMW"]))
    .sort("drivetrain")
    # One row per (mfr, drivetrain), one column per year, mean hp in each cell.
    .pivot(on="year", index=["mfr", "drivetrain"], values="hp", aggregate_function="mean")
    .select(["mfr", "drivetrain", *year_cols])
)

gt = (
    GT(df_mini)
    .tab_stub(rowname_col="drivetrain", groupname_col="mfr")
    .opt_stylize(color="cyan")
)
gt
```

The numbers in the cells represent the average horsepower for each combination of `mfr` and `drivetrain` for a specific year.

### Leveraging the `mask=` parameter in `loc.body()`
The `mask=` parameter in `loc.body()` accepts a Polars expression that evaluates to a boolean result for each cell.

Here’s how we can use it to achieve two goals:

* Color the cell text red if the column's data type is numeric and the cell value exceeds 650.
* Fill the cell background with black if the value is missing in either of the last two columns (`2016.0` and `2017.0`).

```{python}
(
    gt.tab_style(
        style=style.text(color="red"),
        locations=loc.body(mask=cs.numeric().gt(650)),
    ).tab_style(
        style=style.fill(color="black"),
        locations=loc.body(mask=pl.nth(-2, -1).is_null()),
    )
)
```

In this example:

* `cs.numeric()` targets numerical columns, and `.gt(650)` checks if the cell value is greater than 650.
* `pl.nth(-2, -1)` targets the last two columns, and `.is_null()` identifies missing values.

Did you notice that we can use Polars selectors and expressions to identify columns dynamically at runtime? This is especially handy after a pivot, where the column names depend on the data.

The `mask=` parameter acts as syntactic sugar, streamlining the process and removing the need to loop through columns manually.

::: {.callout-warning collapse="false"}
## Using `mask=` Independently
`mask=` should not be used in combination with the `columns` or `rows` arguments. Attempting to do so will raise a `ValueError`.
:::

### Utilizing the `locations=` parameter in `GT.tab_style()`
A more "old-fashioned" approach involves passing a list of `loc.body()` objects to the `locations=` parameter in `GT.tab_style()`:
```{python}
# | eval: false
(
    gt.tab_style(
        style=style.text(color="red"),
        locations=[loc.body(columns=col, rows=pl.col(col).gt(650))
                   for col in year_cols],
    ).tab_style(
        style=style.fill(color="black"),
        locations=[loc.body(columns=col, rows=pl.col(col).is_null())
                   for col in year_cols[-2:]],
    )
)
```

This approach, though functional, demands additional effort:

* Explicitly preparing the column names in advance.
* Specifying the `columns=` and `rows=` arguments for each `loc.body()` in the loop.

It works, but it takes more setup and more code than the `mask=` approach.

### Wrapping up
We extend our gratitude to [@igorcalabria](https://github.com/igorcalabria) for suggesting this feature in [#389](https://github.com/posit-dev/great-tables/issues/389) and providing an insightful explanation of its utility. A special thanks to [@henryharbeck](https://github.com/henryharbeck) for providing the second approach.

We hope you enjoy this new functionality as much as we do! Have ideas to make Great Tables even better? Share them with us via [GitHub Issues](https://github.com/posit-dev/great-tables/issues). We're always amazed by the creativity of our users! Until the next great table!