This repo provides a step-by-step guide for deploying an API that can render parameterized Quarto reports. It uses the following components:
- R, and particularly the {plumber} package
- Quarto
- Docker
- Google Cloud products, including BigQuery, Cloud Run, Cloud Build, Artifact Registry, and Source Repositories
I write reports fairly often in my job, and I imagine other R developers do as well. This tends to include making different versions of the same report for different groups -- in my case, making reports for each school in the division (or for each teacher, or class, etc). And in addition to rendering these reports, getting them to appropriate stakeholders also matters. If there are a small handful of reports, then just sending emails can work, but this sort of manual process quickly becomes intractable as the number of reports grows.
A more efficient way to distribute reports is by letting stakeholders hit an API and having the reports be rendered on-demand. But this also introduces complexity in that you have to deploy an API. So, the purpose of this guide is to walk you through a process for setting up an API that can render parameterized reports. The end product is meant to be pretty minimal, but hopefully the extensions are obvious for those of you who want to pick up and run with this.
I'm going to assume the following of the person reading this:
- You're an intermediate R user;
- You have a Google Cloud account (including a billing account!) set up already;
- You have Quarto, R, Git, and the gcloud CLI installed on your computer; and
- You have some familiarity with Git
We'll use Docker as well, but through Cloud Build, so you don't necessarily need it on your computer. That said, it doesn't hurt to have Docker Desktop installed, and some familiarity with Docker will help you here.
I'm not going to walk through how to install the software mentioned above or how to create a Google Cloud account. If you need help with any of that, I'd suggest checking out the official documentation for each of those components.
Ultimately, we're going to deploy a minimal API. Users will be able to navigate to the `/report` endpoint of that API to get a report rendered for them. Users will be able to specify how many rows of data are shown in the report via a query parameter, e.g. `/report?n_row=12`.
More specifically, the project will entail:
- Specifying the API using R and plumber;
- Loading sample data (from the {palmerpenguins} R package) into BigQuery;
- Creating a report template using Quarto;
- Pushing source code to a Google Cloud Source repository using git;
- Building a container image (that will ultimately run the app) using Cloud Build and Artifact Registry; and
- Deploying the resulting image as a container to Cloud Run
All of the files needed to do this will be included in the repository. Rather than including `.sh` or `.ps1` files to specify command-line arguments, though, I'm just going to include these commands here. I'm also going to use placeholders for project IDs, repo IDs, and project numbers. I'll denote these placeholders in ALL_CAPS.
Finally, it's possible to interact with Google Cloud through the console and its graphical user interface. You can choose to do this if you'd like, and it might be easier if you're newer to Google Cloud. But I'm going to interact with Google Cloud through the gcloud CLI, which makes the process more replicable.
On your local machine, you'll want to create a new folder to house your project:
mkdir MY-PROJECT
cd MY-PROJECT
If you haven't used the gcloud CLI before, you'll need to initialize it via:
gcloud init
Then, you want to create a new Google Cloud project using gcloud:
gcloud projects create MY-PROJECT
In a real use case, you'll probably want your API to access data from a database. So we're going to simulate that here using Google's BigQuery -- a data warehouse designed for data analytics.
The code in the `seed_bigquery.R` file will load the {palmerpenguins} data into a BigQuery table that we can access from our API. If it's your first time using the {bigrquery} package, you might be prompted to do some housekeeping/granting access to resources.
# install.packages(c("bigrquery", "palmerpenguins"))
library(bigrquery)
library(palmerpenguins)

# create a dataset in the project, then an (empty) table in that dataset
x <- bq_dataset("MY-PROJECT", "penguins_data")
bq_dataset_create(x)

y <- bq_table(x, "penguins")
bq_table_create(y, fields = penguins)

# upload the penguins data frame into the new table
bq_table_upload(y, penguins)
This will create a new dataset in your project, then create a new table in that dataset, then upload the palmer penguins data into the table.
You can navigate to BigQuery in your Google Cloud console to confirm that the data uploaded.
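If you'd rather confirm from R instead, a quick check might look like the sketch below. It assumes the `y` table object from the code above is still in your session:

# the row count should match the penguins data (344 rows)
bq_table_nrow(y)

# peek at a few rows
bq_table_download(y, n_max = 5)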
Next, we want to create a report template. This will define what gets rendered when users interact with our API.
We'll use Quarto for this, so the file will be a `.qmd` file. And we'll specify one parameter, `n_row`, that users can set to determine how many rows of data are included in the report, but this would obviously be something more meaningful in a real application.
You can see the template in the `report_template.qmd` file, but it's included below as well:
---
title: "My Report"
params:
  n_row: 10
format:
  html:
    embed-resources: true
---

This report will print `r params$n_row` rows from the palmer penguins dataset.

```{r}
library(DBI)
library(bigrquery)
library(glue)

# cap the requested row count at the size of the penguins dataset (344 rows)
n <- min(c(344, as.numeric(params$n_row)))

proj <- "MY-PROJECT"
ds <- "penguins_data"
tbl <- "penguins"

con <- dbConnect(
  bigrquery::bigquery(),
  project = proj,
  dataset = ds
)

q <- glue_sql("
  SELECT *
  FROM `MY-PROJECT.penguins_data.penguins`
  LIMIT {n}
", .con = con)

res <- dbGetQuery(con, q)
res
```
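Before wiring this into an API, it's worth checking that the template renders on its own. Something like the snippet below should work, assuming `report_template.qmd` is in your working directory and that you're authenticated with BigQuery ({bigrquery} will prompt you if not); the output file name here is just an example:

# render the template once locally with a specific parameter value
quarto::quarto_render(
  "report_template.qmd",
  output_file = "test_report.html",
  execute_params = list(n_row = 5)
)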
Now that we have a report template, we want to define an API endpoint that users can hit to generate on-demand reports. To do that, we'll use the {plumber} R package.
Plumber is kinda like {roxygen2} in that it uses specially-formatted comments to define behaviors. Roxygen does this for package documentation, and plumber does this for API behaviors.
You can define an endpoint for your API using a series of these special comments, denoted with `#*`, and then a function definition.
For example, we can define an endpoint that echoes back the message the user passed in like so (note that this example comes from the {plumber} docs):
#* Echo back the input
#* @param msg The message to echo
#* @get /echo
function(msg="") {
  list(msg = paste0("The message is: '", msg, "'"))
}
Here, the `@param` tag defines a query parameter that can be passed to the function, and the `@get` tag specifies that the subsequent function will be called on GET requests to the `/echo` endpoint.
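If you want to see this in action before moving on, a quick interactive check might look like the sketch below. It assumes you've saved the echo example above to a file called `echo_demo.R` (a throwaway name, not part of this repo):

library(plumber)

# serve the echo endpoint locally; this blocks the R session while it runs
pr("echo_demo.R") |>
  pr_run(port = 8000)

# then, in a browser or a second R session, a GET request to
# http://127.0.0.1:8000/echo?msg=hello should return something like:
# {"msg":["The message is: 'hello'"]}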
We can define an endpoint that will render our Quarto report template like so:
#* Render a Quarto report
#* @serializer html
#* @param n_row number of rows to display
#* @get /report
function(n_row = 10) {
  # render to a randomly-named html file, then read it back in as raw bytes
  tmp <- paste0(sample(c(letters, 0:9), 16, replace = TRUE), collapse = "")
  tmp <- paste0(tmp, ".html")
  quarto::quarto_render(
    "report_template.qmd",
    output_file = tmp,
    execute_params = list(n_row = n_row)
  )
  readBin(tmp, "raw", n = file.info(tmp)$size)
}
This will render the Quarto report as a temporary file (sort of) and then read that temporary file. I say it's sort of a temporary file because our Docker container will trash any files it creates once it shuts down, but the file isn't truly temporary (if you run this on your machine, the file will persist until you delete it).
The code above is in the `plumber.R` file in this repo.
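If you'd prefer the rendered file to be cleaned up even when running outside a container, one small variant (a sketch, not what this repo's `plumber.R` does) is to register the deletion with `on.exit()` so the file is removed once the response has been built:

#* Render a Quarto report, deleting the html file afterwards
#* @serializer html
#* @param n_row number of rows to display
#* @get /report
function(n_row = 10) {
  tmp <- paste0(paste0(sample(c(letters, 0:9), 16, replace = TRUE), collapse = ""), ".html")
  on.exit(unlink(tmp), add = TRUE)  # runs after the raw bytes below are returned
  quarto::quarto_render(
    "report_template.qmd",
    output_file = tmp,
    execute_params = list(n_row = n_row)
  )
  readBin(tmp, "raw", n = file.info(tmp)$size)
}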
That code just defines the endpoint, but we also need something that will set up and run the API. This is what the code in `run.R` does:
library(plumber)
pr("plumber.R") |>
  pr_run(port = 8080, host = "0.0.0.0")
# note -- if you want to test this locally (not in a docker container),
# don't include the host argument
Plumber is doing all of the heavy lifting here -- we're just telling it which file to reference in `pr()` (we want it to create the endpoints defined in the `plumber.R` file) and providing the port and host we want it to run on. EZ game.
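Before moving on to deployment, it can be worth a local smoke test. The sketch below assumes you've started the API on your machine (running `run.R` without the `host` argument) and that port 8080 is free; the output file name is just an example:

# in a second R session, while the API is running locally
library(httr)

resp <- GET("http://127.0.0.1:8080/report", query = list(n_row = 5))

# the endpoint returns the rendered html as raw bytes, so write them to disk
writeBin(content(resp, as = "raw"), "local_test.html")
browseURL("local_test.html")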
At this point, we've done everything we need to do in R and Quarto. We've loaded data into BigQuery (n.b. that we could have done this in lots of different ways), created a report template, and created an API that users can access to render reports.
Now we need to shift over to thinking about how to deploy it.
Cloud Source Repositories is Google's service for hosting private git repos. I'm going to use that here because that's what I use at work, but you can use Github if you'd prefer (this will change some of the options in the Cloud Build step below).
We can create a new Cloud Source repository from the command line with gcloud as follows:
gcloud source repos create MY-REPO --project=MY-PROJECT
Then you do the usual git workflow for pushing code. If this folder isn't already a git repository, you'll need to run `git init` first:
git init
git add .
git commit -m "my commit message"
git remote add origin https://source.developers.google.com/p/MY-PROJECT/r/MY-REPO
git push --all origin
Again, depending on how you've configured gcloud and git, you might have to do some extra authentication stuff here to ensure git has the appropriate credentials to push to Google Cloud. If you use Windows and run into issues, you might try `cmd` rather than `powershell`.
Dockerfiles are how you specify exactly what should be included in Docker images. If you're not super familiar with them, there are oodles of resources and tutorials on them, so I'm not going to try to reproduce those. Other people would do much better than I would.
At a very high level, though, Dockerfiles allow us to start with a base image (like an operating system such as Debian, or an OS with some other packages/libraries already installed), then systematically modify that base image until it meets our needs. You do this by defining operations that will run via the command line of the base image.
If you're an intermediate Linux user like me, a lot of your Dockerfile will be copy/pasted and slightly modified from other Dockerfiles you find online. Maybe this leaves you building containers that are bigger than they need to be, but there are worse sins.
Nevertheless. If you're an R user, the Rocker project provides a bunch of container images useful for R folks. We'll start with their `verse` image for this project, which provides R and some OS libraries for rendering reports.
From there, we'll download and install Quarto, copy necessary files over, install R packages, expose a port, and run our `run.R` file:
ARG R_VER="latest"
FROM rocker/verse:${R_VER}
RUN apt-get update && apt-get install -y --no-install-recommends \
wget
# Download and install a pinned version of Quarto (v1.3.450)
RUN wget https://github.com/quarto-dev/quarto-cli/releases/download/v1.3.450/quarto-1.3.450-linux-amd64.deb -O ~/quarto.deb
RUN apt-get install --yes ~/quarto.deb
# Remove the installer
RUN rm ~/quarto.deb
COPY . .
# install r pkgs
RUN install2.r --error --skipinstalled --ncpus -1 \
glue \
DBI \
bigrquery \
plumber \
quarto
ENV PORT 8080
EXPOSE ${PORT}
ENTRYPOINT ["Rscript", "run.R"]
The Dockerfile we just created contains instructions for building an image that will run our application. Now, we need a service that can execute these instructions and deploy the image we build. We're going to use Google's Cloud Build to do this. Cloud Build is a continuous integration/continuous deployment (CI/CD) service that can help build and deploy software.
There are multiple different ways to interact with Cloud Build, but my preferred way is through a `cloudbuild.yaml` file. This file contains instructions that Cloud Build will execute for us. In this case, we're going to specify 3 steps:
- Build a Docker image (using the Dockerfile we just wrote);
- Push the image to Google Artifact Registry (a service that stores images, similar to Dockerhub); and
- Deploy our image to Cloud Run (a serverless deployment service that runs containers)
Here's what the file looks like:
steps:
  # docker build
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t',
           'us-east4-docker.pkg.dev/${PROJECT_ID}/demo-repo/demo-image:$COMMIT_SHA',
           '.']
  # docker push to artifact registry
  - name: 'gcr.io/cloud-builders/docker'
    args: ['push', 'us-east4-docker.pkg.dev/${PROJECT_ID}/demo-repo/demo-image:$COMMIT_SHA']
  # deploy the container to cloud run
  - name: 'gcr.io/cloud-builders/gcloud'
    args: ['run', 'deploy', 'img', '--image', 'us-east4-docker.pkg.dev/${PROJECT_ID}/demo-repo/demo-image:$COMMIT_SHA', '--region', 'us-east4', '--max-instances=1', '--min-instances=0', '--allow-unauthenticated']
You can see these instructions specified in the `cloudbuild.yaml` file. These are command-line arguments that the named services will run for us. For instance, the first step calls `docker build` along with some other arguments that control the build. The third step calls `gcloud run deploy` and also specifies that:
- We want to deploy a service called "img";
- We want to deploy an image from the "us-east4-docker..." URL (which is where we pushed our image earlier);
- We want to deploy in the us-east4 region;
- We want a maximum of 1 instance and a minimum of 0 (you may want a different configuration for a real application); and
- We want to allow unauthenticated traffic
You may want to look up additional arguments you can pass to docker or gcloud in their official documentation.
We mostly have the bones of our project set up now, but we need to do some housekeeping via gcloud. This includes enabling some services, granting appropriate permissions to our service accounts (think: automated accounts that you delegate to run services for you), and creating a repository where the image we're building will be stored.
By default, when you create a Google Cloud project, most services are disabled, and you need to enable them before you can use them. So we'll enable Cloud Build and Cloud Run here (note -- you're not billed for enabling services, only for using them):
gcloud services enable cloudbuild.googleapis.com
gcloud services enable run.googleapis.com
The artifact repository specified in the `cloudbuild.yaml` file above needs to exist for us to be able to push an image to it. So let's do that:
gcloud artifacts repositories create demo-repo --repository-format docker --location us-east4 --project=MY-PROJECT
This will also prompt us to enable the Artifact Registry API if we haven't already done so.
Google Cloud will generate a default service account for your project, and that's the one we'll use here. We need to grant that service account a few permissions that will allow it to complete the build, push, and deploy steps above:
gcloud projects add-iam-policy-binding MY-PROJECT --member="serviceAccount:[email protected]" --role=roles/run.admin
gcloud projects add-iam-policy-binding MY-PROJECT --member="serviceAccount:[email protected]" --role=roles/iam.serviceAccountUser
You'll notice that the service account is designated with your project number. If you don't know this (and why would you?), you can find it via:
gcloud projects list
Last step! So far, we've specified an API that we want to deploy (using R, plumber, and Quarto), we've described an image (via Docker) that we'll use to run this API, and we've configured various Google Cloud services to build and deploy this service.
The last thing we need to do is create a build trigger. The build trigger will tell Cloud Build when to execute the steps described in `cloudbuild.yaml`. There are a few approaches to doing this, but we're going to set our trigger to initiate a build whenever we push a commit to the master branch of our remote git repository.
We can do this via gcloud like so:
gcloud builds triggers create cloud-source-repositories --repo="MY-REPO" --project=MY-PROJECT --branch-pattern="^master$" --name="push-master" --build-config="cloudbuild.yaml"
Hopefully the arguments in that command are fairly self-explanatory, but it does exactly what we said earlier -- it'll execute the steps in `cloudbuild.yaml` whenever we push code to the master branch of "MY-REPO" (or whatever you named your repository). Note that one of the arguments specifies `cloud-source-repositories` -- if you're using a Github repo, you'd want to use `github` here instead.
That's it. Now, we just need to push some code to our source repository, then everything should work.
If you do the usual git dance (add, commit, push) and then navigate to the Cloud Build dashboard in the Google Cloud console, you can follow along with the logs as your build executes. Once it's done, you should get a message showing you the URL where the API is deployed. Navigate to the `/report` endpoint of that URL and you should get a rendered report, e.g.:
my-service-url.run.app/report
or
my-service-url.run.app/report?n_row=12
Rendering your report might take a few seconds, but ultimately you should get a very minimal Quarto report that pulls in the penguins data from BigQuery.
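You can also fetch the deployed report from R rather than a browser. For example (using whatever service URL Cloud Run printed at the end of your deploy; the destination file name is just an example):

# download the rendered report straight to disk
download.file(
  "https://my-service-url.run.app/report?n_row=12",
  destfile = "penguins_report.html"
)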
To make sure you don't continue to incur charges, you probably want to tear down the project. You can do this via:
gcloud projects delete MY-PROJECT
Running all of the above should work for you, but if you run into any issues, `gcloud`, the Cloud Build logs, and the Cloud Run logs will all give you error messages that should (hopefully) help you diagnose any problems.
Obviously, this application isn't really useful on its own -- it's meant to show you the skeleton for setting up an application that is useful. As you modify and extend this, you'll potentially need to install additional R packages or additional system libraries, both of which will require modifying the Dockerfile. If you're like me, this will entail some trial and error, so it's probably a better approach to test these modifications and builds locally (via Docker Desktop) than in the cloud.
I hope this guide helps folks in the R community! Feel free to open an issue in this repo or create a pull request to improve this guide.