{catalog} gives the user access to the Spark Catalog API through the {sparklyr} API. The Catalog is the interface for managing a metastore (also known as a metadata catalog) of relational entities such as databases, tables, functions, table columns and temporary views.
You can install:
- the development version from GitHub with
# install.packages("remotes")
remotes::install_github("nathaneastwood/catalog")
- the latest release from CRAN with
install.packages("catalog")
{catalog} provides an API that mirrors the Spark Catalog API and gives full access to all of its methods. Below is a small example of some of the functionality.
sc <- sparklyr::spark_connect(master = "local")
mtcars_spark <- sparklyr::copy_to(dest = sc, df = mtcars)
library(catalog)
list_tables(sc)
# # A tibble: 1 × 5
#   name   database description tableType isTemporary
#   <chr>  <chr>    <chr>       <chr>     <lgl>
# 1 mtcars <NA>     <NA>        TEMPORARY TRUE
list_columns(sc, "mtcars")
# # A tibble: 11 × 6
#    name  description dataType nullable isPartition isBucket
#    <chr> <chr>       <chr>    <lgl>    <lgl>       <lgl>
#  1 mpg   <NA>        double   TRUE     FALSE       FALSE
#  2 cyl   <NA>        double   TRUE     FALSE       FALSE
#  3 disp  <NA>        double   TRUE     FALSE       FALSE
#  4 hp    <NA>        double   TRUE     FALSE       FALSE
#  5 drat  <NA>        double   TRUE     FALSE       FALSE
#  6 wt    <NA>        double   TRUE     FALSE       FALSE
#  7 qsec  <NA>        double   TRUE     FALSE       FALSE
#  8 vs    <NA>        double   TRUE     FALSE       FALSE
#  9 am    <NA>        double   TRUE     FALSE       FALSE
# 10 gear  <NA>        double   TRUE     FALSE       FALSE
# 11 carb  <NA>        double   TRUE     FALSE       FALSE
list_functions(sc)
# # A tibble: 349 × 5
#    name  database description className                                  isTem…¹
#    <chr> <chr>    <chr>       <chr>                                      <lgl>
#  1 !     <NA>     <NA>        org.apache.spark.sql.catalyst.expressions… TRUE
#  2 %     <NA>     <NA>        org.apache.spark.sql.catalyst.expressions… TRUE
#  3 &     <NA>     <NA>        org.apache.spark.sql.catalyst.expressions… TRUE
#  4 *     <NA>     <NA>        org.apache.spark.sql.catalyst.expressions… TRUE
#  5 +     <NA>     <NA>        org.apache.spark.sql.catalyst.expressions… TRUE
#  6 -     <NA>     <NA>        org.apache.spark.sql.catalyst.expressions… TRUE
#  7 /     <NA>     <NA>        org.apache.spark.sql.catalyst.expressions… TRUE
#  8 <     <NA>     <NA>        org.apache.spark.sql.catalyst.expressions… TRUE
#  9 <=    <NA>     <NA>        org.apache.spark.sql.catalyst.expressions… TRUE
# 10 <=>   <NA>     <NA>        org.apache.spark.sql.catalyst.expressions… TRUE
# # … with 339 more rows, and abbreviated variable name ¹isTemporary
# # ℹ Use `print(n = ...)` to see more rows
drop_temp_view(sc, "mtcars")
# [1] TRUE
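Because the package mirrors the Catalog API, the same pattern should extend to existence checks and cache management. The following is a minimal sketch rather than an excerpt from the package documentation: it assumes that `current_database()`, `list_databases()`, `table_exists()`, `cache_table()`, `is_cached()` and `uncache_table()` are exported by {catalog} and take the connection as their first argument, like the functions shown above. The variable names `has_mtcars` and `cached` are illustrative only, and a local Spark installation is assumed.

```r
library(catalog)

# Connect to a local Spark instance (assumes Spark is installed locally)
sc <- sparklyr::spark_connect(master = "local")

# Register mtcars as a temporary view; memory = FALSE avoids sparklyr's
# default behaviour of caching the table at copy time
mtcars_spark <- sparklyr::copy_to(
  dest = sc,
  df = mtcars,
  name = "mtcars",
  overwrite = TRUE,
  memory = FALSE
)

# Which database is active, and which databases are available?
current_database(sc)
list_databases(sc)

# Check that the view exists before working with it
has_mtcars <- table_exists(sc, "mtcars")

cached <- NA
if (isTRUE(has_mtcars)) {
  # Cache the view in memory, record its cache status, then release it
  cache_table(sc, "mtcars")
  cached <- is_cached(sc, "mtcars")
  uncache_table(sc, "mtcars")
}

# Clean up: drop the temporary view and close the connection
drop_temp_view(sc, "mtcars")
sparklyr::spark_disconnect(sc)
```

Guarding the cache calls with `table_exists()` keeps the script from failing if the view was never registered, for example after an earlier `drop_temp_view()` call.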
For more information, please refer to the package website.