Replication Package for "Function-as-a-Service Performance Evaluation: A Multivocal Literature Review"
This replication package contains the raw dataset, scripts to produce all plots, and documentation on how to replicate our MLR study on FaaS performance evaluation.
J. Scheuner and P. Leitner, “Function-as-a-Service Performance Evaluation: A Multivocal Literature Review,” Journal of Systems and Software.
Function-as-a-Service (FaaS) is one form of the serverless cloud computing paradigm and is defined through FaaS platforms (e.g., AWS Lambda) executing event-triggered code snippets (i.e., functions). Many studies that empirically evaluate the performance of such FaaS platforms have started to appear but we are currently lacking a comprehensive understanding of the overall domain. To address this gap, we conducted a multivocal literature review (MLR) covering 112 studies from academic (51) and grey (61) literature. We find that existing work mainly studies the AWS Lambda platform and focuses on micro-benchmarks using simple functions to measure CPU speed and FaaS platform overhead (i.e., container cold starts). Further, we discover a mismatch between academic and industrial sources on tested platform configurations, find that function triggers remain insufficiently studied, and identify HTTP API gateways and cloud storages as the most used external service integrations. Following existing guidelines on experimentation in cloud systems, we discover many flaws threatening the reproducibility of experiments presented in the surveyed studies. We conclude with a discussion of gaps in literature and highlight methodological suggestions that may serve to improve future FaaS performance evaluation studies.
@article{scheuner:20-jss,
author = {Scheuner, Joel and Leitner, Philipp},
journal = {Journal of Systems and Software},
doi = {10.1016/j.jss.2020.110708},
title = {Function-as-a-Service Performance Evaluation: A Multivocal Literature Review},
year = {2020}
}
All extracted data originating from academic and grey literature studies is available as machine-readable CSV (./data/faas_mlr_raw.csv) and human-readable XLSX (./data/faas_mlr_raw.xlsx). The Excel file also contains all 700+ comments with guidance, decision rationales, and extra information. It is configured with a filtered view to display only relevant sources but contains the complete data (i.e., including discussion for sources considered to be not relevant in our context).
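As a starting point for working with the machine-readable CSV, the following Python sketch counts how often each value occurs in one column. It is a minimal sketch and not part of the replication package: the helper `count_values` and the column names `id` and `literature_type` are hypothetical, so check the actual header row of `./data/faas_mlr_raw.csv` before adapting it.

```python
import csv
import io
from collections import Counter


def count_values(csv_file, column):
    """Count how often each value occurs in one column of an open CSV file."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


# Tiny in-memory stand-in for ./data/faas_mlr_raw.csv.
# The column names "id" and "literature_type" are hypothetical.
sample = io.StringIO(
    "id,literature_type\n"
    "A01,academic\n"
    "A02,academic\n"
    "G01,grey\n"
)
counts = count_values(sample, "literature_type")
print(counts["academic"], counts["grey"])
```

With the real dataset, pass a file opened via `open("data/faas_mlr_raw.csv", newline="", encoding="utf-8")` (assuming UTF-8 encoding) and a column name taken from the header row.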
The latest version is also available online as an interactive Google spreadsheet (GSheet): https://docs.google.com/spreadsheets/d/1EK9yg9fMZIDybnbi7thsnBx1NdqDmkW86sMygH9r8q8
The following steps describe how to use interactive querying:
- Choose `Data > Filter views > Save as temporary filter view`.
- Explore the dataset using GSheet sort & filter functionality (e.g., discover open source studies).
The `query_academic` directory contains all search results in the `*.bib` format.
The following figure summarizes all sources:
Manual Search consists of screening the following related publications:
- a) J. Kuhlenkamp and S. Werner, “Benchmarking FaaS platforms: Call for community participation,” in 2018 IEEE/ACM International Conference on Utility and Cloud Computing Companion (UCC Companion), pp. 189–194, 2018.
- b) J. Spillner and M. Al-Ameen, “Serverless literature dataset,” 2019.
- c) V. Yussupov, U. Breitenbücher, F. Leymann, and M. Wurster, “A systematic mapping study on engineering function-as-a-service platforms and tools,” in Proceedings of the 12th IEEE/ACM International Conference on Utility and Cloud Computing, pp. 229–240, 2019.
For the Database Search strategy, we use the following search string for all sources:
(serverless OR faas) AND (performance OR benchmark) AND experiment AND lambda
We only consider publications after 2015-01-01 by either configuring the search engine appropriately or adding the following suffix to the search string: `AND year>=2015`.
- `(serverless OR faas)` finds studies in the area of serverless computing and Function-as-a-Service.
- `(performance OR benchmark)` finds studies related to performance and performance benchmarking.
- `experiment` targets empirical research. We are interested in measurement-based approaches but aim to exclude pure modeling research, FaaS surveys, FaaS feature comparisons, etc. We assume that academic papers mention their research methodology.
- `lambda` narrows the search string to actual FaaS platforms (i.e., AWS Lambda) or to sources referring to "lambda functions" (independently of the provider, as used by Oakes et al.) to avoid a large number of false positives from other domains, as experienced by Yussupov et al.
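The logic of the search string can be sketched as a small Python predicate. This is only an illustration of the boolean structure: the function name `matches_search_string` is hypothetical, and the case-insensitive prefix matching (so that "experiments" counts as `experiment`) is an assumption, since each research database tokenizes and stems terms in its own way.

```python
import re


def matches_search_string(text):
    """Check (serverless OR faas) AND (performance OR benchmark)
    AND experiment AND lambda against a piece of text."""
    words = set(re.findall(r"[a-z]+", text.lower()))

    def has(term):
        # Assumption: a term matches any word that starts with it.
        return any(word.startswith(term) for word in words)

    return (
        (has("serverless") or has("faas"))
        and (has("performance") or has("benchmark"))
        and has("experiment")
        and has("lambda")
    )


hit = matches_search_string("We benchmark AWS Lambda in serverless experiments.")
miss = matches_search_string("A survey of serverless performance models.")
print(hit, miss)
```

The second text fails because it mentions neither `experiment` nor `lambda`, which is the kind of non-empirical source the last two keywords are meant to filter out.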
We performed the following adaptations of the search string:
- Without `lambda` keyword: Omitting the keyword `lambda` resulted in too many false positives with a total of 4805 matches (vs 691). We used an initial training set of 43 publications and found that 100% of them contain the string "lambda" in their fulltext.
- With double quotes `"`: Using double quotes includes only exact string matches and resulted in a total of 376 publications (vs 691) or 357 after duplicate removal. We found that this query is too narrow as it misses 6 relevant publications that are covered with our chosen search string.
We use the advanced query syntax of the following academic research databases:
ID | Research Database | Advanced Query Engine | Docs |
---|---|---|---|
acm | ACM Digital Library | https://dlnext.acm.org/search/advanced | Sidebar |
ieee | IEEE Xplore | https://ieeexplore.ieee.org/search/advanced/command | Link |
wos | ISI Web of Science* | http://apps.webofknowledge.com/UA_GeneralSearch_input.do?product=UA&search_mode=GeneralSearch | Link |
sd | Science Direct | https://www.sciencedirect.com/search/advanced | Link |
springer | SpringerLink | https://link.springer.com/search | Link |
wiley | Wiley InterScience | https://onlinelibrary.wiley.com/search/advanced | |
scopus | Scopus* | https://www.scopus.com/search/form.uri?display=advanced | Link |
* Requires institutional (e.g., through university VPN) or personal account
The following table summarizes the initial search results and provides the exact query string and direct link for all databases.
The search was performed on 2019-10-21 and all results are available as `ID.bib` under `./data/query_academic`.
* Requires manual steps: 1) copy the query string into the advanced search field 2) add custom year range 2015 - 2019
The following instructions show how query results from the research databases are exported into `*.bib` files:
- acm: 1) choose 100 per page 2) select all 3) export citation 4) choose bibtex
- ieee: 1) choose 100 per page 2) select all 3) export > citations > bibtex 4) copy/paste into ieee.bib file
- wos: 1) select page 2) Export > Other file formats 3) Choose Bibtex and author,source,title
- sd: 1) display 100 per page 2) Export > Export citations to bibtex
- springer: 1) Download results as CSV 2) Open CSV in Excel and copy all DOIs 3) Paste DOIs into Zotero's "Add item by identifier" 4) Right-click selection and export all to bibtex
  - SpringerLink does not support bibtex export; we therefore followed a workaround described here. Note that the import/matching can take a while until the indices and paper counts in the list are updated properly.
- wiley: 1) Select all 2) Export citations > bibtex 3) repeat for all pages 4) merge all result files
- scopus: 1) choose 100 per page 2) select all 3) Export > Bibtex
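Once the exports are available, the per-database `*.bib` files can be merged and de-duplicated. The following Python sketch shows one possible approach, assuming duplicates are identified by a case-insensitive DOI match. This is an assumption for illustration and not necessarily the procedure used in the study; the helpers `split_entries` and `merge_unique` and the sample entries are hypothetical, and the regular expressions only handle simple, well-formed BibTeX.

```python
import re

ENTRY_START = re.compile(r"^@\w+\s*\{", re.MULTILINE)
DOI_FIELD = re.compile(r'\bdoi\s*=\s*[{"]([^}"]+)[}"]', re.IGNORECASE)


def split_entries(bibtex):
    """Split BibTeX text into entries, each starting with '@type{'."""
    starts = [match.start() for match in ENTRY_START.finditer(bibtex)]
    ends = starts[1:] + [len(bibtex)]
    return [bibtex[a:b].strip() for a, b in zip(starts, ends)]


def merge_unique(bibtex_texts):
    """Merge several exports, keeping the first entry seen per DOI."""
    seen, merged = set(), []
    for text in bibtex_texts:
        for entry in split_entries(text):
            match = DOI_FIELD.search(entry)
            # Entries without a DOI fall back to their full text as key.
            key = match.group(1).strip().lower() if match else entry
            if key not in seen:
                seen.add(key)
                merged.append(entry)
    return merged


# Hypothetical sample exports with one overlapping DOI.
acm_export = """@inproceedings{a,
  title = {First hypothetical paper},
  doi = {10.1000/example.1}
}
@article{b,
  title = {Second hypothetical paper},
  doi = {10.1000/example.2}
}
"""
ieee_export = """@inproceedings{c,
  title = {First hypothetical paper},
  doi = {10.1000/EXAMPLE.1}
}
"""
merged = merge_unique([acm_export, ieee_export])
print(len(merged))
```

Entries without a DOI are only removed when they are textually identical, so a title-based comparison or a reference manager remains necessary for a thorough duplicate check.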
The `query_grey` directory contains all search results in the formats `*.pdf` and `*.html`.
The following figure summarizes all sources:
For the 1) Web Search strategy, we use the following search string for all sources:
(serverless OR faas) AND (performance OR benchmark)
For Google, we additionally perform a search with the exact same search string as used for academic literature. For the more informal grey literature, we adjust the search string by omitting the keywords `experiment` and `lambda`.
We use the following web search engines:
ID | Search Engine | URL |
---|---|---|
google | Google Web Search | https://www.google.com/ |
twitter | Twitter Search | https://twitter.com/ |
hackernews | Hacker News Algolia Search | https://hn.algolia.com/ |
reddit | Reddit Search | https://www.reddit.com/search |
medium | Medium Search | https://medium.com/search |
The following table summarizes the number of relevant studies and provides the exact query string and direct link for all web searches.
The searches were performed on the dates listed in the table below and all results are available under `./data/query_grey`.
Note that the numbers of relevant studies are already de-duplicated, meaning that we found 18 relevant studies through the google1 search and the additional +7 studies from the google2 search include only new, non-duplicate studies.
Note that, with the exception of Google Search, the search engines do not support advanced queries with logical expressions (e.g., "OR").
Therefore, we manually compose four subqueries to implement an equivalent search string.
ID | Date | # | Exact Query String and Link |
---|---|---|---|
google1 | 2019/11/26 | 18 | ("serverless" OR "faas") AND ("performance" OR "benchmark") AND "experiment" AND "lambda" after:2015-01-01 |
google2 | 2019/11/26 | +7 | ("serverless" OR "faas") AND ("performance" OR "benchmark") after:2015-01-01 |
twitter1 | 2019/12/03 | +2 | faas benchmark |
twitter2 | 2019/12/03 | +3 | serverless benchmark |
twitter3 | 2019/12/03 | +0 | faas performance |
twitter4 | 2019/12/03 | +3 | serverless performance |
hackernews1 | 2019/12/06 | +0 | faas benchmark |
hackernews2 | 2019/12/06 | +0 | serverless benchmark |
hackernews3 | 2019/12/06 | +0 | faas performance |
hackernews4 | 2019/12/06 | +1 | serverless performance |
reddit1 | 2019/12/06 | +0 | faas benchmark |
reddit2 | 2019/12/06 | +0 | serverless benchmark |
reddit3 | 2019/12/06 | +3 | faas performance |
reddit4 | 2019/12/06 | +0 | serverless performance |
medium1 | 2020/02/18 | +0 | faas benchmark |
medium2 | 2020/02/18 | +2 | serverless benchmark |
medium3 | 2020/02/18 | +0 | faas performance |
medium4 | 2020/02/18 | +1 | serverless performance |
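The four subqueries used for the engines without logical operators are the combinations of the two `OR` groups of the search string. A minimal Python sketch of this expansion follows; the names `TOPICS`, `ASPECTS`, and `compose_subqueries` are hypothetical.

```python
from itertools import product

TOPICS = ("faas", "serverless")         # from (serverless OR faas)
ASPECTS = ("benchmark", "performance")  # from (performance OR benchmark)


def compose_subqueries(topics=TOPICS, aspects=ASPECTS):
    """Expand the two OR groups into one plain query per combination."""
    # The aspect varies slowest so that the order matches the table
    # numbering (1: faas benchmark, ..., 4: serverless performance).
    return [f"{topic} {aspect}" for aspect, topic in product(aspects, topics)]


subqueries = compose_subqueries()
for number, query in enumerate(subqueries, start=1):
    print(number, query)
```

Running each subquery separately and merging the results covers the same term combinations as `(serverless OR faas) AND (performance OR benchmark)`, which is why the per-query counts in the table are reported as non-duplicate increments.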
We used non-personalized private search mode through private Google Chrome browser windows wherever possible.
Note that the number of search results reported by Google is only a rough estimate and typically changes (dramatically) when reaching the last page [1].
Therefore, we used double quotes `"` for exact matching (i.e., to exclude Google's fuzzy search results) and to achieve more accurate search estimates.
Further, Google filters out highly redundant search results by default. For the google1 query, we repeated the search with disabled redundancy filtering and kept both versions (e.g., google2 and google2.2, or google4.2 where google4 does not exist because the omitted results have fewer pages) [2].
We used the Google Chrome export options for PDF and HTML in combination with the following steps:
- google: 1) Paste link in private browser mode 2) Settings > Search Settings: choose region "United States" and 100 results per page
- twitter: 1) Paste link in browser 2) scroll down to the bottom of the page until all search results appear
- hackernews: 1) Paste link in browser 2) scroll down to the bottom of the page until all search results appear
- reddit: 1) Paste link in browser 2) scroll down to the bottom of the page until all search results appear
- medium: 1) Paste link in browser 2) scroll down to the bottom of the page until all search results appear
The authors also saved PDF or HTML files of all relevant articles in case some sources become unavailable. However, we cannot publish these website copies for legal reasons.
[1] Google Support: The count of the number of search results is incorrect; Search Engine Land: Why Google Can’t Count Results Properly
[2] Google Support: In order to show you the most relevant results, we have omitted some entries
An up-to-date R language toolchain, preferably with RStudio, is required.
Install the required packages imported at the top of each `*.R` file (e.g., `install.packages("ggplot2")`) from the official CRAN package repository (RStudio automatically detects to-be-installed packages).
A dependency installation script is provided under `plots/install_dependencies.R`.
- Open the RStudio project `faas_mlr.Rproj` (or alternatively set the R working directory to `./plots`).
- Run a given `*.R` file to produce the corresponding `*.pdf` plot. Example: `characteristics.R` produces `characteristics.pdf`. Example from the command line: `cd plots` followed by `Rscript characteristics.R`.
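To regenerate all plots in one go, the per-script command above can be repeated for every `*.R` file. The following Python sketch is one way to do that, assuming `Rscript` is on the `PATH` and that every `*.R` file in `plots` other than `install_dependencies.R` is a plot script. The helpers `plot_scripts` and `render_all` are hypothetical and not part of the package. By default the function only returns the commands without running them.

```python
import subprocess
import tempfile
from pathlib import Path


def plot_scripts(plots_dir):
    """List the *.R scripts in plots_dir, skipping the dependency installer."""
    return sorted(
        path for path in Path(plots_dir).glob("*.R")
        if path.name != "install_dependencies.R"
    )


def render_all(plots_dir="plots", dry_run=True):
    """Build one 'Rscript <file>' command per script; run them unless dry_run."""
    commands = [["Rscript", script.name] for script in plot_scripts(plots_dir)]
    if not dry_run:
        for command in commands:
            # Run inside plots_dir, mirroring 'cd plots' before 'Rscript'.
            subprocess.run(command, cwd=plots_dir, check=True)
    return commands


# Dry-run demonstration on a temporary directory with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("characteristics.R", "install_dependencies.R", "notes.txt"):
        (Path(tmp) / name).touch()
    demo_commands = render_all(tmp)
print(demo_commands)
```

Calling `render_all("plots", dry_run=False)` from the repository root would then execute each script in turn and stop at the first failure.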
The plots follow the economist color scheme in the ggthemes package.
Software | Version |
---|---|
R | 4.2.0 |
tidyr | 1.1.0.9000 (dev) |
vctrs | 0.3.2.9000 (dev) |
dplyr | 1.0.1 (dev) |
forcats | 0.5.0 |
ggplot2 | 3.3.2 |
ggthemes | 4.2.0 |
NOTE: An issue in `vctrs` caused error messages such as `... is not empty` (2020-06) but was fixed in July; the plots work with the versions listed above (dev versions as of 2020-07-19).