Replication Package for "Function-as-a-Service Performance Evaluation: A Multivocal Literature Review"

This replication package contains the raw dataset, scripts to produce all plots, and documentation on how to replicate our MLR study on FaaS performance evaluation.

Paper

J. Scheuner and P. Leitner, “Function-as-a-Service Performance Evaluation: A Multivocal Literature Review,” Journal of Systems and Software.

Abstract

Function-as-a-Service (FaaS) is one form of the serverless cloud computing paradigm and is defined through FaaS platforms (e.g., AWS Lambda) executing event-triggered code snippets (i.e., functions). Many studies that empirically evaluate the performance of such FaaS platforms have started to appear but we are currently lacking a comprehensive understanding of the overall domain. To address this gap, we conducted a multivocal literature review (MLR) covering 112 studies from academic (51) and grey (61) literature. We find that existing work mainly studies the AWS Lambda platform and focuses on micro-benchmarks using simple functions to measure CPU speed and FaaS platform overhead (i.e., container cold starts). Further, we discover a mismatch between academic and industrial sources on tested platform configurations, find that function triggers remain insufficiently studied, and identify HTTP API gateways and cloud storages as the most used external service integrations. Following existing guidelines on experimentation in cloud systems, we discover many flaws threatening the reproducibility of experiments presented in the surveyed studies. We conclude with a discussion of gaps in literature and highlight methodological suggestions that may serve to improve future FaaS performance evaluation studies.

Citation

@article{scheuner:20-jss,
  author = {Scheuner, Joel and Leitner, Philipp},
  journal = {Journal of Systems and Software},
  doi = {10.1016/j.jss.2020.110708},
  title = {Function-as-a-Service Performance Evaluation: A Multivocal Literature Review},
  year = {2020}
}

Dataset

All extracted data originating from academic and grey literature studies is available as machine-readable CSV (./data/faas_mlr_raw.csv) and human-readable XLSX (./data/faas_mlr_raw.xlsx). The Excel file also contains all 700+ comments with guidance, decision rationales, and extra information. It is configured with a filtered view to display only relevant sources but contains the complete data (i.e., including discussion for sources considered to be not relevant in our context).
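As a minimal sketch of how the CSV export could be inspected programmatically, the following Python snippet reads a CSV file with the standard library and reports its column names and row count. The helper name `summarize_csv` is hypothetical, and UTF-8 encoding is an assumption. No column names are hard-coded because the dataset schema is not listed in this README.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column_names, row_count) for a CSV file with a header row."""
    # Assumption: the file is UTF-8 encoded; "utf-8-sig" also tolerates a BOM.
    with Path(path).open(newline="", encoding="utf-8-sig") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        return list(reader.fieldnames or []), len(rows)


if __name__ == "__main__":
    csv_path = Path("data/faas_mlr_raw.csv")  # path as given in this README
    if csv_path.exists():
        columns, count = summarize_csv(csv_path)
        print(f"{count} rows, {len(columns)} columns")
        print(columns)
    else:
        print(f"{csv_path} not found; run this from the repository root")
```

Printing the header first is a cheap way to discover the available columns before writing any filtering logic.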

Interactive GSheet

The latest version is also available online as an interactive Google spreadsheet (GSheet): https://docs.google.com/spreadsheets/d/1EK9yg9fMZIDybnbi7thsnBx1NdqDmkW86sMygH9r8q8

The following steps describe how to use interactive querying:

Choose Data > Filter views > Save as temporary filter view
Explore the dataset using GSheet sort & filter functionality (e.g., discover open source studies).

Academic Literature Search Queries

The query_academic directory contains all search results in the *.bib format. The following figure summarizes all sources:

Manual Search for Academic Literature

Manual Search consists of screening the following related publications:

a) J. Kuhlenkamp and S. Werner, “Benchmarking FaaS platforms: Call for community participation,” in 2018 IEEE/ACM International Conference on Utility and Cloud Computing Companion (UCC Companion), pp. 189–194, 2018.
b) J. Spillner and M. Al-Ameen, “Serverless literature dataset,” 2019.
c) V. Yussupov, U. Breitenbücher, F. Leymann, and M. Wurster, “A systematic mapping study on engineering function-as-a-service platforms and tools,” in Proceedings of the 12th IEEE/ACM International Conference on Utility and Cloud Computing, pp. 229–240, 2019.

Database Search

For the Database Search strategy, we use the following search string for all sources:

(serverless OR faas) AND (performance OR benchmark) AND experiment AND lambda

We only consider publications after 2015-01-01 by either configuring the search engine appropriately or adding the following suffix to the search string: AND year>=2015.

Query Motivation

(serverless OR faas): finds studies in the area of serverless computing and Function-as-a-Service.
(performance OR benchmark): finds studies related to performance and performance benchmarking.
experiment: targets empirical research. We are interested in measurement-based approaches but aim to exclude pure modeling research, FaaS surveys, FaaS feature comparisons, etc. We assume that academic papers mention their research methodology.
lambda: narrows the search string to actual FaaS platforms (i.e., AWS Lambda) or to sources referring to 'lambda functions' (independently of the provider, as used by Oakes et al., 2018). This avoids a large number of false positives from other domains, as experienced by Yussupov et al.
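The boolean structure of the search string can be sketched as a small Python predicate over a document's text. This is only an illustration: the function name `matches_search_string` is hypothetical, and case-insensitive whole-word matching is an assumption. The actual research databases may apply stemming or search different fields.

```python
import re


def matches_search_string(text):
    """Evaluate (serverless OR faas) AND (performance OR benchmark)
    AND experiment AND lambda against a piece of text."""
    # Assumption: case-insensitive whole-word matching without stemming.
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return (
        bool(words & {"serverless", "faas"})
        and bool(words & {"performance", "benchmark"})
        and "experiment" in words
        and "lambda" in words
    )


if __name__ == "__main__":
    relevant = "We run an experiment on AWS Lambda to benchmark serverless cold starts."
    survey = "A survey of serverless performance models."
    print(matches_search_string(relevant))  # True
    print(matches_search_string(survey))    # False: lacks 'experiment' and 'lambda'
```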

Query Adaptations

We performed the following adaptations of the search string:

Without lambda keyword: Omitting the keyword lambda resulted in too many false positives with a total of 4805 matches (vs 691). We used an initial training set of 43 publications and found that 100% of them contain the string "lambda" in their fulltext.
With double quotes ": Using double quotes includes only exact string matches and resulted in a total of 376 publications (vs 691) or 357 after duplicate removal. We found that this query is too narrow as it misses 6 relevant publications that are covered with our chosen search string.

Database Search Engines

We use the advanced query syntax of the following academic research databases:

ID	Research Database	Advanced Query Engine	Docs
acm	ACM Digital Library	https://dlnext.acm.org/search/advanced	Sidebar
ieee	IEEE Xplore	https://ieeexplore.ieee.org/search/advanced/command	Link
wos	ISI Web of Science^*	http://apps.webofknowledge.com/UA_GeneralSearch_input.do?product=UA&search_mode=GeneralSearch	Link
sd	Science Direct	https://www.sciencedirect.com/search/advanced	Link
springer	SpringerLink	https://link.springer.com/search	Link
wiley	Wiley InterScience	https://onlinelibrary.wiley.com/search/advanced	PDF
scopus	Scopus^*	https://www.scopus.com/search/form.uri?display=advanced	Link

^* Requires institutional (e.g., through university VPN) or personal account

Initial Search Details

The following table summarizes the initial search results and provides the exact query string and direct link for all databases. The search was performed on 2019-10-21 and all results are available as ID.bib under ./data/query_academic.

ID	#	Exact Query String and Link
acm	126	[[All: serverless] OR [All: faas]] AND [[All: performance] OR [All: benchmark]] AND [All: experiment] AND [All: lambda] AND [Publication Date: (01/01/2015 TO *)]
ieee	215	((("Full Text & Metadata":serverless) OR ("Full Text & Metadata":faas)) AND (("Full Text & Metadata":performance) OR ("Full Text & Metadata":benchmark)) AND ("Full Text & Metadata":experiment) AND ("Full Text & Metadata":lambda))
wos^*	3	ALL=((serverless OR faas) AND (performance OR benchmark) AND experiment AND lambda)
sd	35	(serverless OR faas) AND (performance or benchmark) AND experiment AND lambda
springer	130	(serverless OR faas) AND (performance OR benchmark) AND experiment AND lambda
wiley	149	(serverless OR faas) AND (performance OR benchmark) AND experiment AND lambda
scopus	33	(ALL(serverless) OR ALL(faas)) AND (ALL(performance) OR ALL(benchmark)) AND ALL(experiment) AND ALL(lambda) AND PUBYEAR > 2015

^* Requires manual steps: 1) copy the query string into the advanced search field 2) add custom year range 2015 - 2019
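As a quick arithmetic check, the per-database counts in the table above add up to the 691 matches quoted under Query Adaptations. The short Python sketch below copies the counts from the table. The variable names are illustrative only.

```python
# Number of initial matches per database, copied from the table above.
initial_hits = {
    "acm": 126,
    "ieee": 215,
    "wos": 3,
    "sd": 35,
    "springer": 130,
    "wiley": 149,
    "scopus": 33,
}

total = sum(initial_hits.values())
print(total)  # 691
```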

Export Instructions

The following instructions show how query results from the research databases are exported into *.bib files:

acm: 1) choose 100 per page 2) select all 3) export citation 4) choose bibtex
ieee: 1) choose 100 per page 2) select all 3) export > citations > bibtex 4) copy/paste into ieee.bib file
wos: 1) select page 2) Export > Other file formats 3) Choose Bibtex and author,source,title
sd: 1) display 100 per page 2) Export > Export citations to bibtex
springer: 1) Download results as CSV 2) Open CSV in Excel and copy all DOIs 3) Paste DOIs into Zotero's "Add item by identifier" 4) Right-click selection and export all to bibtex
- SpringerLink does not support bibtex export; we therefore followed a workaround described here. Notice that the import/matching in Zotero can take a while until the indices and paper counts in the list are updated properly.
wiley: 1) Select all 2) Export citations > bibtex 3) repeat for all pages 4) merge all result files
scopus: 1) choose 100 per page 2) select all 3) Export > Bibtex
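After exporting, it can be useful to merge per-page exports (as in the wiley steps) and to compare the number of entries in each *.bib file against the counts in the Initial Search Details table. The following Python sketch is one possible way to do this. The helper names are hypothetical, and the regular expression is a rough approximation rather than a full BibTeX parser.

```python
import re
from pathlib import Path

# Matches the start of an entry such as "@article{key," at the beginning of a line.
ENTRY_START = re.compile(r"^@(\w+)\s*\{\s*([^,\s]*)", re.MULTILINE)
NON_ENTRY_TYPES = {"comment", "string", "preamble"}


def list_entry_keys(bibtex_text):
    """Return the citation keys of all entries in a BibTeX string."""
    return [
        match.group(2)
        for match in ENTRY_START.finditer(bibtex_text)
        if match.group(1).lower() not in NON_ENTRY_TYPES
    ]


def merge_bib_files(paths):
    """Concatenate several *.bib files (e.g., one per result page)."""
    return "\n\n".join(Path(p).read_text(encoding="utf-8").strip() for p in paths)


if __name__ == "__main__":
    bib_dir = Path("data/query_academic")  # directory as given in this README
    if bib_dir.is_dir():
        for bib_file in sorted(bib_dir.glob("*.bib")):
            keys = list_entry_keys(bib_file.read_text(encoding="utf-8"))
            print(f"{bib_file.name}: {len(keys)} entries")
```

Counting entry headers rather than parsing full records keeps the check free of dependencies, at the cost of ignoring malformed entries.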

Grey Literature Search Queries

The query_grey directory contains all search results in the formats *.pdf and *.html. The following figure summarizes all sources:

For the 1) Web Search strategy, we use the following search string for all sources:

(serverless OR faas) AND (performance OR benchmark)

We adjust the search string used for academic literature to the more informal grey literature by omitting the keywords experiment and lambda. In addition, we perform one Google search with the exact same search string as used for academic literature.

Web Search Engines

We use the following web search engines:

ID	Search Engine	URL
google	Google Web Search	https://www.google.com/
twitter	Twitter Search	https://twitter.com/
hackernews	Hacker News Algolia Search	https://hn.algolia.com/
reddit	Reddit Search	https://www.reddit.com/search
medium	Medium Search	https://medium.com/search

Web Search Details

The following table summarizes the number of relevant studies and provides the exact query string and direct link for all web searches. The searches were performed on the dates listed in the table and all results are available as PDF and HTML files in the query_grey directory. Notice that the numbers of relevant studies are already de-duplicated, meaning that we found 18 relevant studies through the google1 search and the additional +7 studies from the google2 search only include new non-duplicate studies. Notice that with the exception of Google Search, advanced queries including logical expressions (e.g., "OR") are not supported. Therefore, we manually compose four subqueries per search engine to implement an equivalent search string.

ID	Date	#	Exact Query String and Link
google1	2019/11/26	18	("serverless" OR "faas") AND ("performance" OR "benchmark") AND "experiment" AND "lambda" after:2015-01-01
google2	2019/11/26	+7	("serverless" OR "faas") AND ("performance" OR "benchmark") after:2015-01-01
twitter1	2019/12/03	+2	faas benchmark
twitter2	2019/12/03	+3	serverless benchmark
twitter3	2019/12/03	+0	faas performance
twitter4	2019/12/03	+3	serverless performance
hackernews1	2019/12/06	+0	faas benchmark
hackernews2	2019/12/06	+0	serverless benchmark
hackernews3	2019/12/06	+0	faas performance
hackernews4	2019/12/06	+1	serverless performance
reddit1	2019/12/06	+0	faas benchmark
reddit2	2019/12/06	+0	serverless benchmark
reddit3	2019/12/06	+3	faas performance
reddit4	2019/12/06	+0	serverless performance
medium1	2020/02/18	+0	faas benchmark
medium2	2020/02/18	+2	serverless benchmark
medium3	2020/02/18	+0	faas performance
medium4	2020/02/18	+1	serverless performance

Export Instructions

We used non-personalized private search mode through private Google Chrome browser windows wherever possible. Notice that the number of search results reported by Google is only a rough estimate and typically changes (dramatically) when reaching the last page¹. Therefore, we used double quotes " for exact matching (i.e., to exclude Google's fuzzy search results) and to obtain more accurate result estimates. Further, Google filters out highly redundant search results by default. For the google1 query, we repeated the search with disabled redundancy filtering and kept both versions (e.g., google2 and google2.2 or google4.2 but google4 doesn't exist because omitted results have fewer pages)².

We used the Google Chrome export options for PDF and HTML in combination with the following steps:

google: 1) Paste link in private browser mode 2) Settings > Search Settings: choose region "United States" and 100 results per page
twitter: 1) Paste link in browser 2) scroll down to the bottom of the page until all search results appear
hackernews: 1) Paste link in browser 2) scroll down to the bottom of the page until all search results appear
reddit: 1) Paste link in browser 2) scroll down to the bottom of the page until all search results appear
medium: 1) Paste link in browser 2) scroll down to the bottom of the page until all search results appear

The authors also saved PDF or HTML files of all relevant articles in case some sources become unavailable. However, we cannot publish these website copies for legal reasons.

¹ Google Support: The count of the number of search results is incorrect
Search Engine Land: Why Google Can’t Count Results Properly

² Google Support: In order to show you the most relevant results, we have omitted some entries

Plots

An up-to-date R language toolchain preferably with RStudio is required. Install the required packages imported at the top of each *.R file (e.g., install.packages("ggplot2")) from the official CRAN package repository (RStudio automatically detects to-be-installed packages). A dependency installation script is provided under plots/install_dependencies.R

Open the RStudio project faas_mlr.Rproj (or alternatively set the R working directory to ./plots)
Run a given *.R file to produce the corresponding *.pdf plot. For example, characteristics.R produces characteristics.pdf:
```
cd plots
Rscript characteristics.R
```

The plots follow the economist color scheme in the ggthemes package.

Dependencies

Software	Version
R	4.0.2
tidyr	1.1.0.9000 (dev)
vctrs	0.3.2.9000 (dev)
dplyr	1.0.1 (dev)
forcats	0.5.0
ggplot2	3.3.2
ggthemes	4.2.0

NOTE: An issue in vctrs caused error messages such as "... is not empty" (2020-06). It was fixed in July 2020 and does not occur with the versions listed above (dev versions as of 2020-07-19).
