
Commit

📝 Make README more readable
Shark committed Aug 25, 2022
1 parent 95419b8 commit 755ae9f
Showing 5 changed files with 201 additions and 159 deletions.
191 changes: 34 additions & 157 deletions README.md
@@ -6,7 +6,7 @@
<h3 align="center">wasm-workflows-plugin</h3>

<p align="center">
An <a href="https://github.com/argoproj/argo-workflows/blob/master/docs/executor_plugins.md">Executor Plugin</a> for <a href="https://argoproj.github.io/argo-workflows/">Argo Workflows</a> that runs WebAssembly modules! 🚀
Runs WebAssembly in your Argo Workflows! 🚀
<br />
<a href="https://github.com/Shark/wasm-workflows-plugin/#about-the-project"><strong>Find out why that's awesome »</strong></a>
<!--
@@ -47,108 +47,61 @@

## About The Project

This is a tool that allows you to run WebAssembly modules instead of containers for your steps in [Argo Workflows](https://argoproj.github.io/argo-workflows/). You might rightfully ask yourself what problem this solves for you.
This is an <a href="https://github.com/argoproj/argo-workflows/blob/master/docs/executor_plugins.md">Executor Plugin</a> for <a href="https://argoproj.github.io/argo-workflows/">Argo Workflows</a> that runs WebAssembly modules!

The two most important aspects are security and performance:
These are the benefits of using Wasm instead of Docker containers in your workflows:

* :lock: **Security**
* :airplane: **Portability**

The [list of things to do](https://cheatsheetseries.owasp.org/cheatsheets/Docker_Security_Cheat_Sheet.html) when you want to run containers securely is long and the topic is more complex than even ambitious users care about. Containers are [vulnerable in many ways](https://ieeexplore.ieee.org/document/8693491) because of their denylist approach to security: they're allowed to do many things by default.
You must build Docker containers individually for every CPU architecture. Working on a Mac with Apple Silicon, but your Kubernetes nodes run on Intel CPUs? You'll cross-compile your container images all day.

WebAssembly's security model is the opposite. As with smartphone apps, modules must be explicitly granted permission for potentially sensitive tasks. For example, you might want to give a module the permission to read and write files but not communicate over the internet.
Wasm modules are architecture-independent by design. Build once, run everywhere.

Container images from third parties you don't know are usually a security nightmare. With WebAssembly, you can run code you don't fully trust with more confidence. Say you have a workflow step that renders Markdown. When the author of your Markdown parser container image decides to deliver a crypto miner instead, most Kubernetes setups will happily run it. If you were using this project and a WebAssembly module: zero chance, since it's easy for you to know that the step doesn't need the network but only takes an input parameter and produces some output. [This example is not made up](https://www.trendmicro.com/vinfo/fr/security/news/virtualization-and-cloud/malicious-docker-hub-container-images-cryptocurrency-mining).
* :runner: **Performance**

<details>
<summary>More about the difference between containers and Wasm modules</summary>
<img src="doc/container-vs-wasm.png" style="max-width: 700px">
<p>Linux processes rely on a set of more than 300 system calls for any task that involves sharing data with anything outside the process. Containers are a combination of different Linux kernel technologies (namespaces, cgroups, etc.) that segment one computer into many seemingly independent containers. But this very much depends on a) the secure implementation of all syscalls so that nothing leaks and b) trust in the application inside the container to do what the user intends it to.</p>
<p>Wasm modules are very restricted by default. We use application-level capabilities to allow them to access external resources like the network, S3 object stores, or the filesystem. The modules are the capability consumers, the Wasm runtime is the capability provider. The capability provider translates the requests from the Wasm module and acts as a secure proxy to the outside.</p>
</details>
It takes a while for Kubernetes to spin up a container and run your code. The process has quite a few steps: pulling a container image, often 100s of megabytes in size, creating namespaces and virtual network interfaces. Starting the runtime for interpreted languages takes a while, too.

* :runner: **Performance**
Wasm does not emulate a complete operating system as containers do. Wasm modules are a much simpler abstraction. This means that a module executes in a matter of milliseconds.

* :lock: **Security**

Containers have some overhead: for each workflow step, Argo creates a new Kubernetes Pod. This Pod has several containers to enable all the Argo features; your code is just one of them. All the containers must execute, then results are gathered and sent back to Argo. This all takes time: container images are often hundreds of megabytes in size, and they may rely on interpreted languages like Python or have huge dependencies, leading to a slow start time. You may know the [Cold Start issue](https://aws.amazon.com/blogs/compute/operating-lambda-performance-optimization-part-1/) with Function-as-a-Service. In Argo, every workflow step is a cold start.
Securing a container runtime [is a challenge](https://cheatsheetseries.owasp.org/cheatsheets/Docker_Security_Cheat_Sheet.html) because [containers are vulnerable in many ways](https://ieeexplore.ieee.org/document/8693491). Containers are powerful by design.

Because WebAssembly modules don't have to bring a whole operating system, they're much smaller. And there is less setup work to do, even for interpreted languages. This means that a module can be run in a matter of milliseconds rather than tens of seconds.
Wasm is a minimal runtime that is just powerful enough to run a program. Rather than allowing everything by default, its security model works like a smartphone's, where you grant apps permissions explicitly.

WebAssembly is a new technology in the browser and even more so in the Cloud-Native ecosystem. But there are several ways to ease the transition:
[Read more about the benefits here.](doc/benefits.md)

* Containers and Wasm modules play together just fine in the same workflow. Existing container setups don't have to be migrated just because they could be. Use Wasm for the tasks for which it is a good fit and leave the rest to containers.
Even though Wasm is a new technology in Cloud Native, incorporating Wasm into your workflow is seamless:

* You can find ready-to-use templates for popular programming languages in the [`wasm-modules/templates/`](wasm-modules/templates/) folder.
* Containers and Wasm modules co-exist in the same workflow. You can pass artifacts and parameters between them.

* We will provide pre-made modules for popular use cases such as image and text processing, API connectors, etc.
* We have included [ready-to-use templates](wasm-modules/templates/), [examples](wasm-modules/examples/), and even [some useful modules for running off-the-shelf](wasm-modules/contrib/).

### Built with

Open Source software stands on the shoulders of giants. It wouldn't have been possible to build this tool with very little extra work without the authors of these lovely projects below.
Open Source software stands on the shoulders of giants. It wouldn't have been possible to build this tool without the authors of these projects:

* [Rust](https://rust-lang.org) is used to implement the Argo Executor Plugin API, pull and execute Wasm modules
* [Axum](https://github.com/tokio-rs/axum) is the Rust web framework to handle RPC calls
* [Wasmtime](https://github.com/bytecodealliance/wasmtime) is the WebAssembly Virtual Machine ([WASI](https://wasi.dev) is [supported](https://crates.io/crates/wasmtime-wasi), too)
* [wit-bindgen](https://github.com/bytecodealliance/wit-bindgen) provides the interface between this project as the Wasm host and the Wasm modules
* [Wasmtime](https://github.com/bytecodealliance/wasmtime) is the WebAssembly Virtual Machine with [WASI support](https://wasi.dev)
* [oci-distribution](https://crates.io/crates/oci-distribution) allows the tool to pull Wasm modules from OCI registries
* [Best README Template](https://github.com/othneildrew/Best-README-Template)

## Getting Started

### Prerequisites

* This guide assumes you have a working Argo Workflows installation with v3.3.0 or newer.
* You will need to install the [Argo CLI](https://argoproj.github.io/argo-workflows/cli/) with v3.3.0 or newer.
* `kubectl` must be available and configured to access the Kubernetes cluster where Argo Workflows is installed.

### Installation

1. Clone the repository and change to the [`argo-plugin/`](argo-plugin/) directory:

```shell
git clone https://github.com/Shark/wasm-workflows-plugin
cd wasm-workflows-plugin/argo-plugin
```
You must install Argo Workflows (v3.3.0 or newer) and the [`argo` CLI](https://argoproj.github.io/argo-workflows/cli/). `kubectl` needs access to your cluster.

1. Build the plugin ConfigMap:
**Install the plugin:**

```shell
argo executor-plugin build .
```
Go to the [Releases page](https://github.com/Shark/wasm-workflows-plugin/releases) and follow the instructions for installing the plugin through the ConfigMap.

1. Register the plugin with Argo in your cluster:
**Submit your first Wasm workflow:**

Make sure to specify `--namespace` if you didn't install Argo in the default namespace.
Run `argo submit --watch https://raw.githubusercontent.com/Shark/wasm-workflows-plugin/main/wasm-modules/examples/ferris-says/workflow.yaml`.

```shell
kubectl apply -f wasm-executor-plugin-configmap.yaml
```
Add `--namespace XYZ` if your Argo installation is not running in the default namespace.

## Usage

Now that the plugin is registered in Argo, you can run workflow steps as Wasm modules by simply calling the `wasm` plugin:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: wasm-
spec:
  entrypoint: wasm
  arguments:
    parameters:
      - name: text
        value: Hello World from WebAssembly
  templates:
    - name: wasm
      inputs:
        parameters:
          - name: text
      plugin:
        wasm:
          module:
            oci: ghcr.io/shark/wasm-workflows-plugin-example-ferris-says:latest
```
The `wasm` template will produce an output parameter `text` with an awesome message:
The workflow produces an output parameter `text` with a cool message:

```
___________________
@@ -163,103 +116,27 @@ The `wasm` template will produce an output parameter `text` with an awesome mess
/ '-----' \
```

Input and output parameters between workflow steps work just like you'd expect. Other features like artifacts may still be on the [roadmap](#roadmap), so check it for your use case.
### Module Development

Creating a new Wasm module is easy and works with every language.
There are ready-to-use templates for:
* [AssemblyScript](wasm-modules/templates/assemblyscript/)
* [Rust](wasm-modules/templates/rust/)
* [TinyGo](wasm-modules/templates/tinygo/)
You implement a [WASI](https://wasi.dev) module. WASI is a modular system interface for Wasm. The principle is easy: the module is given its input in a file at `/work/input.json`. It is expected to write its results to a file at `/work/result.json` and exit.
We created an easy-to-use wrapper for Rust. The wrapper abstracts all the file handling magic and lets you implement a function with a signature like this:
Creating a new Wasm module for use with Argo Workflows is described in the [Module Development Guide](doc/module-development.md).

```rust
fn run(invocation: PluginInvocation) -> anyhow::Result<PluginResult> {
    // This is where your code goes
    Ok(PluginResult {
        phase: Phase::Succeeded,
        message: "Done".to_string(),
        outputs: Default::default(),
    })
}
```

For any other language you can easily parse the JSON yourself:

* PluginInvocation: [Example](crates/workflow-model/doc/plugin-invocation.example.json), [Schema](crates/workflow-model/doc/plugin-invocation.schema.json)
* PluginResult: [Example](crates/workflow-model/doc/plugin-result.example.json), [Schema](crates/workflow-model/doc/plugin-result.schema.json)
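
For languages without the wrapper, the file protocol described above could look roughly like the following sketch. It is written in Rust with only the standard library so that it stays self-contained. The names `run_module` and `json_escape` are hypothetical, and the JSON keys `phase`, `message`, and `outputs` (as well as the `"Succeeded"` value) are assumptions derived from the `PluginResult` fields shown earlier; the linked schemas are the authoritative reference.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Escapes a string so it can be embedded in a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Reads the invocation from `input` and writes a result document to `result`.
/// The key names are assumed from the PluginResult fields, not taken from the schema.
fn run_module(input: &Path, result: &Path) -> io::Result<()> {
    let invocation = fs::read_to_string(input)?;
    // A real module would parse `invocation` with a JSON library here.
    let message = format!("Read {} bytes of input", invocation.len());
    let body = format!(
        "{{\"phase\":\"Succeeded\",\"message\":\"{}\",\"outputs\":{{}}}}",
        json_escape(&message)
    );
    fs::write(result, body)
}

fn main() -> io::Result<()> {
    // Demo in a scratch directory; inside the plugin the paths would be
    // /work/input.json and /work/result.json.
    let dir = std::env::temp_dir().join("wasm-module-demo");
    fs::create_dir_all(&dir)?;
    fs::write(dir.join("input.json"), "{}")?;
    run_module(&dir.join("input.json"), &dir.join("result.json"))?;
    println!("{}", fs::read_to_string(dir.join("result.json"))?);
    Ok(())
}
```

Keeping the file handling in one function that takes both paths makes the module easy to exercise outside the plugin, since only `main` needs to know about `/work`.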

### Capabilities

Capabilities expand what modules can do. Out of the box, modules can take input parameters and artifacts and produce some output. Take a look at the [capabilities for wasmCloud](https://wasmcloud.dev/reference/host-runtime/capabilities/) for a more complete list of useful capabilities. The capabilities that this plugin offers will be extended in the future.

#### HTTP Capability

The HTTP capability provider allows you to make HTTP requests from your Wasm module. The capability is available in every module mode. Please refer to the [`wasi-experimental-http`](https://github.com/deislabs/wasi-experimental-http) repository for complete information on how to access the HTTP capability from your module. There you will find examples for both Rust and AssemblyScript.

When using the HTTP capability, you need to list the hosts that the module is allowed to connect to. This illustrates how easy WebAssembly's capability-oriented security model is to use: you usually know whether a module needs to connect to the outside, and declaring that is all it takes to secure your code.

You can find a full-featured module at [`wasm-modules/contrib/http-request`](wasm-modules/contrib/http-request/).

The module supports the following input parameters:

* `url`: the URL that you want to call
* `method`: HTTP request method (e.g. `GET`, `POST`, etc.) – optional, defaults to `GET`
* `body`: HTTP request body as a string – optional
* `content_type`: HTTP request body content type (e.g. `application/json`) – optional

As a result, you get the following output parameters:

* `status_code`: HTTP response status code as a number (e.g. `200`)
* `body`: HTTP response body as a string
* `content_type`: HTTP response body content type (e.g. `text/plain`)

The `http-request` module can be used in a workflow like so:

```yaml
- name: wasm
  inputs:
    parameters:
      - name: url
        value: https://httpbin.org/post
      - name: method
        value: POST
      - name: body
        value: Hello World
      - name: content_type
        value: text/plain
  plugin:
    wasm:
      module:
        oci: ghcr.io/shark/wasm-workflows-plugin-http-request:latest
      permissions:
        http:
          allowed_hosts:
            - https://httpbin.org
```
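
To illustrate the deny-by-default idea behind `allowed_hosts`, here is a minimal sketch of how such a check might work. It is not the plugin's or `wasi-experimental-http`'s actual matching logic; `origin_of` and `is_allowed` are hypothetical helpers, and matching on the exact `scheme://authority` pair is an assumption made for the example.

```rust
/// Returns the `scheme://authority` part of a URL, lowercased, or None if malformed.
fn origin_of(url: &str) -> Option<String> {
    let (scheme, rest) = url.split_once("://")?;
    // The authority ends at the first path, query, or fragment delimiter.
    let end = rest
        .find(|c: char| matches!(c, '/' | '?' | '#'))
        .unwrap_or(rest.len());
    let authority = &rest[..end];
    if scheme.is_empty() || authority.is_empty() {
        return None;
    }
    Some(format!("{}://{}", scheme, authority).to_ascii_lowercase())
}

/// Deny by default: a request passes only if its origin equals an allowed entry.
fn is_allowed(url: &str, allowed_hosts: &[&str]) -> bool {
    match origin_of(url) {
        Some(origin) => allowed_hosts
            .iter()
            .any(|h| h.trim_end_matches('/').eq_ignore_ascii_case(&origin)),
        None => false,
    }
}

fn main() {
    let allowed = ["https://httpbin.org"];
    for url in ["https://httpbin.org/post", "https://example.com/"] {
        println!("{} -> {}", url, is_allowed(url, &allowed));
    }
}
```

Comparing whole origins rather than string prefixes means a look-alike host such as `https://httpbin.org.evil.example` is rejected, and anything that does not parse as a URL is denied.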
### Advanced Features

### Execution Modes
* **Distributed Execution**

The plugin has two modes of how it can execute a Wasm module.
The plugin will run Wasm modules within the plugin process by default. This is the recommended mode because it's easy to set up and is powerful enough for most scenarios.

The `local` mode is the default and recommended mode. It will run Wasm modules within the plugin process. Argo will create one plugin container per workflow instance. This is fine for most use cases that don't need infinite scaling within a workflow. It's also very easy to use because there is nothing to configure: it just works.
The distributed mode creates pods for Wasm modules in a workflow task, much like Argo does for Docker containers.

The :test_tube: `distributed` mode is more advanced. It is provided as a technical prototype. This mode orchestrates Wasm modules in a Kubernetes cluster much like Argo itself. It creates a Pod for each workflow task. The Pod is executed by a virtual Kubernetes node that is provided by Krustlet. [Read more about :test_tube: `distributed` mode](doc/distributed-mode.md).
[Read more in the Distributed Execution Guide.](doc/distributed-mode.md)

## Roadmap

Our roadmap is managed on the [*Developing wasm-workflows-plugin* GitHub project board](https://github.com/users/Shark/projects/1/views/1).
We manage our roadmap on the [*Developing wasm-workflows-plugin* GitHub project board](https://github.com/users/Shark/projects/1/views/1).

## Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are **greatly appreciated**.
Contributions make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are **greatly appreciated**.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement".
Don't forget to give the project a star! Thanks again!
22 changes: 22 additions & 0 deletions doc/benefits.md
@@ -0,0 +1,22 @@
# Benefits

* :lock: **Security**

The [list of things to do](https://cheatsheetseries.owasp.org/cheatsheets/Docker_Security_Cheat_Sheet.html) when you want to run containers securely is long and the topic is more complex than even ambitious users care about. Containers are [vulnerable in many ways](https://ieeexplore.ieee.org/document/8693491) because of their denylist approach to security: they're allowed to do many things by default.

WebAssembly's security model is the opposite. As with smartphone apps, modules must be explicitly granted permission for potentially sensitive tasks. For example, you might want to give a module the permission to read and write files but not communicate over the internet.

Container images from third parties you don't know are usually a security nightmare. With WebAssembly, you can run code you don't fully trust with more confidence. Say you have a workflow step that renders Markdown. When the author of your Markdown parser container image decides to deliver a crypto miner instead, most Kubernetes setups will happily run it. If you were using this project and a WebAssembly module: zero chance, since it's easy for you to know that the step doesn't need the network but only takes an input parameter and produces some output. [This example is not made up](https://www.trendmicro.com/vinfo/fr/security/news/virtualization-and-cloud/malicious-docker-hub-container-images-cryptocurrency-mining).

<details>
<summary>More about the difference between containers and Wasm modules</summary>
<img src="doc/container-vs-wasm.png" style="max-width: 700px">
<p>Linux processes rely on a set of more than 300 system calls for any task that involves sharing data with anything outside the process. Containers are a combination of different Linux kernel technologies (namespaces, cgroups, etc.) that segment one computer into many seemingly independent containers. But this very much depends on a) the secure implementation of all syscalls so that nothing leaks and b) trust in the application inside the container to do what the user intends it to.</p>
<p>Wasm modules are very restricted by default. We use application-level capabilities to allow them to access external resources like the network, S3 object stores, or the filesystem. The modules are the capability consumers, the Wasm runtime is the capability provider. The capability provider translates the requests from the Wasm module and acts as a secure proxy to the outside.</p>
</details>

* :runner: **Performance**

Containers have some overhead: for each workflow step, Argo creates a new Kubernetes Pod. This Pod has several containers to enable all the Argo features; your code is just one of them. All the containers must execute, then results are gathered and sent back to Argo. This all takes time: container images are often hundreds of megabytes in size, and they may rely on interpreted languages like Python or have huge dependencies, leading to a slow start time. You may know the [Cold Start issue](https://aws.amazon.com/blogs/compute/operating-lambda-performance-optimization-part-1/) with Function-as-a-Service. In Argo, every workflow step is a cold start.

Because WebAssembly modules don't have to bring a whole operating system, they're much smaller. And there is less setup work to do, even for interpreted languages. This means that a module can be run in a matter of milliseconds rather than tens of seconds.