Skip to content

Commit

Permalink
Update llm_inference.md
Browse files Browse the repository at this point in the history
Signed-off-by: Michael Yuan <[email protected]>
  • Loading branch information
juntao authored and ezirmusitua committed Nov 19, 2023
1 parent 1e9718e commit cf6c601
Showing 1 changed file with 4 additions and 12 deletions.
16 changes: 4 additions & 12 deletions docs/develop/rust/wasinn/llm_inference.md
Original file line number Diff line number Diff line change
Expand Up @@ -60,18 +60,10 @@ Second, use `cargo` to build the example project.
cargo build --target wasm32-wasi --release
```

The output WASM file is `target/wasm32-wasi/release/llama-chat.wasm`.

We also need to get the model. Here we use the llama-2-13b model.

```bash
curl -LO https://huggingface.co/wasmedge/llama2/blob/main/llama-2-13b-chat-q5_k_m.gguf
```

Next, use WasmEdge to load the llama-2-13b model and then ask the model to questions.
The output WASM file is `target/wasm32-wasi/release/llama-chat.wasm`. Next, use WasmEdge to load the llama-2-7b model and then ask the model to questions.

```bash
wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-13b-chat-q5_k_m.gguf llama-chat.wasm
wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-7b-chat-q5_k_m.gguf llama-chat.wasm
```

After executing the command, you may need to wait a moment for the input prompt to appear. You can enter your question once you see the `[USER]:` prompt:
Expand Down Expand Up @@ -158,7 +150,7 @@ You can make the inference program run faster by AOT compiling the wasm file fir
```bash
wasmedge compile llama-chat.wasm llama-chat.wasm
wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-13b-q5_k_m.gguf llama-chat.wasm
wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-7b-chat-q5_k_m.gguf llama-chat.wasm
```
## Understand the code
Expand Down Expand Up @@ -260,7 +252,7 @@ Next, execute the model inference.
context.compute().expect("Failed to complete inference");
```
After the inference is fiished, extract the result from the computation context and losing invalid UTF8 sequences handled by converting the output to a string using `String::from_utf8_lossy`.
After the inference is finished, extract the result from the computation context and losing invalid UTF8 sequences handled by converting the output to a string using `String::from_utf8_lossy`.
```rust
let mut output_buffer = vec![0u8; *CTX_SIZE.get().unwrap()];
Expand Down

0 comments on commit cf6c601

Please sign in to comment.