Commit

feat: add m3

katopz committed Dec 6, 2023 · 1 parent 13ae15d · commit 1beca1f

Showing 1 changed file with 38 additions and 4 deletions: src/ml/wasmedge.md
# WasmEdge

## Models

```bash
curl -LO https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/resolve/main/mistral-7b-instruct-v0.1.Q5_K_M.gguf
```
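The MistralLite model used in later sections is not downloaded anywhere on this page; assuming it comes from TheBloke's MistralLite-7B-GGUF repo (an assumption, verify the URL before relying on it), it can be fetched the same way:

```bash
# Assumption: mistrallite.Q5_K_M.gguf is hosted in TheBloke/MistralLite-7B-GGUF
curl -LO https://huggingface.co/TheBloke/MistralLite-7B-GGUF/resolve/main/mistrallite.Q5_K_M.gguf
```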

## M3 Max

```bash
wasmedge --dir .:. --nn-preload default:GGML:AUTO:mistral-7b-instruct-v0.1.Q5_K_M.gguf llama-api-server.wasm -p chatml -r '<|im_end|>' -s 0.0.0.0:8081
```
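The command above assumes `llama-api-server.wasm` is already in the working directory; it can be fetched with the same URL used in the `llama-api-server` section further down:

```bash
curl -LO https://github.com/second-state/llama-utils/raw/main/api-server/llama-api-server.wasm
```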

RAM `4.1GB`

---

## Windows

> Ref: https://github.com/second-state/WasmEdge-WASINN-examples
```bash
wasmedge --dir .:. --env n_gpu_layers=35 --nn-preload default:GGML:AUTO:mistral-7b-instruct-v0.1.Q5_K_M.gguf wasmedge-ggml-llama-interactive.wasm default
```

RAM `7012MiB / 24564MiB`
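`n_gpu_layers` sets how many transformer layers are offloaded to the GPU, so it is the main knob for trading VRAM against speed. A sketch with a smaller, illustrative value for cards with less memory:

```bash
# Offload fewer layers when VRAM is tight; 20 is illustrative, tune per GPU
wasmedge --dir .:. --env n_gpu_layers=20 --nn-preload default:GGML:AUTO:mistral-7b-instruct-v0.1.Q5_K_M.gguf wasmedge-ggml-llama-interactive.wasm default
```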

### With `llama-chat`

> Ref: https://github.com/second-state/llama-utils
> mistral-7b-instruct-v0.1.Q5_K_M
```bash
curl -LO https://github.com/second-state/llama-utils/raw/main/chat/llama-chat.wasm

wasmedge --dir .:. --nn-preload default:GGML:AUTO:mistral-7b-instruct-v0.1.Q5_K_M.gguf llama-chat.wasm -p mistral-instruct-v0.1 -r '</s>'
```

RAM `9608MiB / 24564MiB`

> mistrallite.Q5_K_M
```bash
curl -LO https://github.com/second-state/llama-utils/raw/main/chat/llama-chat.wasm

wasmedge --dir .:. --nn-preload default:GGML:AUTO:mistrallite.Q5_K_M.gguf llama-chat.wasm -p mistrallite -r '</s>'
```

RAM `9608MiB / 24564MiB`

### With `llama-api-server`

> mistrallite.Q5_K_M
```bash
curl -LO https://github.com/second-state/llama-utils/raw/main/api-server/llama-api-server.wasm

wasmedge --dir .:. --nn-preload default:GGML:AUTO:mistrallite.Q5_K_M.gguf llama-api-server.wasm -p mistrallite -r '</s>'
```

> openhermes-2.5-mistral-7b.Q5_K_M
```bash
curl -LO https://huggingface.co/second-state/OpenHermes-2.5-Mistral-7B-GGUF/resolve/main/openhermes-2.5-mistral-7b.Q5_K_M.gguf

wasmedge --dir .:. --nn-preload default:GGML:AUTO:openhermes-2.5-mistral-7b.Q5_K_M.gguf llama-api-server.wasm -p chatml -r '<|im_end|>'

# Or 8081
wasmedge --dir .:. --nn-preload default:GGML:AUTO:openhermes-2.5-mistral-7b.Q5_K_M.gguf llama-api-server.wasm -p chatml -r '<|im_end|>' -s 0.0.0.0:8081
```
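A quick way to confirm the server is up, assuming `llama-api-server` follows the OpenAI API and exposes a `/v1/models` endpoint (an assumption, not shown on this page):

```bash
# Assumption: an OpenAI-style /v1/models endpoint is available
curl http://localhost:8081/v1/models
```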

### Test

```bash
curl -X POST http://localhost:8080/v1/chat/completions -H 'accept:application/json' -H 'Content-Type: application/json' -d '{"messages":[{"role":"system", "content": "You are a helpful assistant."}, {"role":"user", "content": "Write helloworld code in Rust"}], "model":"MistralLite-7B"}'

# Or 8081
curl -X POST http://localhost:8081/v1/chat/completions -H 'accept:application/json' -H 'Content-Type: application/json' -d '{"messages":[{"role":"system", "content": "You are a helpful assistant."}, {"role":"user", "content": "Write helloworld code in Rust"}], "model":"openhermes-2.5-mistral-7b.Q5_K_M"}'
```

RAM `23862MiB / 24564MiB`
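To extract just the assistant's reply from the JSON, assuming the server returns an OpenAI-style body with a `choices` array, pipe the response through `jq`:

```bash
# Assumption: OpenAI-style response shape (choices[0].message.content)
curl -s -X POST http://localhost:8080/v1/chat/completions \
  -H 'accept:application/json' -H 'Content-Type: application/json' \
  -d '{"messages":[{"role":"system", "content": "You are a helpful assistant."}, {"role":"user", "content": "Write helloworld code in Rust"}], "model":"MistralLite-7B"}' \
  | jq -r '.choices[0].message.content'
```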
