
Update Makefile to support MoE #1446

Closed · wants to merge 1 commit

Conversation

sfxworks (Contributor)

In reference to ggerganov/llama.cpp#4406

Need a newer version of llama.cpp to handle MoE models, such as Mixtral 8x7b

Description

This PR fixes #1421
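
The change itself is presumably a one-line bump of the pinned llama.cpp revision in the Makefile; as a sketch (reusing the hash from the testing notes further down, which may not be the exact commit in this PR):

CPPLLAMA_VERSION?=cafcd4f89500b8afef722cdb08088eceb8a22572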

Notes for Reviewers

Signed commits

  • Yes, I signed my commits.


Signed-off-by: Samuel Walker <[email protected]>

netlify bot commented Dec 15, 2023

Deploy Preview for localai ready!

🔨 Latest commit: 76abeee
🔍 Latest deploy log: https://app.netlify.com/sites/localai/deploys/657c67b46167290008f7df6f
😎 Deploy Preview: https://deploy-preview-1446--localai.netlify.app

sfxworks marked this pull request as draft on December 15, 2023 14:50
@sfxworks (Contributor Author)

Currently building and testing locally to confirm it works.

Using

#backend: llama
context_size: 8192
f16: true
low_vram: false
gpu_layers: 98
mmlock: false
name: mixtral
parameters:
  model: mixtral-8x7b-v0.1.Q4_K_M.gguf
  temperature: 0.2
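
Once that YAML is in place, a request against the OpenAI-compatible endpoint should exercise the model (a minimal sketch, assuming a default LocalAI install listening on localhost:8080):

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "mixtral", "messages": [{"role": "user", "content": "Hello"}], "temperature": 0.2}'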

@sfxworks (Contributor Author)

I local-ai build info:
I BUILD_TYPE: hipblas
I GO_TAGS: 
I LD_FLAGS: -X "github.com/go-skynet/LocalAI/internal.Version=v2.0.0" -X "github.com/go-skynet/LocalAI/internal.Commit=238fec244ae6c9a66bc7fafd76c7e14671110a6f"
CGO_LDFLAGS="-L/opt/rocm/hip/lib -lamdhip64 -L/opt/rocm/lib -lOpenCL -L/usr/lib -lclblast -lrocblas -lhipblas -lrocrand -lomp -O3 --rtlib=compiler-rt -unwindlib=libgcc -lhipblas -lrocblas --hip-link -O3 --rtlib=compiler-rt -unwindlib=libgcc -lhipblas -lrocblas --hip-link" go build -ldflags "-X "github.com/go-skynet/LocalAI/internal.Version=v2.0.0" -X "github.com/go-skynet/LocalAI/internal.Commit=238fec244ae6c9a66bc7fafd76c7e14671110a6f"" -tags "" -o local-ai ./
10:02AM DBG GRPC(mixtral-8x7b-v0.1.Q4_K_M.gguf-127.0.0.1:42533): stderr error loading model: create_tensor: tensor 'blk.0.ffn_gate.weight' not found

Hmm, that doesn't appear to be where to change it.
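
The blk.0.ffn_gate.weight tensor is presumably missing because the Mixtral GGUF stores per-expert FFN tensors rather than the dense ones an older loader asks for, which would mean the vendored llama.cpp predates ggerganov/llama.cpp#4406. A quick way to check which revision actually got built (a sketch, assuming the sources/ checkout layout that appears in the patch logs below):

git -C sources/go-llama/llama.cpp log -1 --oneline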

@sfxworks (Contributor Author)

Maybe gollama needs to be updated as well?

@sfxworks (Contributor Author)

Tried with

GOLLAMA_VERSION?=77e691050c5401f03240f1960410e286fb50e8e2
CPPLLAMA_VERSION?=cafcd4f89500b8afef722cdb08088eceb8a22572

To reflect the upstream update attempt in go-skynet/go-llama.cpp#313, but it failed:

cd llama.cpp && patch -p1 < ../patches/1902-cuda.patch
patching file common/common.cpp
Hunk #1 succeeded at 1614 with fuzz 2 (offset 346 lines).
patching file common/common.h
Hunk #1 FAILED at 209.
1 out of 1 hunk FAILED -- saving rejects to file common/common.h.rej
make[1]: *** [Makefile:235: prepare] Error 1
make[1]: Leaving directory '/home/sam/LocalAI/sources/go-llama'
make: *** [Makefile:223: sources/go-llama/libbinding.a] Error 2
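
The rejects file should show exactly which hunk of the CUDA patch no longer applies against the newer tree; once the patch is fixed or regenerated, a clean rebuild picks it up (a sketch, reusing the hipblas build type from the log above):

# inspect the hunk that failed to apply
cat sources/go-llama/llama.cpp/common/common.h.rej
# rebuild from scratch once the patch applies
make clean
BUILD_TYPE=hipblas make build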

@sfxworks (Contributor Author)

Trying against go-skynet/go-llama.cpp#315

@sfxworks (Contributor Author)

sfxworks commented Dec 15, 2023

hmm

11:07AM DBG GRPC(mixtral-8x7b-v0.1.Q4_K_M.gguf-127.0.0.1:44575): stderr error loading model: create_tensor: tensor 'blk.0.ffn_gate.weight' not found

@mudler (Owner)

mudler commented Dec 15, 2023

Did you try with the llama-cpp backend? It should also be the default.

@sfxworks (Contributor Author)

Did you try with the llama-cpp backend? It should also be the default.

I tried with backend: llama both commented out and uncommented, with no success. The last comment was with it uncommented.
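
For reference, the toggle in question in the model YAML; the llama-cpp value is what is being suggested above, and the exact backend strings are my assumption:

# either leave it commented out to use the default backend:
#backend: llama
# or set it explicitly, e.g. to the C++ backend closest to upstream:
backend: llama-cpp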

@mudler (Owner)

mudler commented Dec 15, 2023

Did you try without acceleration too?

@sfxworks (Contributor Author)

Didn't try this model without acceleration, but I did try another model with acceleration and it worked just fine.

@mudler (Owner)

mudler commented Dec 16, 2023

Maybe it is an upstream issue; the llama-cpp backend is the closest to upstream, so if that fails, something might be off with llama.cpp. master just got the latest hash in #1429. I'll try to give it a go later today too.

@sfxworks (Contributor Author)

I'll give that a try today

@mudler (Owner)

mudler commented Dec 16, 2023

Tried today and it works locally; adding a full example in #1449.

@mudler (Owner)

mudler commented Dec 18, 2023

@sfxworks I appreciate the effort here, but I think we can close this one, as we have more up-to-date hashes in master. Or is there anything pending? Did you check whether Mixtral works for you?

@sfxworks (Contributor Author)

Yep! It all works. I appreciate it!

sfxworks closed this on Dec 18, 2023.
Successfully merging this pull request may close these issues: Mixtral support?