
Apple Silicon Metal Support not working #91

Open
soleblaze opened this issue Jun 8, 2023 · 4 comments

@soleblaze (Contributor)

When I modify the example to add llama.SetGPULayers(1), it doesn't appear to take effect. The example still runs on the CPU without offloading to Metal.

When I use it in local-ai, it thinks my q4_0 model is an f32 model. The same model works fine when running llama.cpp directly.

Asserting on type 0
GGML_ASSERT: /Users/soleblaze/git/thirdparty/localai/go-llama/llama.cpp/ggml-metal.m:549: false && "not implemented"

You do need to copy the ggml-metal.metal file from the llama.cpp directory into your CWD for this to work. Otherwise it fails with a file-not-found error while loading '(null)'.

Is there a different load path that go-llama.cpp should be using when loading a model for Metal?
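For reference, the modification described above would look roughly like this. This is only a sketch, assuming the go-skynet/go-llama.cpp API names used in this thread (llama.New, llama.SetGPULayers); it will not compile or link unless libbinding.a has already been built and is on LIBRARY_PATH/C_INCLUDE_PATH:

```go
package main

import (
	"fmt"

	llama "github.com/go-skynet/go-llama.cpp"
)

func main() {
	// SetGPULayers(1) is the option discussed in this issue: it asks
	// llama.cpp to offload one layer to the GPU, which is what triggers
	// the Metal code path (the equivalent of llama.cpp's -ngl 1 flag).
	l, err := llama.New("/model/path/here", llama.SetGPULayers(1))
	if err != nil {
		fmt.Println("load failed:", err)
		return
	}
	defer l.Free()

	out, err := l.Predict("Hello")
	if err != nil {
		fmt.Println("predict failed:", err)
		return
	}
	fmt.Println(out)
}
```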

@mudler (Member) commented Jun 8, 2023

When trying with the bindings, are you following the steps in the README?

BUILD_TYPE=metal make libbinding.a
CGO_LDFLAGS="-framework Foundation -framework Metal -framework MetalKit -framework MetalPerformanceShaders" LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD go run ./examples -m "/model/path/here" -t 14

@soleblaze (Contributor, Author) commented Jun 9, 2023

Yes. With that example it still appears to use the CPU. llama.cpp expects -ngl 1 to be passed in order to use Metal.

Using the steps in the README also produces this make warning:
make: Circular llama.cpp/ggml-metal.o <- llama.cpp/ggml-metal.o dependency dropped.

When it uses Metal, it should output lines prefixed with ggml_metal_init:. It doesn't do this. If I clone the repo, cd into llama.cpp, and run LLAMA_METAL=1 make prior to building libbinding.a, then it works correctly.

Adding llama.SetGPULayers(1) to the llama.New call on line 33 makes the example try to use Metal, but it then fails to run due to a missing ggml-metal.metal file.

Example output (Metal failure):
❯ CGO_LDFLAGS="-framework Foundation -framework Metal -framework MetalKit -framework MetalPerformanceShaders" LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD go run ./examples -m "$HOME/models/ggml-model-q4_0.bin" -t 6
llama.cpp: loading model from /Users/soleblaze/models/ggml-model-q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 128
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.07 MB
llama_model_load_internal: mem required  = 5407.71 MB (+ 1026.00 MB per state)
.
llama_init_from_file: kv self size  =   64.00 MB
ggml_metal_init: allocating
ggml_metal_init: using MPS
ggml_metal_init: loading '(null)'
ggml_metal_init: error: Error Domain=NSCocoaErrorDomain Code=258 "The file name is invalid."
exit status 1

If I compile examples/main.go and copy llama.cpp/ggml-metal.metal into my CWD, then it works. Running the example via go run fails to find the file.
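Putting the pieces from this thread together, the full working sequence would look roughly like the following. This is a sketch pieced together from the steps described above, assuming a go-llama.cpp checkout with the llama.cpp submodule present; the binary name metal-example and the model path are placeholders:

```shell
# 1. Build llama.cpp itself with Metal enabled first.
(cd llama.cpp && LLAMA_METAL=1 make)

# 2. Build the bindings with the Metal build type, per the README.
BUILD_TYPE=metal make libbinding.a

# 3. Copy the Metal shader source next to the binary you will run;
#    otherwise ggml_metal_init tries to load '(null)' and aborts.
cp llama.cpp/ggml-metal.metal .

# 4. Compile the example and run the binary from this directory
#    (running via `go run` executes from elsewhere and misses the shader).
CGO_LDFLAGS="-framework Foundation -framework Metal -framework MetalKit -framework MetalPerformanceShaders" \
  LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD go build -o metal-example ./examples
./metal-example -m "$HOME/models/ggml-model-q4_0.bin" -t 6
```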

@tmc commented Jun 11, 2023

FWIW, I'm also not seeing GPU offload with -ngl 1 after following the current README steps to build and use Metal support.

[screenshot: nogpu]

@soleblaze (Contributor, Author)

That's odd; it's working fine for me. When it loads the model, do you get any of the ggml_metal_init lines? I do get sporadic GPU usage, where it freezes every X tokens and GPU utilization dives.
[screenshot: GPU utilization, 2023-06-12]
