
Better Support for AMD and ROCM via docker containers. #1592

Closed
Tracked by #1126
jamiemoller opened this issue Jan 15, 2024 · 78 comments
Labels
area/container, enhancement (New feature or request), high prio, roadmap, up for grabs (Tickets that no-one is currently working on)

Comments

@jamiemoller

Presently it is very hard to get a Docker container to build with the ROCm backend; several elements seem to fail independently during the build process.
There are other related projects with functional Docker implementations that do work with ROCm out of the box (e.g. llama.cpp).
I would like to work on this myself, but between the speed at which things change in this project and the amount of free time I have to work on this, I am left only to ask for it.

If there are already good, 'stable' methods for building a Docker implementation with ROCm underneath, it would be very appreciated if they could be better documented. 'arch' helps nobody who wants to run on a more enterprise-y OS like RHEL or SLES.

Presently I have defaulted back to using textgen, as it has a mostly functional API, but its feature set is kinda woeful (still better than running llama.cpp directly, imo).

@jamiemoller jamiemoller added the enhancement New feature or request label Jan 15, 2024
@jamiemoller
Author

ps. love the work @mudler

@jamiemoller
Author

It should be noted:
1 - The ROCm documentation for some reason indicates make BUILD_TYPE=hipblas GPU_TARGETS=gfx1030 ... but there is no such build arg.
2 - stablediffusion is the hardest thing to get working in any environment I've tested; I have yet to actually get it to build on Arch, Debian, or openSUSE.
3 - The following Dockerfile is the smoothest build I've had so far:

FROM archlinux

# Install build dependencies
# (ncnn is not strictly required, as the stablediffusion build is broken)
RUN pacman -Syu --noconfirm
RUN pacman -S --noconfirm base-devel git rocm-hip-sdk rocm-opencl-sdk opencv clblast grpc go ffmpeg ncnn

# Configure library and include paths for cgo
ENV CGO_CFLAGS="-I/usr/include/opencv4" \
    CGO_CXXFLAGS="-I/usr/include/opencv4" \
    CGO_LDFLAGS="-L/opt/rocm/hip/lib -lamdhip64 -L/opt/rocm/lib -lOpenCL -L/usr/lib -lclblast -lrocblas -lhipblas -lrocrand -lomp -O3 --rtlib=compiler-rt -unwindlib=libgcc -lhipblas -lrocblas --hip-link"

# Configure build settings
# (keep comments on their own lines: a trailing "# ..." on an ARG instruction is not
#  treated as a comment by Docker and can leak into the value)
ARG BUILD_TYPE="hipblas"
# gfx906 selected for the Radeon VII
ARG GPU_TARGETS="gfx906"
# tts only, since stablediffusion is broken
ARG GO_TAGS="tts"

# Build LocalAI
RUN git clone https://github.com/go-skynet/LocalAI
WORKDIR /LocalAI
RUN make BUILD_TYPE=${BUILD_TYPE} GPU_TARGETS=${GPU_TARGETS} GO_TAGS=${GO_TAGS} build

# Clean up the package cache
RUN pacman -Scc --noconfirm
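
For reference, a build invocation for this Dockerfile would look roughly like the following (the tag name is just an example; override the build args to match your card):

docker build \
  --build-arg GPU_TARGETS=gfx906 \
  --build-arg GO_TAGS=tts \
  -t localai-rocm:test .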

@jamiemoller
Author

It should be noted that while I do see models load onto the card whenever there is an API call, and computations are being performed (pushing the card to 200 W of consumption), there is never any return from the API call and the apparent inference never terminates.

@mudler
Owner

mudler commented Jan 16, 2024

Presently it is very hard to get a docker container to build with the rocm backend, some elements seem to fail independently during the build process. There are other related projects with functional docker implementations that do work with rocm out of the box (aka llama.cpp). I would like to work on this myself however between the speed at which things change in this project and the amount of time I have free to work on this, I am left only to ask for this.

I don't have an AMD card to test with, so this card is up for grabs.

Things are moving fast, right, but building-wise this is a good time window; there are no plans to change that code area in the short term.

If there are good 'stable' methods for building a docker implementation with rocm underneath already it would be very appreciated if this could be better documented. 'arch' helps nobody that wants to run on a more enterprisy os like rhel or sles.

A good starting point would be in this section:

RUN if [ "${IMAGE_TYPE}" = "extras" ]; then \
we can pull ROCm dependencies in there if the appropriate flag is passed.
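
A rough sketch of the idea, assuming a BUILD_TYPE build arg on the Debian/Ubuntu-based image (the package names are illustrative, not the final list):

ARG BUILD_TYPE=
RUN if [ "${BUILD_TYPE}" = "hipblas" ]; then \
        apt-get update && \
        apt-get install -y --no-install-recommends hipblas-dev rocblas-dev && \
        rm -rf /var/lib/apt/lists/* ; \
    fi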

@mudler mudler removed their assignment Jan 16, 2024
@mudler mudler added the up for grabs Tickets that no-one is currently working on label Jan 16, 2024
@wuxxin
Contributor

wuxxin commented Jan 16, 2024

@jamiemoller you could use https://github.com/wuxxin/aur-packages/blob/main/localai-git/PKGBUILD as a starting point; it's a (feature-limited) Arch Linux package of LocalAI for CPU, CUDA and ROCm. There are binaries available via arch4edu. See #1437

@Expro

Expro commented Jan 31, 2024

Please do work on that. I've been trying to put any load on my AMD GPU for a week now. Building from source on Ubuntu with clBlast fails in so many ways it's not even funny.

@jamiemoller
Author

jamiemoller commented Feb 14, 2024

I have a feeling that it will be better to start from here (or something similar) for AMD builds, now that 2.8 is on Ubuntu 22.04.

@mudler
Owner

mudler commented Feb 15, 2024

Made some progress on #1595 (thanks to @fenfir for having started this up). I don't have an AMD video card, but CI seems to pass and container images are being built just fine.

I will merge as soon as the v2.8.2 images are out - @jamiemoller @Expro could you give the images a shot as soon as they are on master?

@Expro

Expro commented Feb 16, 2024

Sure, I will take them for a spin. Thanks for working on that.

@mudler
Owner

mudler commented Feb 17, 2024

The hipblas images are pushed now:

quay.io/go-skynet/local-ai:master-hipblas-ffmpeg-core 
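
For testing, a minimal run command could look something like the following (the device passthrough and models path are assumptions; adjust for your setup):

docker run -it --rm \
  --device=/dev/kfd --device=/dev/dri \
  --group-add=video \
  -p 8080:8080 \
  -v $PWD/models:/build/models \
  quay.io/go-skynet/local-ai:master-hipblas-ffmpeg-core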

@Expro

Expro commented Feb 20, 2024

Unfortunately, not working as intended. GPU was detected, but nothing was offloaded:

4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr ggml_init_cublas: found 1 ROCm devices: 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr Device 0: AMD Radeon (TM) Pro VII, compute capability 9.0, VMM: no 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llama_model_loader: loaded meta data with 20 key-value pairs and 325 tensors from /build/models/c0c3c83d0ec33ffe925657a56b06771b (version GGUF V3 (latest)) 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llama_model_loader: - kv 0: general.architecture str = phi2 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llama_model_loader: - kv 1: general.name str = Phi2 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llama_model_loader: - kv 2: phi2.context_length u32 = 2048 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llama_model_loader: - kv 3: phi2.embedding_length u32 = 2560 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llama_model_loader: - kv 4: phi2.feed_forward_length u32 = 10240 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llama_model_loader: - kv 5: phi2.block_count u32 = 32 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llama_model_loader: - kv 6: phi2.attention.head_count u32 = 32 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llama_model_loader: - kv 7: phi2.attention.head_count_kv u32 = 32 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llama_model_loader: - kv 8: phi2.attention.layer_norm_epsilon f32 = 0.000010 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llama_model_loader: - kv 9: phi2.rope.dimension_count u32 = 32 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llama_model_loader: - kv 10: general.file_type u32 = 7 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llama_model_loader: - kv 11: tokenizer.ggml.add_bos_token bool = false 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llama_model_loader: - kv 12: tokenizer.ggml.model str = gpt2 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,51200] = ["!", "\"", "#", "$", "%", "&", "'", ... 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llama_model_loader: - kv 14: tokenizer.ggml.token_type arr[i32,51200] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llama_model_loader: - kv 15: tokenizer.ggml.merges arr[str,50000] = ["Ġ t", "Ġ a", "h e", "i n", "r e",... 
4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 50256 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 50256 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 50256 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llama_model_loader: - kv 19: general.quantization_version u32 = 2 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llama_model_loader: - type f32: 195 tensors 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llama_model_loader: - type q8_0: 130 tensors 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llm_load_vocab: mismatch in special tokens definition ( 910/51200 vs 944/51200 ). 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llm_load_print_meta: format = GGUF V3 (latest) 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llm_load_print_meta: arch = phi2 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llm_load_print_meta: vocab type = BPE 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llm_load_print_meta: n_vocab = 51200 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llm_load_print_meta: n_merges = 50000 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llm_load_print_meta: n_ctx_train = 2048 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llm_load_print_meta: n_embd = 2560 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llm_load_print_meta: n_head = 32 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llm_load_print_meta: n_head_kv = 32 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llm_load_print_meta: n_layer = 32 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llm_load_print_meta: n_rot = 32 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llm_load_print_meta: n_embd_head_k = 80 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llm_load_print_meta: n_embd_head_v = 80 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llm_load_print_meta: n_gqa = 1 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llm_load_print_meta: n_embd_k_gqa = 2560 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llm_load_print_meta: n_embd_v_gqa = 2560 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llm_load_print_meta: f_norm_eps = 1.0e-05 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llm_load_print_meta: f_norm_rms_eps = 0.0e+00 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llm_load_print_meta: f_clamp_kqv = 0.0e+00 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llm_load_print_meta: f_max_alibi_bias = 0.0e+00 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llm_load_print_meta: n_ff = 10240 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llm_load_print_meta: n_expert = 0 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llm_load_print_meta: n_expert_used = 0 
4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llm_load_print_meta: rope scaling = linear 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llm_load_print_meta: freq_base_train = 10000.0 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llm_load_print_meta: freq_scale_train = 1 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llm_load_print_meta: n_yarn_orig_ctx = 2048 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llm_load_print_meta: rope_finetuned = unknown 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llm_load_print_meta: model type = 3B 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llm_load_print_meta: model ftype = Q8_0 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llm_load_print_meta: model params = 2.78 B 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llm_load_print_meta: model size = 2.75 GiB (8.51 BPW) 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llm_load_print_meta: general.name = Phi2 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llm_load_print_meta: BOS token = 50256 '<|endoftext|>' 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llm_load_print_meta: EOS token = 50256 '<|endoftext|>' 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llm_load_print_meta: UNK token = 50256 '<|endoftext|>' 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llm_load_print_meta: LF token = 128 'Ä' 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llm_load_tensors: ggml ctx size = 0.12 MiB 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llm_load_tensors: offloading 0 repeating layers to GPU 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llm_load_tensors: offloaded 0/33 layers to GPU 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llm_load_tensors: ROCm_Host buffer size = 2819.28 MiB 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr ............................................................................................. 
4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llama_new_context_with_model: n_ctx = 512 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llama_new_context_with_model: freq_base = 10000.0 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llama_new_context_with_model: freq_scale = 1 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llama_kv_cache_init: ROCm_Host KV buffer size = 160.00 MiB 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llama_new_context_with_model: KV self size = 160.00 MiB, K (f16): 80.00 MiB, V (f16): 80.00 MiB 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llama_new_context_with_model: ROCm_Host input buffer size = 6.01 MiB 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llama_new_context_with_model: ROCm_Host compute buffer size = 115.50 MiB 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr llama_new_context_with_model: graph splits (measure): 1 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr Available slots: 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr -> Slot 0 - max context: 512 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr all slots are idle and system prompt is empty, clear the KV cache 4:14PM INF [llama-cpp] Loads OK 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr slot 0 is processing [task id: 0] 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr slot 0 : kv cache rm - [0, end) 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr CUDA error: shared object initialization failed 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr current device: 0, in function ggml_cuda_op_mul_mat at /build/backend/cpp/llama/llama.cpp/ggml-cuda.cu:9462 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr hipGetLastError() 4:14PM DBG GRPC(c0c3c83d0ec33ffe925657a56b06771b-127.0.0.1:41425): stderr GGML_ASSERT: /build/backend/cpp/llama/llama.cpp/ggml-cuda.cu:241: !"CUDA error"

Tested with integrated phi-2 model with gpu_layers specified:

name: phi-2
context_size: 2048
f16: true
gpu_layers: 90
mmap: true
trimsuffix:
- "\n"
parameters:
  model: huggingface://TheBloke/phi-2-GGUF/phi-2.Q8_0.gguf
  temperature: 0.2
  top_k: 40
  top_p: 0.95
  seed: -1
template:
  chat: &template |
    Instruct: {{.Input}}
    Output:
  completion: *template
usage: |
  To use this model, interact with the API (in another terminal) with curl for instance:
  curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
    "model": "phi-2",
    "messages": [{"role": "user", "content": "How are you doing?", "temperature": 0.1}]
  }'

@jtwolfe
Contributor

jtwolfe commented Mar 2, 2024

The ROCm docker image does appear to load the model; however, there is a gRPC error that I have encountered which causes the call to terminate before inference. I am moving to 22.04 with ROCm 6.0.0 on the host to make sure there are no version compatibility issues.

Note: the new Vulkan implementation of llama.cpp seems to work flawlessly.

@derzahla

derzahla commented Apr 2, 2024

I'm trying to work on the hipblas version, but I am confused about where the Dockerfiles used to generate the latest images such as "quay.io/go-skynet/local-ai:master-hipblas" are located. One thing I noticed is that the latest hipblas images are still using ROCm v6.0.0 while v6.0.3 is now out, but I have been unable to locate a Dockerfile in the git repo that installs any version of ROCm. So it would appear the Dockerfile being used is hosted elsewhere?

I would appreciate it if someone could point me to the latest Dockerfile used to generate the hipblas images. Thank you.

@jtwolfe
Contributor

jtwolfe commented Apr 7, 2024

Im trying to work on the hipblas version but I am confused on where the Dockerfiles are located that are used to generate the latest images such as "quay.io/go-skynet/local-ai:master-hipblas" . One thing I noticed is that the latest hipblas images are still using rocm v6.0.0 while v6.0.3 is now out. But I have been unable to locate a Dockerfile in the git repo that is installing any version of rocm. So it would appear the Dockerfle being used is hosted elsewhere?

Would appreciate if someone could point me to the latest Dockerfile being used to generate the hipblas images. Thank you

Newer does not equal better. That said, x.x.Y versions (the Y variations) are usually hotfixes and usually only apply to some very specific edge cases. Can you clarify any issues you have with 6.0.0 that are resolved with 6.0.3?

@jtwolfe
Contributor

jtwolfe commented Apr 7, 2024

the rocm docker image does appear to load the model however there is a grpc error that I have encountered that causes the call to terminate before inference, i am moving to 22.04 with rocm 6.0.0 on the host make sure there are no version compatibility issues.

Note: the new vulkan implementation of llama.cpp seems to work flawlessly

I think I just discovered the cause of my issue...
I am running my Radeon VII for this workload, which is a gfx906 device.
Presently I find only GPU_TARGETS ?= gfx900,gfx90a,gfx1030,gfx1031,gfx1100 in the Makefile; note that gfx900 is not supported by ROCm v5.x or v6.0.0.

I have yet to test whether a tailored build including gfx906 will work, but this may be a good candidate for inclusion in the next hipblas build.

For reference, under 6.0.0 the following LLVM targets are currently supported:
gfx942,gfx90a,gfx908,gfx906,gfx1100,gfx1030
I would note, for clarity, that the gfx906 target is deprecated for the Instinct MI50 but not for the Radeon Pro VII or the Radeon VII. Add to this that the Instinct MI25 is the only gfx900 card and is noted as no longer supported; while I do think we should keep gfx900 in place for as long as possible, it may impact future builds.

I may not have time to test an amendment to the GPU_TARGETS for the next few weeks (I only have like 2 hrs free today, and after building my GPU into a single-node k8s cluster I need to configure a local container registry before I can test any custom builds :( )

@fenfir might you be able to test this?
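
For reference, a tailored local build should just need the extra target passed on the make command line, along the lines of the Dockerfile earlier in this thread (untested on my side):

make BUILD_TYPE=hipblas GPU_TARGETS=gfx906 GO_TAGS=tts build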

@jtwolfe
Contributor

jtwolfe commented Apr 7, 2024

the rocm docker image does appear to load the model however there is a grpc error that I have encountered that causes the call to terminate before inference, i am moving to 22.04 with rocm 6.0.0 on the host make sure there are no version compatibility issues.
Note: the new vulkan implementation of llama.cpp seems to work flawlessly

I think I just discovered the cause of my issue... I am running my Radeon VII for this workload this would be a gfx906 device presently i find only GPU_TARGETS ?= gfx900,gfx90a,gfx1030,gfx1031,gfx1100 in the makefile regarding this gfx900 is not supported for rocm v5.>> or v6.0.0

I have yet to test if a tailored build including gfx906 will work but this may be a good candidate for inclusion in the next hipblas build details

for reference currently under 6.0.0 the following llbm targets are supported gfx942,gfx90a,gfx908,gfx906,gfx1100,gfx1030 I would not for clarity that the gfx906 target is depreciated for the instinct MI50 but not for the radeon pro vii or the radeon vii, add to this that the instinct MI25 is the only gfx900 card and is noted as no longer supported, while I do think we should keep gfx900 in place for as long as possible it may impact future builds

I may not have time to test an amendment to the GPU_TARGETS for the next few weeks (I only have like 2 hrs free today and after building my gpu into a single node k8s cluster I need to configure a local container registry before I can test any custom builds :( )

@fenfir might you be able to test this?

OK, so FYI:
the current master-hipblas-ffmpeg-core image with GPU_TARGETS=gfx906 does not build:

[  0%] Building C object CMakeFiles/ggml.dir/ggml.c.o
[  1%] Building C object CMakeFiles/ggml.dir/ggml-alloc.c.o
[  1%] Building C object CMakeFiles/ggml.dir/ggml-backend.c.o
[  2%] Building C object CMakeFiles/ggml.dir/ggml-quants.c.o
[  2%] Building CXX object CMakeFiles/ggml.dir/ggml-cuda/acc.cu.o
clang++: error: invalid target ID 'gfx903'; format is a processor name followed by an optional colon-delimited list of features followed by an enable/disable sign (e.g., 'gfx908:sramecc+:xnack-')
gmake[4]: *** [CMakeFiles/ggml.dir/build.make:132: CMakeFiles/ggml.dir/ggml-cuda/acc.cu.o] Error 1
2024-04-07T15:31:29.842216496+10:00 gmake[4]: Leaving directory '/build/backend/cpp/llama/llama.cpp/build'
gmake[3]: *** [CMakeFiles/Makefile2:842: CMakeFiles/ggml.dir/all] Error 2
gmake[3]: Leaving directory '/build/backend/cpp/llama/llama.cpp/build'
2024-04-07T15:31:29.842808442+10:00 gmake[2]: *** [Makefile:146: all] Error 2
2024-04-07T15:31:29.842836792+10:00 gmake[2]: Leaving directory '/build/backend/cpp/llama/llama.cpp/build'
make[1]: *** [Makefile:75: grpc-server] Error 2
make[1]: Leaving directory '/build/backend/cpp/llama'
make: *** [Makefile:517: backend/cpp/llama/grpc-server] Error 2

EDIT: 'waaaaaaiiiiit a second' I think I messed up...
EDIT2: yep, I definitely messed up; setting the environment var GPU_TARGETS=gfx906 worked fine, now I just need to get my model and context right <3 @mudler @fenfir <3 can we please get gfx906 added to the default targets?
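
Roughly the kind of invocation this implies, for anyone else rebuilding in-container with a custom target (the device flags and paths are illustrative):

docker run -it --rm \
  --device=/dev/kfd --device=/dev/dri --group-add=video \
  -e REBUILD=true \
  -e BUILD_TYPE=hipblas \
  -e GPU_TARGETS=gfx906 \
  -p 8080:8080 \
  -v $PWD/models:/build/models \
  quay.io/go-skynet/local-ai:master-hipblas-ffmpeg-core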

@jtwolfe
Contributor

jtwolfe commented Apr 7, 2024

@Expro take a look at my previous posts; maybe they will help you solve this. Ping me if you like, maybe I can help.

@jtwolfe
Contributor

jtwolfe commented Apr 7, 2024

@mudler before I spend the time, are there any immediate plans for expanded k8s docs or AMD-specific docs?

@mudler
Owner

mudler commented Apr 7, 2024

@mudler before i spend the time, are there any immediate plans for expanded k8s docs or AMD specific docs?

Hey @jtwolfe, thanks for deep-diving into this. I don't have an AMD card to test things out with, so I refrained from writing documentation that I couldn't test. Any help in that area is greatly appreciated.

@jtwolfe
Contributor

jtwolfe commented Apr 7, 2024

@mudler before i spend the time, are there any immediate plans for expanded k8s docs or AMD specific docs?

Hey @jtwolfe , thanks for deep diving into this, I don't have an AMD card to test things out so I refrained to write documentation that I couldn't test with. Any help on that area is greatly appreciated.

Ack.
I'll do my best to get some of our AMD brethren to test some more edge cases so we can give more details on modern cards, and I will send up a PR for docs when I get time.

@jtwolfe
Contributor

jtwolfe commented Jun 24, 2024

@bunder2015 @mudler please note that neither ROCm 5 nor ROCm 6 officially supports the 6000-series chips. I'm not saying that they won't work, just make sure to consider this.

Edit: This may be a good time to dig a bit further into the Vulkan implementation; it's definitely WAY more compatible and may be a good way of including ARM chips with Vulkan-capable GPU cores. I'm specifically thinking about all of these new fancy ARM laptops that everyone is producing now.

[image: ROCm GPU compatibility charts]

@bunder2015

😮‍💨 AMD's not making this easy... Thanks for the charts, I only suggested the 6000 series because it is PCIe Gen4 and not the latest (i.e. expensive) chips.

Looks like Radeon VIIs are still out there in the same price bracket, even though they will eventually be deprecated by ROCm.

@Airradda

Airradda commented Jun 24, 2024

please note that neither rocm5 nor rocm6 officially support the 6k series chips.

Just for some further clarification: this is only true for ROCm on Linux; they are supported by ROCm for Windows. Also, from personal experience with my 6950 XT*, I have not had issues that I can pin on ROCm when using anything that advertises ROCm support (Text-Gen-UI, Ollama, Llamafile, Comfy-UI, SD-Gen-UI, SD-Next, etc.), and even some that don't.

Edit:
* So long as they include, or I edit them to include, the following in the docker compose file (see the fuller sketch after this snippet):

devices:
    - /dev/dri
    - /dev/kfd
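
A minimal hypothetical compose service with that passthrough might look like this (image, port and volume are only examples):

services:
  local-ai:
    image: quay.io/go-skynet/local-ai:master-hipblas-ffmpeg-core
    ports:
      - "8080:8080"
    volumes:
      - ./models:/build/models
    devices:
      - /dev/dri
      - /dev/kfd
    group_add:
      - video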

@jtwolfe
Contributor

jtwolfe commented Jun 25, 2024

@Airradda TRUE! I forget that Windows exists sometimes :P The question would then be which approach is better: use Windows with older cards to host, or ensure uniform compatibility with newer cards on Windows or Linux platforms? /shrug This should be continued as a background discussion (e.g. for containers and k8s: if there were a Windows-core based container then maybe this could work, but it would need a Windows host, and the k8s-on-Windows problem is a whole other kettle of fish).

@bunder2015 Radeon VIIs are pretty solid options as long as you can get a good one (i.e. not a mining-rig card; those were often configured at lower memory voltage and higher frequencies and had a habit of hard-locking when not under load).

I don't begrudge AMD their flip-flopping and lack of support for older cards, given how much has changed in their architecture recently, but it does seem like they have kinda settled recently on whatever makes the gfx1100 LLVM target work. I just hope RDNA3 is a little more extensible going forward so we get a good 5 years of support for current cards.

I expect that the reason they haven't been able to capture more of the market share at their current price point is that CUDA has been so over-developed that you could almost shake it up and have a fully functional OS fall out with all the spare code that's floating around in it. AMD have had to figure out what to do first, then how best to do it, and do it cheaper... I don't envy them.

ps. wow @mudler that was quick XD

@jtwolfe
Contributor

jtwolfe commented Jun 25, 2024

please note that neither rocm5 nor rocm6 officially support the 6k series chips.

Just for some farther clarification, this is only true for ROCm on Linux, they are supported for ROCm for Windows. Also out of personal experience with my 6950XT*, I have not had issues that I can pin on ROCm when trying to use anything that advertised ROCm support (Text-Gen-UI, Ollama, Llamafile, Comfy-UI, SD-Gen-UI, SD-Next, etc.), and even some that don't.

Edit: * So long as make sure they include in or I edit them to include in the docker compose file:

devices:
    - /dev/dri
    - /dev/kfd

If you have time it would be appreciated if you could add to the documentation on compatibility <3

@bunder2015

I'll hold on for a couple days if anyone wants to help out. 👍

and sent. I hope that it's enough to get you something usable. Cheers

@mudler
Owner

mudler commented Jul 9, 2024

I'll hold on for a couple days if anyone wants to help out. 👍

and sent. I hope that it's enough to get you something usable. Cheers

just saw it now - cool man, thank you! I'm getting my hands on one of these; what do you suggest for the range? I don't have much experience with the AMD series but will have a look over this weekend

😮‍💨 AMD's not making this easy... Thanks for the charts, I only suggested the 6 series because it was PCIE gen4 and not the latest (ie expensive) chips.

Looks like Radeon VII's are still out there for the same price bracket, even though it will be deprecated by rocm eventually.

What are you running?

@bunder2015

just saw it now - cool man, thank you!

No problem, glad I can help somehow.

What are you running?

I'm running localai on a Threadripper 2950X, 128 GB of memory, and a Radeon VII... it's only capable of PCIe Gen3, so the VII seemed the most I could use until I upgrade the whole system.

Cheers

@CullenShane

CullenShane commented Jul 11, 2024

Howdy! So I'm trying to figure out building hipblas with support for whisper.cpp, but I seem to be getting an error when running any audio transcription. I have a Radeon 7600 XT, which is gfx1102. Ollama is working great, and I've been able to compile and run whisper.cpp and get it to offload to the card just fine. I can get LocalAI working with the card on llama3 and the default function model, and I can get piper working, but I think that's still on the CPU.

I'm running the quay.io/go-skynet/local-ai:v2.18.1-aio-gpu-hipblas container

With that setup I get the following error when I try to run the example gb1.ogg from the startup docs:

root@7c141b63533a:/build/models/localai/localai# curl http://localhost:8080/v1/audio/transcriptions -H "Content-Type: multipart/form-data" -F file="@$PWD/gb1.ogg" -F model="whisper-1"
{"error":{"code":500,"message":"rpc error: code = Unavailable desc = error reading from server: EOF","type":""}}

With REBUILD unset I got the following error:

�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr ggml_cuda_compute_forward: CONT failed
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr CUDA error: invalid device function
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr   current device: 0, in function ggml_cuda_compute_forward at ggml-cuda.cu:2304
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr   err
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr GGML_ASSERT: ggml-cuda.cu:60: !"CUDA error"
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr SIGABRT: abort
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr PC=0x7fe8917739fc m=4 sigcode=18446744073709551610

Same error with more context:

�[90m8:26AM�[0m DBG Request received: {"model":"whisper-1","language":"","translate":false,"n":0,"top_p":null,"top_k":null,"temperature":null,"max_tokens":null,"echo":false,"batch":0,"ignore_eos":false,"repeat_penalty":0,"repeat_last_n":0,"n_keep":0,"frequency_penalty":0,"presence_penalty":0,"tfz":null,"typical_p":null,"seed":null,"negative_prompt":"","rope_freq_base":0,"rope_freq_scale":0,"negative_prompt_scale":0,"use_fast_tokenizer":false,"clip_skip":0,"tokenizer":"","file":"","size":"","prompt":null,"instruction":"","input":null,"stop":null,"messages":null,"functions":null,"function_call":null,"stream":false,"mode":0,"step":0,"grammar":"","grammar_json_functions":null,"grammar_json_name":null,"backend":"","model_base_name":""}
�[90m8:26AM�[0m DBG guessDefaultsFromFile: not a GGUF file
�[90m8:26AM�[0m DBG Audio file copied to: /tmp/whisper1159899143/blob
�[90m8:26AM�[0m �[32mINF�[0m �[1mLoading model 'ggml-whisper-base.bin' with backend whisper�[0m
�[90m8:26AM�[0m DBG Loading model in memory from file: /build/models/localai/localai/ggml-whisper-base.bin
�[90m8:26AM�[0m DBG Loading Model ggml-whisper-base.bin with gRPC (file: /build/models/localai/localai/ggml-whisper-base.bin) (backend: whisper): {backendString:whisper model:ggml-whisper-base.bin threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc0012d6488 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh mamba:/build/backend/python/mamba/run.sh openvoice:/build/backend/python/openvoice/run.sh parler-tts:/build/backend/python/parler-tts/run.sh petals:/build/backend/python/petals/run.sh rerankers:/build/backend/python/rerankers/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
�[90m8:26AM�[0m DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/whisper
�[90m8:26AM�[0m DBG GRPC Service for ggml-whisper-base.bin will be running at: '127.0.0.1:41405'
�[90m8:26AM�[0m DBG GRPC Service state dir: /tmp/go-processmanager1593687394
�[90m8:26AM�[0m DBG GRPC Service Started
�[90m8:26AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr 2024/07/06 08:26:59 gRPC Server listening at 127.0.0.1:41405
�[90m8:27AM�[0m DBG GRPC Service Ready
�[90m8:27AM�[0m DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:ggml-whisper-base.bin ContextSize:0 Seed:0 NBatch:0 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:0 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/localai/localai/ggml-whisper-base.bin Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type: FlashAttention:false NoKVOffload:false}
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr whisper_init_from_file_with_params_no_state: loading model from '/build/models/localai/localai/ggml-whisper-base.bin'
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr whisper_init_with_params_no_state: use gpu    = 1
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr whisper_init_with_params_no_state: flash attn = 0
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr whisper_init_with_params_no_state: gpu_device = 0
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr whisper_init_with_params_no_state: dtw        = 0
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr whisper_model_load: loading model
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr whisper_model_load: n_vocab       = 51865
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr whisper_model_load: n_audio_ctx   = 1500
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr whisper_model_load: n_audio_state = 512
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr whisper_model_load: n_audio_head  = 8
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr whisper_model_load: n_audio_layer = 6
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr whisper_model_load: n_text_ctx    = 448
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr whisper_model_load: n_text_state  = 512
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr whisper_model_load: n_text_head   = 8
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr whisper_model_load: n_text_layer  = 6
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr whisper_model_load: n_mels        = 80
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr whisper_model_load: ftype         = 1
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr whisper_model_load: qntvr         = 0
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr whisper_model_load: type          = 2 (base)
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr whisper_model_load: adding 1608 extra tokens
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr whisper_model_load: n_langs       = 99
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr whisper_backend_init: using CUDA backend
�[90m8:27AM�[0m �[32mINF�[0m �[1mSuccess�[0m �[36mip=�[0m192.168.1.24 �[36mlatency=�[0m"13.456µs" �[36mmethod=�[0mGET �[36mstatus=�[0m200 �[36murl=�[0m/readyz
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr ggml_cuda_init: GGML_CUDA_FORCE_MMQ:   no
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr ggml_cuda_init: found 1 ROCm devices:
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr   Device 0: AMD Radeon™ RX 7600 XT, compute capability 11.0, VMM: no
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr whisper_model_load:    ROCm0 total size =   147.37 MB
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr whisper_model_load: model size    =  147.37 MB
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr whisper_backend_init: using CUDA backend
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr whisper_init_state: kv self size  =   18.87 MB
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr whisper_init_state: kv cross size =   18.87 MB
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr whisper_init_state: kv pad  size  =    3.15 MB
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr whisper_init_state: compute buffer (conv)   =   16.39 MB
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr whisper_init_state: compute buffer (encode) =  132.07 MB
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr whisper_init_state: compute buffer (cross)  =    4.78 MB
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr whisper_init_state: compute buffer (decode) =   96.48 MB
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr whisper_mel_init: n_len = 3532, n_len_org = 532, n_mel = 80
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr ggml_cuda_compute_forward: CONT failed
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr CUDA error: invalid device function
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr   current device: 0, in function ggml_cuda_compute_forward at ggml-cuda.cu:2304
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr   err
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr GGML_ASSERT: ggml-cuda.cu:60: !"CUDA error"
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr SIGABRT: abort
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr PC=0x7fe8917739fc m=4 sigcode=18446744073709551610
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr signal arrived during cgo execution
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr 
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr goroutine 37 gp=0xc000103dc0 m=4 mp=0xc000067808 [syscall]:
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr runtime.cgocall(0x10379f0, 0xc000146e18)
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr 	/usr/local/go/src/runtime/cgocall.go:157 +0x4b fp=0xc000146df0 sp=0xc000146db8 pc=0xc08d6b
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr github.com/ggerganov/whisper.cpp/bindings/go._Cfunc_whisper_full(0x7fe790002f50, {0x0, 0x4, 0x4000, 0x0, 0x0, 0x0, 0x1, 0x0, 0x0, ...}, ...)
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr 	_cgo_gotypes.go:321 +0x4e fp=0xc000146e18 sp=0xc000146df0 pc=0x1025b6e
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr github.com/ggerganov/whisper.cpp/bindings/go.(*Context).Whisper_full.func1(0x7fe790002f50, 0xc0001261e0?, {0xc00045c000, 0x7fe79cff9878?, 0x10?})
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr 	/build/sources/whisper.cpp/bindings/go/whisper.go:317 +0x157 fp=0xc000147170 sp=0xc000146e18 pc=0x102a577
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr github.com/ggerganov/whisper.cpp/bindings/go.(*Context).Whisper_full(_, {0x0, 0x4, 0x4000, 0x0, 0x0, 0x0, 0x1, 0x0, 0x0, ...}, ...)
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr 	/build/sources/whisper.cpp/bindings/go/whisper.go:317 +0x267 fp=0xc000147208 sp=0xc000147170 pc=0x102a2c7
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr github.com/ggerganov/whisper.cpp/bindings/go/pkg/whisper.(*context).Process(0xc000180000, {0xc00045c000, 0x14d00, 0x14d00}, 0x0, 0x0)
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr 	/build/sources/whisper.cpp/bindings/go/pkg/whisper/context.go:191 +0x1ac fp=0xc000147468 sp=0xc000147208 pc=0x102ed8c
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr main.Transcript({0x3c7c18, 0xc00012eea0}, {0xc00015e4e0, 0x1b}, {0x0, 0x0}, 0x0, 0x4)
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr 	/build/backend/go/transcribe/transcript.go:82 +0x5d9 fp=0xc000147738 sp=0xc000147468 pc=0x10363d9
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr main.(*Whisper).AudioTranscription(0x1f?, 0x1f?)
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr 	/build/backend/go/transcribe/whisper.go:25 +0x58 fp=0xc0001477d8 sp=0xc000147738 pc=0x1036b98
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr github.com/mudler/LocalAI/pkg/grpc.(*server).AudioTranscription(0xc000112f10, {0x2e4300?, 0xc000071990?}, 0xc000222e10)
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr 	/build/pkg/grpc/server.go:95 +0xf2 fp=0xc000147950 sp=0xc0001477d8 pc=0x1023792
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr github.com/mudler/LocalAI/pkg/grpc/proto._Backend_AudioTranscription_Handler({0x2e4300, 0xc000112f10}, {0x3c7c50, 0xc00022a2d0}, 0xc000226780, 0x0)
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr 	/build/pkg/grpc/proto/backend_grpc.pb.go:442 +0x1a6 fp=0xc0001479a0 sp=0xc000147950 pc=0x101f926
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr google.golang.org/grpc.(*Server).processUnaryRPC(0xc000192400, {0x3c7c50, 0xc00022a240}, {0x3cb220, 0xc000394000}, 0xc0003b8000, 0xc0001d8e10, 0x11aa058, 0x0)
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr 	/root/go/pkg/mod/google.golang.org/[email protected]/server.go:1379 +0xdf8 fp=0xc000147da0 sp=0xc0001479a0 pc=0x10053d8
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr google.golang.org/grpc.(*Server).handleStream(0xc000192400, {0x3cb220, 0xc000394000}, 0xc0003b8000)
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr 	/root/go/pkg/mod/google.golang.org/[email protected]/server.go:1790 +0xe8b fp=0xc000147f78 sp=0xc000147da0 pc=0x100a2ab
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr google.golang.org/grpc.(*Server).serveStreams.func2.1()
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr 	/root/go/pkg/mod/google.golang.org/[email protected]/server.go:1029 +0x8b fp=0xc000147fe0 sp=0xc000147f78 pc=0x100348b
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr runtime.goexit({})
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr 	/usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000147fe8 sp=0xc000147fe0 pc=0xc71241
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr created by google.golang.org/grpc.(*Server).serveStreams.func2 in goroutine 36
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr 	/root/go/pkg/mod/google.golang.org/[email protected]/server.go:1040 +0x125
�[90m8:27AM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:41405): stderr 

And so I tried a bunch of things, and I keep getting this identical error. I tried a bunch of different flags, I tried setting external environment variables (which I didn't keep good records of), I tried replacing ROCm with my local copy, and I tried bumping the version of whisper to the latest one I can build separately. The next thing I haven't tried is placing the libraries from the working whisper.cpp build into the LocalAI container.

I just always get this error, and I seem to have no ability to affect it as long as my build is successful.

Right now my environment is:

DEBUG=true
BUILD_PARALLELISM=12
BUILD_TYPE=hipblas
ROCM_HOME=/opt/rocm-6.1.2
ROCM_PATH=/opt/rocm-6.1.2
REBUILD=true
GO_TAGS=tts
GPU_TARGETS=gfx1102
HSA_OVERRIDE_GFX_VERSION=11.0.0
BUILD_SHARED_LIBS=ON
WHISPER_CPP_VERSION=1c31f9d4a8936aec550e6c4dc9ca5cae3b4f304a
MODELS_PATH=/build/models/localai/localai/

And with all those changes I still get the same exact error message.

�[90m6:54PM�[0m DBG Request received: {"model":"whisper-1","language":"","translate":false,"n":0,"top_p":null,"top_k":null,"temperature":null,"max_tokens":null,"echo":false,"batch":0,"ignore_eos":false,"repeat_penalty":0,"repeat_last_n":0,"n_keep":0,"frequency_penalty":0,"presence_penalty":0,"tfz":null,"typical_p":null,"seed":null,"negative_prompt":"","rope_freq_base":0,"rope_freq_scale":0,"negative_prompt_scale":0,"use_fast_tokenizer":false,"clip_skip":0,"tokenizer":"","file":"","size":"","prompt":null,"instruction":"","input":null,"stop":null,"messages":null,"functions":null,"function_call":null,"stream":false,"mode":0,"step":0,"grammar":"","grammar_json_functions":null,"grammar_json_name":null,"backend":"","model_base_name":""}
�[90m6:54PM�[0m DBG guessDefaultsFromFile: not a GGUF file
�[90m6:54PM�[0m DBG Audio file copied to: /tmp/whisper4244567951/gb1.ogg
�[90m6:54PM�[0m �[32mINF�[0m �[1mLoading model 'ggml-whisper-base.bin' with backend whisper�[0m
�[90m6:54PM�[0m DBG Loading model in memory from file: /build/models/localai/localai/ggml-whisper-base.bin
�[90m6:54PM�[0m DBG Loading Model ggml-whisper-base.bin with gRPC (file: /build/models/localai/localai/ggml-whisper-base.bin) (backend: whisper): {backendString:whisper model:ggml-whisper-base.bin threads:6 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc0001d5688 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh mamba:/build/backend/python/mamba/run.sh openvoice:/build/backend/python/openvoice/run.sh parler-tts:/build/backend/python/parler-tts/run.sh petals:/build/backend/python/petals/run.sh rerankers:/build/backend/python/rerankers/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
�[90m6:54PM�[0m DBG Loading GRPC Process: /tmp/localai/backend_data/backend-assets/grpc/whisper
�[90m6:54PM�[0m DBG GRPC Service for ggml-whisper-base.bin will be running at: '127.0.0.1:36797'
�[90m6:54PM�[0m DBG GRPC Service state dir: /tmp/go-processmanager3884338005
�[90m6:54PM�[0m DBG GRPC Service Started
�[90m6:54PM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr 2024/07/11 18:54:51 gRPC Server listening at 127.0.0.1:36797
�[90m6:54PM�[0m �[32mINF�[0m �[1mSuccess�[0m �[36mip=�[0m192.168.1.22 �[36mlatency=�[0m"50.336µs" �[36mmethod=�[0mGET �[36mstatus=�[0m200 �[36murl=�[0m/browse/job/progress/8c62ec6f-3b6e-11ef-a2b1-0242ac110008
�[90m6:54PM�[0m DBG GRPC Service Ready
�[90m6:54PM�[0m DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:ggml-whisper-base.bin ContextSize:0 Seed:0 NBatch:0 F16Memory:false MLock:false MMap:false VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:0 MainGPU: TensorSplit: Threads:0 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/build/models/localai/localai/ggml-whisper-base.bin Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type: FlashAttention:false NoKVOffload:false}
�[90m6:54PM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr whisper_init_from_file_with_params_no_state: loading model from '/build/models/localai/localai/ggml-whisper-base.bin'
�[90m6:54PM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr whisper_init_with_params_no_state: use gpu    = 1
�[90m6:54PM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr whisper_init_with_params_no_state: flash attn = 0
�[90m6:54PM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr whisper_init_with_params_no_state: gpu_device = 0
�[90m6:54PM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr whisper_init_with_params_no_state: dtw        = 0
�[90m6:54PM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr whisper_model_load: loading model
�[90m6:54PM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr whisper_model_load: n_vocab       = 51865
�[90m6:54PM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr whisper_model_load: n_audio_ctx   = 1500
�[90m6:54PM�[0m DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr whisper_model_load: n_audio_state = 512
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr whisper_model_load: n_audio_head  = 8
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr whisper_model_load: n_audio_layer = 6
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr whisper_model_load: n_text_ctx    = 448
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr whisper_model_load: n_text_state  = 512
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr whisper_model_load: n_text_head   = 8
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr whisper_model_load: n_text_layer  = 6
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr whisper_model_load: n_mels        = 80
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr whisper_model_load: ftype         = 1
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr whisper_model_load: qntvr         = 0
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr whisper_model_load: type          = 2 (base)
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr whisper_model_load: adding 1608 extra tokens
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr whisper_model_load: n_langs       = 99
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr whisper_backend_init: using CUDA backend
6:54PM INF Success ip=192.168.1.22 latency="37.952µs" method=GET status=200 url=/browse/job/progress/8c62ec6f-3b6e-11ef-a2b1-0242ac110008
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr ggml_cuda_init: GGML_CUDA_FORCE_MMQ:   no
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr ggml_cuda_init: found 1 ROCm devices:
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr   Device 0: AMD Radeon™ RX 7600 XT, compute capability 11.0, VMM: no
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr whisper_model_load:    ROCm0 total size =   147.37 MB
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr whisper_model_load: model size    =  147.37 MB
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr whisper_backend_init: using CUDA backend
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr whisper_init_state: kv self size  =   18.87 MB
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr whisper_init_state: kv cross size =   18.87 MB
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr whisper_init_state: kv pad  size  =    3.15 MB
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr whisper_init_state: compute buffer (conv)   =   16.39 MB
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr whisper_init_state: compute buffer (encode) =  132.07 MB
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr whisper_init_state: compute buffer (cross)  =    4.78 MB
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr whisper_init_state: compute buffer (decode) =   96.48 MB
6:54PM INF Success ip=192.168.1.22 latency="29.677µs" method=GET status=200 url=/browse/job/progress/8c62ec6f-3b6e-11ef-a2b1-0242ac110008
6:54PM INF Success ip=192.168.1.22 latency="32.673µs" method=GET status=200 url=/browse/job/progress/8c62ec6f-3b6e-11ef-a2b1-0242ac110008
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr whisper_mel_init: n_len = 22874, n_len_org = 19874, n_mel = 80
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr ggml_cuda_compute_forward: CONT failed
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr CUDA error: invalid device function
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr   current device: 0, in function ggml_cuda_compute_forward at ggml-cuda.cu:2304
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr   err
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr GGML_ASSERT: ggml-cuda.cu:60: !"CUDA error"
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr SIGABRT: abort
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr PC=0x7fb6807029fc m=8 sigcode=18446744073709551610
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr signal arrived during cgo execution
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr 
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr goroutine 38 gp=0xc0002ba380 m=8 mp=0xc000334808 [syscall]:
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr runtime.cgocall(0x10379f0, 0xc00013ce18)
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr 	/usr/local/go/src/runtime/cgocall.go:157 +0x4b fp=0xc00013cdf0 sp=0xc00013cdb8 pc=0xc08d6b
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr github.com/ggerganov/whisper.cpp/bindings/go._Cfunc_whisper_full(0x2d43df0, {0x0, 0x6, 0x4000, 0x0, 0x0, 0x0, 0x1, 0x0, 0x0, ...}, ...)
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr 	_cgo_gotypes.go:321 +0x4e fp=0xc00013ce18 sp=0xc00013cdf0 pc=0x1025b6e
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr github.com/ggerganov/whisper.cpp/bindings/go.(*Context).Whisper_full.func1(0x2d43df0, 0xc00011c1e0?, {0xc00058e000, 0x7fb58c7769a8?, 0x10?})
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr 	/build/sources/whisper.cpp/bindings/go/whisper.go:317 +0x157 fp=0xc00013d170 sp=0xc00013ce18 pc=0x102a577
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr github.com/ggerganov/whisper.cpp/bindings/go.(*Context).Whisper_full(_, {0x0, 0x6, 0x4000, 0x0, 0x0, 0x0, 0x1, 0x0, 0x0, ...}, ...)
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr 	/build/sources/whisper.cpp/bindings/go/whisper.go:317 +0x267 fp=0xc00013d208 sp=0xc00013d170 pc=0x102a2c7
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr github.com/ggerganov/whisper.cpp/bindings/go/pkg/whisper.(*context).Process(0xc0000dc120, {0xc00058e000, 0x308597, 0x308597}, 0x0, 0x0)
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr 	/build/sources/whisper.cpp/bindings/go/pkg/whisper/context.go:191 +0x1ac fp=0xc00013d468 sp=0xc00013d208 pc=0x102ed8c
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr main.Transcript({0x3c7c18, 0xc000012618}, {0xc0002c4040, 0x1e}, {0x0, 0x0}, 0x0, 0x6)
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr 	/build/backend/go/transcribe/transcript.go:82 +0x5d9 fp=0xc00013d738 sp=0xc00013d468 pc=0x10363d9
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr main.(*Whisper).AudioTranscription(0x22?, 0x22?)
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr 	/build/backend/go/transcribe/whisper.go:25 +0x58 fp=0xc00013d7d8 sp=0xc00013d738 pc=0x1036b98
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr github.com/mudler/LocalAI/pkg/grpc.(*server).AudioTranscription(0xc000118ef0, {0x2e4300?, 0xc00032a990?}, 0xc00028a3c0)
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr 	/build/pkg/grpc/server.go:95 +0xf2 fp=0xc00013d950 sp=0xc00013d7d8 pc=0x1023792
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr github.com/mudler/LocalAI/pkg/grpc/proto._Backend_AudioTranscription_Handler({0x2e4300, 0xc000118ef0}, {0x3c7c50, 0xc000282630}, 0xc0002e2000, 0x0)
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr 	/build/pkg/grpc/proto/backend_grpc.pb.go:442 +0x1a6 fp=0xc00013d9a0 sp=0xc00013d950 pc=0x101f926
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr google.golang.org/grpc.(*Server).processUnaryRPC(0xc000194400, {0x3c7c50, 0xc0002825a0}, {0x3cb220, 0xc000002180}, 0xc0002b2120, 0xc0001dae10, 0x11aa058, 0x0)
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr 	/root/go/pkg/mod/google.golang.org/[email protected]/server.go:1379 +0xdf8 fp=0xc00013dda0 sp=0xc00013d9a0 pc=0x10053d8
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr google.golang.org/grpc.(*Server).handleStream(0xc000194400, {0x3cb220, 0xc000002180}, 0xc0002b2120)
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr 	/root/go/pkg/mod/google.golang.org/[email protected]/server.go:1790 +0xe8b fp=0xc00013df78 sp=0xc00013dda0 pc=0x100a2ab
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr google.golang.org/grpc.(*Server).serveStreams.func2.1()
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr 	/root/go/pkg/mod/google.golang.org/[email protected]/server.go:1029 +0x8b fp=0xc00013dfe0 sp=0xc00013df78 pc=0x100348b
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr runtime.goexit({})
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr 	/usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc00013dfe8 sp=0xc00013dfe0 pc=0xc71241
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr created by google.golang.org/grpc.(*Server).serveStreams.func2 in goroutine 7
6:54PM DBG GRPC(ggml-whisper-base.bin-127.0.0.1:36797): stderr 	/root/go/pkg/mod/google.golang.org/[email protected]/server.go:1040 +0x125

How can I help?
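For anyone hitting the same trace: on the hipblas builds, "CUDA error: invalid device function" usually means the HIP kernels were not compiled for (or overridden to match) the card's gfx target. A minimal sketch of checking and setting that, assuming an RX 7600 XT (gfx1102) and the localai/localai:latest-aio-gpu-hipblas image used later in this thread; the override value is an assumption, not something confirmed by the log above:

# Confirm the gfx target the driver reports for the card
rocminfo | grep -m1 gfx    # an RX 7600 XT is expected to report gfx1102

# Run the hipblas image with the target either built in (GPU_TARGETS, only used with REBUILD=true)
# or mapped onto existing kernels via the HSA override (11.0.0 reuses gfx1100 kernels)
docker run -d --device /dev/kfd --device /dev/dri \
  -e GPU_TARGETS=gfx1102 \
  -e HSA_OVERRIDE_GFX_VERSION=11.0.0 \
  -p 8080:8080 -v $PWD/models:/build/models \
  localai/localai:latest-aio-gpu-hipblas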

@gymnae

gymnae commented Jul 30, 2024

Facing the same issue with 19.2 and an AMD Ryzen Pro 8700GE with its GFX1103 Radeon 870M:

GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:39417): stderr Traceback (most recent call last):
2024-07-30T12:24:13.958419872Z 12:24PM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:39417): stderr   File "/build/backend/python/diffusers/backend.py", line 20, in <module>
2024-07-30T12:24:13.958488183Z 12:24PM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:39417): stderr     from diffusers import StableDiffusion3Pipeline, StableDiffusionXLPipeline, StableDiffusionDepth2ImgPipeline, DPMSolverMultistepScheduler, StableDiffusionPipeline, DiffusionPipeline, \
2024-07-30T12:24:13.958496523Z 12:24PM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:39417): stderr   File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
2024-07-30T12:24:13.958501803Z 12:24PM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:39417): stderr   File "/build/backend/python/diffusers/venv/lib/python3.10/site-packages/diffusers/utils/import_utils.py", line 799, in __getattr__
2024-07-30T12:24:13.958512083Z 12:24PM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:39417): stderr     value = getattr(module, name)
2024-07-30T12:24:13.958515923Z 12:24PM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:39417): stderr   File "/build/backend/python/diffusers/venv/lib/python3.10/site-packages/diffusers/utils/import_utils.py", line 799, in __getattr__
2024-07-30T12:24:13.958623654Z 12:24PM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:39417): stderr     value = getattr(module, name)
2024-07-30T12:24:13.958632294Z 12:24PM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:39417): stderr   File "/build/backend/python/diffusers/venv/lib/python3.10/site-packages/diffusers/utils/import_utils.py", line 798, in __getattr__
2024-07-30T12:24:13.958694395Z 12:24PM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:39417): stderr     module = self._get_module(self._class_to_module[name])
2024-07-30T12:24:13.958719585Z 12:24PM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:39417): stderr   File "/build/backend/python/diffusers/venv/lib/python3.10/site-packages/diffusers/utils/import_utils.py", line 810, in _get_module
2024-07-30T12:24:13.958851686Z 12:24PM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:39417): stderr     raise RuntimeError(
2024-07-30T12:24:13.958863296Z 12:24PM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:39417): stderr RuntimeError: Failed to import diffusers.pipelines.stable_diffusion_3.pipeline_stable_diffusion_3 because of the following error (look up to see its traceback):
2024-07-30T12:24:13.958868136Z 12:24PM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:39417): stderr Failed to import diffusers.loaders.single_file because of the following error (look up to see its traceback):
2024-07-30T12:24:13.958871726Z 12:24PM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:39417): stderr Failed to import transformers.models.auto.image_processing_auto because of the following error (look up to see its traceback):
2024-07-30T12:24:13.958875196Z 12:24PM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:39417): stderr operator torchvision::nms does not exist

Docker compose:

# docker-compose.yaml
#version: "3.9"
services:
  api:
    image: localai/localai:latest-aio-gpu-hipblas
    privileged: true
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 1m
      timeout: 20m
      retries: 5
    ports:
      - 8080:8080
    environment:
      - DEBUG=true
      # If your GPU is not already included in the current list of default targets, the following build details are required.
      - REBUILD=false
      # - PROFILE=cpu
      - THREADS=7
      # - BUILD_PARALLELISM=7
      - BUILD_TYPE=hipblas
      - GPU_TARGETS=gfx1100 # RDNA3 target (a Radeon VII would be gfx906)
      - HSA_OVERRIDE_GFX_VERSION=11.0.0
    devices:
      # AMD GPU only require the following devices be passed through to the container for offloading to occur.
      - /dev/dri
      - /dev/kfd
    volumes:
      - ./models:/build/models:cached

rocm-smi on the host:

========================================= ROCm System Management Interface =========================================
=================================================== Concise Info ===================================================
Device  [Model : Revision]    Temp    Power     Partitions      SCLK  MCLK     Fan  Perf  PwrCap       VRAM%  GPU%  
        Name (20 chars)       (Edge)  (Socket)  (Mem, Compute)                                                      
====================================================================================================================
0       [0x15bf : 0xd2]       44.0°C  42.02W    N/A, N/A        None  2400Mhz  0%   auto  Unsupported    4%   0%    
        Phoenix1                                                                                                    
====================================================================================================================
=============================================== End of ROCm SMI Log ================================================
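The "operator torchvision::nms does not exist" error in the traceback above is the classic symptom of a torch/torchvision version mismatch rather than a GPU or ROCm problem. A quick way to confirm from inside the running container (the venv path is taken from the traceback; the expected-output comment is an assumption):

docker exec -it <container> /build/backend/python/diffusers/venv/bin/python -c \
  'import torch, torchvision; print(torch.__version__, torchvision.__version__)'
# A healthy pair should report matching releases from the same ROCm index, e.g. both suffixed +rocm6.1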

mudler added a commit that referenced this issue Aug 7, 2024
Some of the dependencies in `requirements.txt`, even if generic, pull in
CUDA libraries down the line.

This change moves most of the GPU-specific libs to the build-type
requirements and tries a safer approach. `requirements.txt` now lists only
"first-level" dependencies (for instance, grpc), while library dependencies
are moved down to the respective build-type `requirements.txt` to avoid
any mixing.

This should fix #2737 and #1592.

Signed-off-by: Ettore Di Giacinto <[email protected]>
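In practice that split looks roughly like the sketch below (file names and package lists are assumptions based on the commit description, not an exact dump of the tree): a backend's base requirements.txt keeps only generic deps, and the accelerator-specific wheels live in the per-build-type file.

# backend/python/diffusers/requirements.txt (first-level deps only)
grpcio
protobuf
diffusers

# backend/python/diffusers/requirements-hipblas.txt (resolved only for BUILD_TYPE=hipblas)
--extra-index-url https://download.pytorch.org/whl/rocm6.1
torch
torchvision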
@bunder2015

bunder2015 commented Aug 8, 2024

Morning @mudler, I think I finally figured out building images, I was able to build sha-f7ffa9c-hipblas-ffmpeg from scratch without quay... we seem to be on the right track with the recent fixes, but there seems to be one more issue with diffusers and torchvision... 😅
diffusers.txt

@mudler
Owner

mudler commented Aug 8, 2024

Morning @mudler, I think I finally figured out building images, I was able to build sha-f7ffa9c-hipblas-ffmpeg from scratch without quay... we seem to be on the right track with the recent fixes, but there seems to be one more issue with diffusers and torchvision... 😅 diffusers.txt

that's good feedback, thanks for testing it! Attempted a fix for it in #3202 as we used to pin to nightly before the switch to uv.

@bunder2015

Thanks, I gave that a shot, but it's throwing an error with --pre...

diffusers.txt

@bunder2015

Hi, I tried again without the --pre but ran into an error saying it needs --prerelease=allow... So I tried that and I'm back to the same error I got with --pre... I'm at a loss. 🤷

diffusers.txt

@mudler
Owner

mudler commented Aug 9, 2024

Hi, I tried again without the --pre but ran into an error saying it needs --prerelease=allow... So I tried that and I'm back to the same error I got with --pre... I'm at a loss. 🤷

diffusers.txt

that should be added to the uv args - 5fcafc3 should cover it, thanks for testing!
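For context, --prerelease=allow is an argument to the installer invocation rather than something that goes in the requirements file. A rough sketch of what the build step ends up running (the index URL and package list are assumptions for the nightly ROCm case):

uv pip install --prerelease=allow \
  --index-url https://download.pytorch.org/whl/nightly/rocm6.1 \
  torch torchvision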

@bunder2015

Looks like we're still having issues building...
diffusers.txt

Cheers

mudler added a commit that referenced this issue Aug 10, 2024
@mudler
Owner

mudler commented Aug 10, 2024

ok - seems uv doesn't quite like pre-releases and such. Going to try pinning deps manually for now. I've tested locally and this now resolves deps correctly, however I haven't tried inferencing on it yet.

about:

localai-api-1  | 1:36PM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:45403): stderr Traceback (most recent call last):
localai-api-1  | 1:36PM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:45403): stderr   File "/build/backend/python/diffusers/venv/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1603, in _get_module
localai-api-1  | 1:36PM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:45403): stderr     return importlib.import_module("." + module_name, self.__name__)
localai-api-1  | 1:36PM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:45403): stderr   File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
localai-api-1  | 1:36PM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:45403): stderr     return _bootstrap._gcd_import(name[level:], package, level)
localai-api-1  | 1:36PM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:45403): stderr   File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
localai-api-1  | 1:36PM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:45403): stderr   File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
localai-api-1  | 1:36PM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:45403): stderr   File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
localai-api-1  | 1:36PM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:45403): stderr   File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
localai-api-1  | 1:36PM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:45403): stderr   File "<frozen importlib._bootstrap_external>", line 883, in exec_module
localai-api-1  | 1:36PM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:45403): stderr   File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
localai-api-1  | 1:36PM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:45403): stderr   File "/build/backend/python/diffusers/venv/lib/python3.10/site-packages/transformers/models/auto/image_processing_auto.py", line 27, in <module>
localai-api-1  | 1:36PM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:45403): stderr     from ...image_processing_utils import BaseImageProcessor, ImageProcessingMixin
localai-api-1  | 1:36PM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:45403): stderr   File "/build/backend/python/diffusers/venv/lib/python3.10/site-packages/transformers/image_processing_utils.py", line 21, in <module>
localai-api-1  | 1:36PM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:45403): stderr     from .image_transforms import center_crop, normalize, rescale
localai-api-1  | 1:36PM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:45403): stderr   File "/build/backend/python/diffusers/venv/lib/python3.10/site-packages/transformers/image_transforms.py", line 22, in <module>
localai-api-1  | 1:36PM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:45403): stderr     from .image_utils import (
localai-api-1  | 1:36PM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:45403): stderr   File "/build/backend/python/diffusers/venv/lib/python3.10/site-packages/transformers/image_utils.py", line 58, in <module>
localai-api-1  | 1:36PM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:45403): stderr     from torchvision.transforms import InterpolationMode
localai-api-1  | 1:36PM DBG GRPC(DreamShaper_8_pruned.safetensors-127.0.0.1:45403): stderr ImportError: cannot import name 'InterpolationMode' from 'torchvision.transforms' (/build/backend/python/diffusers/venv/lib/python3.10/site-packages/torchvision/transforms/__init__.py)

It seems it was caused by uv picking up the following combination when package versions weren't specified manually:

 + torch==2.4.0+rocm6.1                       
 + torchvision==0.2.0         

which obviously does not work, as that torchvision is far too old (and not a rocm6.x build).

I've now pinned the packages to specific versions here: 0c0bc18, and that now pulls things in correctly.
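For reference, an explicitly pinned, matching ROCm pair looks something like the lines below (a sketch only; the exact pins in 0c0bc18 may differ, and torchvision 0.19.0 is assumed to be the release paired with torch 2.4.0):

--extra-index-url https://download.pytorch.org/whl/rocm6.1
torch==2.4.0+rocm6.1
torchvision==0.19.0+rocm6.1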

@bunder2015

Thanks for the work on this, hopefully we got it this time... 😅

I'll pull again and see how far things go... I'll try to report back in ~4 hours...

Cheers

@bunder2015

Success! Here's some bananas, in a wooden bowl, on a glass table. 😄
(generated by dreamshaper, it appears that sd3-medium won't fit in 16gb vram)
(image attachment: b643261420880)

@mudler
Owner

mudler commented Aug 10, 2024

Success! Here's some bananas, in a wooden bowl, on a glass table. 😄 (generated by dreamshaper, it appears that sd3-medium won't fit in 16gb vram) b643261420880

fantastic! thanks for the feedback! closing this issue now :)

@mudler mudler closed this as completed Aug 10, 2024