Start local-ai with the following command:
docker run --rm -ti --gpus all -p 28080:8080 -e DEBUG=true -e MODELS_PATH=/models -e THREADS=4 -v ~/localai/models:/models quay.io/go-skynet/local-ai:latest-aio-gpu-nvidia-cuda-12
Then try to test the embedding model with curl like this:
$ curl http://localhost:28080/embeddings -X POST -H "Content-Type:application/json" -d '{ "input": "Your text string goes here", "model": "text-embedding-ada-002" }'
But I got this error:
{"error":{"code":500,"message":"could not load model (no success): Unexpected err=OSError(\"We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like sentence-transformers/all-MiniLM-L6-v2 is not the path to a directory containing a file named config.json.\\nCheckout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.\"), type(err)=\u003cclass 'OSError'\u003e","type":""}}
Debug info is as follows:
10:00AM DBG Request received: {"model":"text-embedding-ada-002","language":"","n":0,"top_p":null,"top_k":null,"temperature":null,"max_tokens":null,"echo":false,"batch":0,"ignore_eos":false,"repeat_penalty":0,"n_keep":0,"frequency_penalty":0,"presence_penalty":0,"tfz":null,"typical_p":null,"seed":null,"negative_prompt":"","rope_freq_base":0,"rope_freq_scale":0,"negative_prompt_scale":0,"use_fast_tokenizer":false,"clip_skip":0,"tokenizer":"","file":"","response_format":{},"size":"","prompt":null,"instruction":"","input":"Your text string goes here","stop":null,"messages":null,"functions":null,"function_call":null,"stream":false,"mode":0,"step":0,"grammar":"","grammar_json_functions":null,"grammar_json_name":null,"backend":"","model_base_name":""}
10:00AM DBG Parameter Config: &{PredictionOptions:{Model:all-MiniLM-L6-v2 Language: N:0 TopP:0xc00048e9c0 TopK:0xc00048e9c8 Temperature:0xc00048e9d0 Maxtokens:0xc00048ea00 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0xc00048e9f8 TypicalP:0xc00048e9f0 Seed:0xc00048ea18 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:text-embedding-ada-002 F16:0xc00048e9b8 Threads:0xc00048e9b0 Debug:0xc0005f3c70 Roles:map[] Embeddings:false Backend:sentencetransformers TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions: UseTokenizerTemplate:false JoinChatMessagesByCharacter:<nil>} PromptStrings:[] InputStrings:[Your text string goes here] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false GrammarConfig:{ParallelCalls:false DisableParallelNewLines:false MixedMode:false NoMixedFreeString:false NoGrammar:false Prefix:} NoActionFunctionName: NoActionDescriptionName: ResponseRegex: JSONRegexMatch:[] ReplaceFunctionResults:[] ReplaceLLMResult:[] FunctionName:false} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc00048e9e8 MirostatTAU:0xc00048e9e0 Mirostat:0xc00048e9d8 NGPULayers:0xc00048ea08 MMap:0xc00048ea10 MMlock:0xc00048ea11 LowVRAM:0xc00048ea11 Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] TrimSuffix:[] ContextSize:0xc00048e9a8 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 MMProj: FlashAttention:false NoKVOffloading:false RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:} CUDA:false DownloadFiles:[] Description: Usage:You can test this model with curl like this:
curl http://localhost:8080/embeddings -X POST -H "Content-Type: application/json" -d '{ "input": "Your text string goes here", "model": "text-embedding-ada-002"}'}
10:00AM INF Loading model 'all-MiniLM-L6-v2' with backend sentencetransformers
10:00AM DBG Loading model in memory from file: /models/all-MiniLM-L6-v2
10:00AM DBG Loading Model all-MiniLM-L6-v2 with gRPC (file: /models/all-MiniLM-L6-v2) (backend: sentencetransformers): {backendString:sentencetransformers model:all-MiniLM-L6-v2 threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc0003618c8 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh mamba:/build/backend/python/mamba/run.sh openvoice:/build/backend/python/openvoice/run.sh parler-tts:/build/backend/python/parler-tts/run.sh petals:/build/backend/python/petals/run.sh rerankers:/build/backend/python/rerankers/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
10:00AM DBG Loading external backend: /build/backend/python/sentencetransformers/run.sh
10:00AM DBG Loading GRPC Process: /build/backend/python/sentencetransformers/run.sh
10:00AM DBG GRPC Service for all-MiniLM-L6-v2 will be running at: '127.0.0.1:46643'
10:00AM DBG GRPC Service state dir: /tmp/go-processmanager35659977
10:00AM DBG GRPC Service Started
10:00AM DBG GRPC(all-MiniLM-L6-v2-127.0.0.1:46643): stdout Initializing libbackend for build
10:00AM DBG GRPC(all-MiniLM-L6-v2-127.0.0.1:46643): stdout virtualenv activated
10:00AM DBG GRPC(all-MiniLM-L6-v2-127.0.0.1:46643): stdout activated virtualenv has been ensured
10:00AM DBG GRPC(all-MiniLM-L6-v2-127.0.0.1:46643): stderr /build/backend/python/sentencetransformers/venv/lib/python3.10/site-packages/transformers/utils/hub.py:124: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
10:00AM DBG GRPC(all-MiniLM-L6-v2-127.0.0.1:46643): stderr warnings.warn(
10:00AM DBG GRPC(all-MiniLM-L6-v2-127.0.0.1:46643): stderr Server started. Listening on: 127.0.0.1:46643
10:00AM DBG GRPC Service Ready
10:00AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:all-MiniLM-L6-v2 ContextSize:512 Seed:1854870421 NBatch:512 F16Memory:false MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:false NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:4 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/models/all-MiniLM-L6-v2 Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type: FlashAttention:false NoKVOffload:false}
10:00AM DBG GRPC(all-MiniLM-L6-v2-127.0.0.1:46643): stderr No sentence-transformers model found with name sentence-transformers/all-MiniLM-L6-v2. Creating a new one with MEAN pooling.
10:00AM DBG GRPC(all-MiniLM-L6-v2-127.0.0.1:46643): stderr /build/backend/python/sentencetransformers/venv/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
10:00AM DBG GRPC(all-MiniLM-L6-v2-127.0.0.1:46643): stderr warnings.warn(
10:01AM ERR Server error error="could not load model (no success): Unexpected err=OSError(\"We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like sentence-transformers/all-MiniLM-L6-v2 is not the path to a directory containing a file named config.json.\\nCheckout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.\"), type(err)=<class 'OSError'>" ip=172.17.0.1 latency=26.053923196s method=POST status=500 url=/embeddings
10:01AM INF Success ip=127.0.0.1 latency="54.145µs" method=GET status=200 url=/readyz
10:02AM INF Success ip=127.0.0.1 latency="19.401µs" method=GET status=200 url=/readyz
10:03AM INF Success ip=127.0.0.1 latency="17.191µs" method=GET status=200 url=/readyz
I am running local-ai on a network without internet access. Does anyone know what the problem is and how to fix it?
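One workaround I am considering (untested; it assumes the backend honors the standard Hugging Face cache layout and the HF_HOME / HF_HUB_OFFLINE variables, which the deprecation warning in the log hints at) is to pre-download the model on a machine with internet access and mount that cache into the container:

# On a machine WITH internet access: download the model into a local Hub
# cache rooted at ~/localai/hf (huggingface-cli ships with huggingface_hub).
HF_HOME=~/localai/hf huggingface-cli download sentence-transformers/all-MiniLM-L6-v2

# Copy ~/localai/hf to the offline host, then start the container with the
# cache mounted and offline mode forced so nothing tries to reach the Hub.
docker run --rm -ti --gpus all -p 28080:8080 \
  -e DEBUG=true -e MODELS_PATH=/models -e THREADS=4 \
  -e HF_HOME=/hf -e HF_HUB_OFFLINE=1 \
  -v ~/localai/models:/models -v ~/localai/hf:/hf \
  quay.io/go-skynet/local-ai:latest-aio-gpu-nvidia-cuda-12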