Description
Running the embedding example KernelMemorySaveAndLoad.cs, after the weights and context are generated, I get this error: LLamaSharp/LLamaSharp/ggml/src/ggml.c:2703: GGML_ASSERT(ggml_can_mul_mat(a, b)) failed
However, running a chat session with the same model works perfectly. So something seems to be wrong with the embedding part of KernelMemory, possibly WithLLamaSharpTextEmbeddingGeneration.
Reproduction Steps
Here is my code:
```csharp
using LLama;
using LLama.Common;
using LLamaSharp.KernelMemory;
using Microsoft.KernelMemory;
using Microsoft.KernelMemory.Configuration;
using Microsoft.KernelMemory.DocumentStorage.DevTools;
using Microsoft.KernelMemory.FileSystem.DevTools;
using Microsoft.KernelMemory.MemoryStorage.DevTools;
using System.Diagnostics;

namespace LSKMRAG
{
    class Program
    {
        static void Main(string[] args)
        {
            ChatQwen chat = new ChatQwen();
            chat.ChatQwenMain().GetAwaiter().GetResult();
        }
    }

    public class ChatQwen
    {
        static string StorageFolder => Path.GetFullPath($"./storage-{nameof(ChatQwen)}");
        static bool StorageExists => Directory.Exists(StorageFolder) && Directory.GetDirectories(StorageFolder).Length > 0;

        string modelPath = Path.Combine(Directory.GetCurrentDirectory(), "qwen2.5-3b-instruct-q4_k_m.gguf");

        public async Task ChatQwenMain()
        {
            Console.ForegroundColor = ConsoleColor.Yellow;
            Console.WriteLine("""
                This program uses the Microsoft.KernelMemory package to ingest documents
                and store the embeddings as local files so they can be quickly recalled
                when this application is launched again.
                """);

            IKernelMemory memory = CreateMemoryWithLocalStorage(modelPath);

            Console.ForegroundColor = ConsoleColor.Yellow;
            if (StorageExists)
            {
                Console.WriteLine("""
                    Kernel memory files have been located!
                    Information about previously analyzed documents has been loaded.
                    """);
            }
            else
            {
                Console.WriteLine("""
                    Existing kernel memory was not found.
                    Documents will be analyzed (slow) and information saved to disk.
                    Analysis will not be required the next time this program is run.
                    Press ENTER to proceed...
                    """);
                Console.ReadLine();
                await IngestDocuments(memory);
            }
        }

        private static IKernelMemory CreateMemoryWithLocalStorage(string modelPath)
        {
            InferenceParams infParams = new() { AntiPrompts = ["\n\n"] };

            LLamaSharpConfig lsConfig = new(modelPath) { DefaultInferenceParams = infParams };

            var parameters = new ModelParams(modelPath)
            {
                ContextSize = 2048,
                GpuLayerCount = 99,
                MainGpu = lsConfig.MainGpu,
                SplitMode = lsConfig.SplitMode
            };

            SearchClientConfig searchClientConfig = new()
            {
                MaxMatchesCount = 1,
                AnswerTokens = 100,
            };

            TextPartitioningOptions parseOptions = new()
            {
                MaxTokensPerParagraph = 300,
                // MaxTokensPerLine = 100,
                OverlappingTokens = 30
            };

            SimpleFileStorageConfig storageConfig = new()
            {
                Directory = StorageFolder,
                StorageType = FileSystemTypes.Disk,
            };

            SimpleVectorDbConfig vectorDbConfig = new()
            {
                Directory = StorageFolder,
                StorageType = FileSystemTypes.Disk,
            };

            Console.ForegroundColor = ConsoleColor.Blue;
            Console.WriteLine($"Kernel memory folder: {StorageFolder}");
            Console.ForegroundColor = ConsoleColor.DarkGray;

            return new KernelMemoryBuilder()
                .WithSimpleFileStorage(storageConfig)
                .WithSimpleVectorDb(vectorDbConfig)
                .WithLLamaSharpDefaults(lsConfig)
                .WithSearchClientConfig(searchClientConfig)
                .With(parseOptions)
                .Build();
        }

        // ... others same as KernelMemorySaveAndLoad.cs
    }
}
```
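To help narrow this down, a minimal sketch that exercises only LLamaSharp's embedding path, bypassing KernelMemory entirely, could look like the following. This is an untested sketch: the `LLamaEmbedder` constructor and `GetEmbeddings` signature, and the `Embeddings` flag on `ModelParams`, are assumptions based on recent LLamaSharp versions and may need adjusting for 0.23.0.

```csharp
// Hypothetical minimal repro: exercise only the embedding path.
// NOTE: API names here are assumptions and may differ by LLamaSharp version.
using LLama;
using LLama.Common;

var parameters = new ModelParams("qwen2.5-3b-instruct-q4_k_m.gguf")
{
    ContextSize = 2048,
    GpuLayerCount = 99,
    Embeddings = true // enable embedding mode; property name may vary by version
};

using var weights = LLamaWeights.LoadFromFile(parameters);
using var embedder = new LLamaEmbedder(weights, parameters);

// If the GGML_ASSERT(ggml_can_mul_mat(a, b)) crash reproduces here,
// the problem is in the embedding path itself rather than in the
// KernelMemory integration (WithLLamaSharpTextEmbeddingGeneration).
var embeddings = await embedder.GetEmbeddings("Hello, world!");
Console.WriteLine($"Embedding length: {embeddings[0].Length}");
```

If this crashes with the same assertion, the KernelMemory wiring can be ruled out and the issue lies in the native embedding code.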
Environment & Configuration
Operating system: Win10 / Win11
.NET runtime version: .NET 8
LLamaSharp version: 0.23.0
CUDA version (if you are using cuda backend): 12.4 / 12.6
Known Workarounds
This seems to be related to llama.cpp itself: I found a possibly related upstream issue, #12517, and that bug was resolved last week in #12545.
LLamaSharp is currently using this version of llama.cpp (three weeks old). Hopefully this will be resolved once we upgrade to a newer version (no set timeline, but usually around once a month).