Open
Labels: question (Further information is requested)
Description
I am using this guide to develop a RAG system: https://github.com/marklysze/LlamaIndex-RAG-WSL-CUDA/blob/master/LlamaIndex_Mixtral_8x7B-RAG.ipynb

I load my local LLM with the following code:
# LlamaCPP import path for the llama_index version used in the guide
from llama_index.llms import LlamaCPP

llm = LlamaCPP(
    model_url=None,  # we'll load a local file instead of downloading
    # model_path='./Models/mistral-7b-instruct-v0.1.Q6_K.gguf',  # 6-bit model
    model_path="my_local_llm_path",
    temperature=0.1,
    max_new_tokens=1024,  # increased to support longer responses
    context_window=8192,  # Mistral 7B has an 8K context window
    generate_kwargs={},
    # set n_gpu_layers to at least 1 to use the GPU
    model_kwargs={"n_gpu_layers": 40},  # 40 layers suited an RTX 3090; decrease if you have less than 24GB of VRAM
    # messages_to_prompt=messages_to_prompt,
    # completion_to_prompt=completion_to_prompt,
    messages_to_prompt=system_prompt,
    completion_to_prompt=query_wrapper_prompt,
    # system_prompt=system_prompt,
    # query_wrapper_prompt=query_wrapper_prompt,
    verbose=True,
)
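
For context, the notebook defines messages_to_prompt and completion_to_prompt as functions rather than strings, which is why I suspect passing my system_prompt and query_wrapper_prompt strings there is wrong. My rough sketch of what those callables are supposed to look like, based on the Mistral [INST] chat template used in the guide (the exact tags and handling here are my assumption, not something I have verified):

def messages_to_prompt(messages):
    # Build a Mistral-style [INST] prompt from llama_index chat messages.
    prompt = ""
    for message in messages:
        if message.role == "system":
            prompt += f"<s>[INST] {message.content} [/INST]</s>\n"
        elif message.role == "user":
            prompt += f"[INST] {message.content} [/INST]\n"
        elif message.role == "assistant":
            prompt += f"{message.content}</s>\n"
    return prompt

def completion_to_prompt(completion):
    # Wrap a plain completion string in the same [INST] template.
    return f"<s>[INST] {completion} [/INST]"

If that is right, I suppose the constructor above should receive these functions (messages_to_prompt=messages_to_prompt, completion_to_prompt=completion_to_prompt) rather than my prompt strings, but I am not sure.
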
Can anyone give me information about how LlamaCPP is structured? What should I do to use my local model?