
LlamaCPP Usage #1035

@4entertainment

Description

I used this guide to develop a RAG system: https://github.com/marklysze/LlamaIndex-RAG-WSL-CUDA/blob/master/LlamaIndex_Mixtral_8x7B-RAG.ipynb

I use the following code to load my local LLM:

# Import path on recent llama-index versions; older releases exposed it as `from llama_index.llms import LlamaCPP`.
from llama_index.llms.llama_cpp import LlamaCPP

llm = LlamaCPP(
    model_url=None, # We'll load locally.
    # model_path='./Models/mistral-7b-instruct-v0.1.Q6_K.gguf', # 6-bit model
    model_path="my_local_llm_path",
    temperature=0.1,
    max_new_tokens=1024, # Increasing to support longer responses
    context_window=8192, # Mistral7B has an 8K context-window
    generate_kwargs={},
    # set to at least 1 to use GPU
    model_kwargs={"n_gpu_layers": 40}, # 40 layers worked well on an RTX 3090; decrease this if you have less than 24GB of VRAM
    # messages_to_prompt=messages_to_prompt,
    # completion_to_prompt=completion_to_prompt,
    messages_to_prompt=system_prompt,
    completion_to_prompt=query_wrapper_prompt,
    #system_prompt=system_prompt,
    #query_wrapper_prompt=query_wrapper_prompt,
    verbose=True
)
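
For reference, the commented-out lines above (messages_to_prompt=messages_to_prompt, completion_to_prompt=completion_to_prompt) come from the notebook, where they are functions rather than prompt strings. Below is a minimal sketch of what I believe those callables are supposed to look like; the [INST] template is only my guess at a Mistral-instruct style format:

# Sketch only -- my assumption of what LlamaCPP expects for these two arguments:
# callables that convert chat messages / a completion string into the prompt
# format of the underlying GGUF model.

def completion_to_prompt(completion: str) -> str:
    # Wrap a single completion request in an instruct-style template (assumed format).
    return f"<s>[INST] {completion} [/INST]"

def messages_to_prompt(messages) -> str:
    # Flatten a list of chat messages into one instruct-style prompt (assumed format).
    joined = "\n".join(f"{m.role}: {m.content}" for m in messages)
    return f"<s>[INST] {joined} [/INST]"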

Can anyone give me information about the LlamaCPP structure? What should I do to use my local model?
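
For completeness, this is roughly how I would sanity-check the model once the constructor above runs (assuming the llm object was built successfully):

# Quick smoke test of the locally loaded model.
response = llm.complete("What is retrieval-augmented generation?")
print(response.text)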
