
LlamaCPP Usage #1035

@4entertainment

Description

I used this guide to develop a RAG system: https://github.com/marklysze/LlamaIndex-RAG-WSL-CUDA/blob/master/LlamaIndex_Mixtral_8x7B-RAG.ipynb

I use the following code to load my local LLM:

# Import path on recent llama-index versions; older releases exposed it as `from llama_index.llms import LlamaCPP`.
from llama_index.llms.llama_cpp import LlamaCPP

llm = LlamaCPP(
    model_url=None, # We'll load locally.
    # model_path='./Models/mistral-7b-instruct-v0.1.Q6_K.gguf', # 6-bit model
    model_path="my_local_llm_path",
    temperature=0.1,
    max_new_tokens=1024, # Increasing to support longer responses
    context_window=8192, # Mistral7B has an 8K context-window
    generate_kwargs={},
    # set to at least 1 to use GPU
    model_kwargs={"n_gpu_layers": 40}, # 40 layers worked well on an RTX 3090; decrease this if you have less than 24GB of VRAM
    # messages_to_prompt=messages_to_prompt,
    # completion_to_prompt=completion_to_prompt,
    messages_to_prompt=system_prompt,
    completion_to_prompt=query_wrapper_prompt,
    #system_prompt=system_prompt,
    #query_wrapper_prompt=query_wrapper_prompt,
    verbose=True
)
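
For reference, the commented-out lines above (messages_to_prompt=messages_to_prompt, completion_to_prompt=completion_to_prompt) come from the notebook, where they are functions rather than prompt strings. Below is a minimal sketch of what I believe those callables are supposed to look like; the [INST] template is only my guess at a Mistral-instruct style format:

# Sketch only -- my assumption of what LlamaCPP expects for these two arguments:
# callables that convert chat messages / a completion string into the prompt
# format of the underlying GGUF model.

def completion_to_prompt(completion: str) -> str:
    # Wrap a single completion request in an instruct-style template (assumed format).
    return f"<s>[INST] {completion} [/INST]"

def messages_to_prompt(messages) -> str:
    # Flatten a list of chat messages into one instruct-style prompt (assumed format).
    joined = "\n".join(f"{m.role}: {m.content}" for m in messages)
    return f"<s>[INST] {joined} [/INST]"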

Can anyone give me information about the LlamaCPP structure? What should I do to use my local model?
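
For completeness, this is roughly how I would sanity-check the model once the constructor above runs (assuming the llm object was built successfully):

# Quick smoke test of the locally loaded model.
response = llm.complete("What is retrieval-augmented generation?")
print(response.text)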
