MLX RAG

A simple example of using MLX for a retrieval-augmented generation (RAG) application that runs locally on your Apple Silicon device.

I previously converted the weights for the gte-large embedding model to MLX format; they are stored in the mlx-rag repository. The base model is NeuralBeagle14-7B-4bit-mlx.

Getting started

Install requirements

python3 -m pip install -r requirements.txt

Create a vector database from a PDF file

python3 create_vdb.py --pdf flash_attention.pdf --vdb vdb.npz
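
The contents of create_vdb.py are not reproduced in this README, so the following is only a rough sketch of what building a vector database typically involves: split the document text into overlapping chunks, then embed each chunk. Every name here (`split_into_chunks`, `toy_embed`, `build_index`) and the chunk sizes are hypothetical, and `toy_embed` is a hashing stand-in for a real embedding model such as gte-large, not the model itself.

```python
import hashlib
import math


def split_into_chunks(text, chunk_size=50, overlap=10):
    """Split text into overlapping word windows (sizes are illustrative)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


def toy_embed(text, dim=64):
    """Hypothetical stand-in for an embedding model.

    Hashes each word into one of `dim` buckets and L2-normalizes the counts.
    A real pipeline would call the embedding model here instead.
    """
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def build_index(text, chunk_size=50, overlap=10):
    """Return the chunks together with one embedding per chunk."""
    chunks = split_into_chunks(text, chunk_size, overlap)
    return {"chunks": chunks, "embeddings": [toy_embed(c) for c in chunks]}
```

Overlapping windows are a common choice because a sentence cut at a chunk boundary still appears whole in the neighbouring chunk. In a real setup the chunks and embeddings would then be written to disk, for example as the vdb.npz file named in the command above.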

Query the database (PDF file)

python3 query_vdb.py --question "what is flash attention?"
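
The query step in a RAG pipeline usually embeds the question, ranks the stored chunks by similarity, and passes the best matches to the base model as context. The sketch below illustrates that idea with cosine similarity over plain Python lists. It is an assumption about the general technique, not a copy of query_vdb.py; the function names, the prompt layout, and the hand-made vectors are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, embeddings, k=2):
    """Return the indices of the k stored vectors most similar to the query."""
    scores = [(cosine_similarity(query_vec, e), i) for i, e in enumerate(embeddings)]
    scores.sort(key=lambda s: s[0], reverse=True)
    return [i for _, i in scores[:k]]


def build_prompt(question, context_chunks):
    """Assemble a prompt for the base model (layout is illustrative)."""
    context = "\n\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Hand-made toy data; a real run would load chunks and embeddings from disk
# and embed the question with the same model used to build the database.
chunks = ["chunk about attention", "chunk about hardware", "chunk about memory"]
embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
query_vec = [0.9, 0.1, 0.0]

best = top_k(query_vec, embeddings, k=2)
prompt = build_prompt("what is flash attention?", [chunks[i] for i in best])
```

The resulting prompt would then be passed to the base model for generation.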

vegaluisjose/mlx-rag
