Commit 3fe85db

Merge pull request #98 from premAI-io/mudler-patch-1

mlops-engines: add LocalAI

2 parents 469e3b1 + 9295d86

File tree: 4 files changed (+18 −4 lines)


desktop-apps.md

Lines changed: 1 addition & 1 deletion

@@ -208,7 +208,7 @@ koboldcpp Julius Model Configuration
 
 [local.ai]: https://www.localai.app
 
-The [local.ai] App from https://github.com/louisgv/local.ai ([not to be confused](https://github.com/louisgv/local.ai/discussions/71) with [LocalAI](https://localai.io) from https://github.com/mudler/LocalAI) is a simple application for loading LLMs after you manually download a `ggml` model from online.
+The [local.ai] App from https://github.com/louisgv/local.ai ([not to be confused](https://github.com/louisgv/local.ai/discussions/71) with [](mlops-engines.md#localai) from https://github.com/mudler/LocalAI) is a simple application for loading LLMs after you manually download a `ggml` model from online.
 
 ### UI and Chat

mlops-engines.md

Lines changed: 15 additions & 0 deletions

@@ -29,6 +29,7 @@ Inference Engine | Open-Source | GPU optimisations | Ease of use
 [](#vllm) | 🟢 Yes | Continuous Batching, Tensor Parallelism, Paged Attention | 🟢 Easy
 [](#bentoml) | 🟢 Yes | None | 🟢 Easy
 [](#modular) | 🔴 No | N/A | 🟡 Moderate
+[](#localai) | 🟢 Yes | 🟢 Yes | 🟢 Easy
 ```
 
 {{ table_feedback }}
@@ -127,6 +128,20 @@ Cons:
 
 This is not an exhaustive list of MLOps engines by any means. There are many other tools and frameworks developers use to deploy their ML models. There is ongoing development in both the open-source and private sectors to improve the performance of LLMs. It's up to the community to test out different services to see which one works best for their use case.
 
+## LocalAI
+
+[LocalAI](https://localai.io) from https://github.com/mudler/LocalAI ([not to be confused](https://github.com/louisgv/local.ai/discussions/71) with [](desktop-apps.md#localai) from https://github.com/louisgv/local.ai) is a free, open-source alternative to OpenAI. LocalAI acts as a drop-in replacement REST API compatible with the OpenAI API specification for local inferencing. It can run LLMs (with various backends such as https://github.com/ggerganov/llama.cpp or [](#vllm)), generate images, generate and transcribe audio, and can be self-hosted (on-prem) on consumer-grade hardware.
+
+Pros:
+
+- [wide range of models supported](https://localai.io/model-compatibility)
+- support for [functions](https://localai.io/features/openai-functions) (self-hosted [OpenAI functions](https://platform.openai.com/docs/guides/gpt/function-calling))
+- [easy to integrate](https://localai.io/integrations)
+
+Cons:
+
+- the binary release is harder to run and compile locally (https://github.com/mudler/LocalAI/issues/1196)
+- high learning curve due to the high degree of customisation
 
 ## Challenges in Open Source
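Since the added section describes LocalAI as a drop-in, OpenAI-compatible REST API, here is a minimal sketch of what a client request could look like. This is illustrative only and not part of the commit: it assumes a LocalAI server already running at `http://localhost:8080`, and `ggml-gpt4all-j` is a placeholder model name for whatever model you have installed.

```python
# Sketch of talking to a LocalAI server through its OpenAI-compatible
# /v1/chat/completions endpoint. Server URL and model name are assumptions.
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for a LocalAI server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_chat_request("http://localhost:8080", "ggml-gpt4all-j", "Hello!")
    print(req.full_url)
    # Actually sending the request requires a running LocalAI server:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the request shape matches the OpenAI specification, existing OpenAI client libraries can typically be pointed at a LocalAI base URL instead of building requests by hand.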
model-formats.md

Lines changed: 1 addition & 1 deletion

@@ -280,7 +280,7 @@ Some [clients & libraries supporting `GGUF`](https://huggingface.co/TheBloke/Lla
 - [LM Studio](https://lmstudio.ai) -- an easy-to-use and powerful local GUI with GPU acceleration on both Windows (NVidia and AMD), and macOS
 
 ```{seealso}
-For more info on `GGUF`, see https://github.com/ggerganov/llama.cpp/pull/2398 and its [spec](https://github.com/philpax/ggml/blob/gguf-spec/docs/gguf.md).
+For more info on `GGUF`, see https://github.com/ggerganov/llama.cpp/pull/2398 and its [spec](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md).
 ```
 
 ### Limitations
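As a side note on the `GGUF` spec referenced in this diff: per that spec, a `GGUF` file begins with the 4-byte magic `GGUF` followed by a little-endian `uint32` format version. A minimal sketch of checking a file's header (the path is whatever model file you downloaded):

```python
# Inspect the start of a GGUF file: 4-byte magic b"GGUF", then a
# little-endian uint32 format version, per the GGUF spec.
import struct


def read_gguf_version(path: str) -> int:
    """Return the GGUF format version, raising if the magic is wrong."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file (magic={magic!r})")
        (version,) = struct.unpack("<I", f.read(4))
    return version
```

This kind of magic-byte check is how the clients listed above distinguish `GGUF` files from the older `GGML` containers.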

sdk.md

Lines changed: 1 addition & 2 deletions

@@ -46,11 +46,10 @@ The list of vector stores that LangChain supports can be found [here](https://ap
 
 ### Models
 
-This is the heart of most LLM models where the core functionality resides. There are broadly 3 different [models](https://docs.langchain.com/docs/components/models) that LLMs provide. They are Language, Chat, and Embedding model.
+This is the heart of most LLMs, where the core functionality resides. There are broadly [2 different types of models](https://python.langchain.com/docs/modules/model_io/models) which LangChain integrates with:
 
 - **Language**: Inputs & outputs are `string`s
 - **Chat**: Run on top of a Language model. Inputs are a list of chat messages, and output is a chat message
-- **Embedding**: Inputs is a `string` and outputs are a list of `float`s (vector)
 
 ### Tools
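The two model types named in the sdk.md change can be sketched with toy classes. These are not LangChain's real classes (`EchoLanguageModel` and `EchoChatModel` are made-up names); the point is only the shape of the interfaces: a Language model maps string to string, while a Chat model runs on top of one and maps a list of messages to a single message.

```python
# Toy illustration of the Language vs Chat model interfaces.
from dataclasses import dataclass
from typing import List


@dataclass
class ChatMessage:
    role: str      # e.g. "user" or "assistant"
    content: str


class EchoLanguageModel:
    """Stand-in for a Language model: inputs & outputs are strings."""
    def predict(self, prompt: str) -> str:
        return f"echo: {prompt}"


class EchoChatModel:
    """Stand-in for a Chat model, built on top of a Language model."""
    def __init__(self, llm: EchoLanguageModel):
        self.llm = llm

    def predict_messages(self, messages: List[ChatMessage]) -> ChatMessage:
        # Flatten the conversation into one prompt for the underlying
        # Language model, then wrap its string reply as a chat message.
        prompt = "\n".join(f"{m.role}: {m.content}" for m in messages)
        return ChatMessage(role="assistant", content=self.llm.predict(prompt))
```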