Bring local inference into web apps!
Download extension | Supported apps | Integration guide | SDK Documentation
This is an experimental project at the MVP stage. Feedback is greatly appreciated and will shape the future of this project.
AI-Mask is a Chrome extension that serves as a local provider for AI model execution. It runs models on-device for web apps that need them, for free and with full privacy.
Think of it as the MetaMask of AI.
Try it! Install the extension, then open the Chat app.
Demo video: AIMask-demo.mp4
On-device AI inference has been gaining traction recently. Most of our devices are already capable of executing machine learning models, and software compatibility is ready.
Thanks to some amazing libraries, running machine learning models in the browser has become ridiculously easy, accelerated with WASM and WebGPU. This means they'll work and run at nearly full performance on virtually any device, hardware, and operating system.
But state-of-the-art web inference libraries store models in the browser cache, which has been domain-partitioned for security reasons. This means that if multiple web apps use the same model, it needs to be downloaded once per domain, which can use a lot of disk space.
With this extension, models are cached only once and served to websites conveniently through a unified SDK.
This project is a test to see whether the idea is interesting and gains traction with users and app developers.
Another major planned feature is to proxy requests to OpenAI-like APIs: users would store their API keys in the extension, and apps would query the extension to run models (sketched after the list below).
This would solve several problems:
- Users would no longer have to share API keys with untrusted apps
- Users would no longer share private data with apps
- App developers would no longer need a backend server that proxies API requests to work around CORS issues and manipulate responses
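Purely as an illustration of that flow, here is what it might look like from the app side. None of this exists yet: the API-backed model id and the idea that chat() transparently targets a remote API are assumptions about a future version of the SDK.

// Hypothetical future usage, not implemented today.
import { AIMaskClient } from '@ai-mask/sdk'

const aiMaskClient = new AIMaskClient()

// The app never touches the user's OpenAI key: the extension would
// attach it while proxying the request to the remote API.
const response = await aiMaskClient.chat(
    { messages: [{ role: 'user', content: 'Summarize this page.' }] },
    { modelId: 'gpt-4' }, // assumed API-backed model id, resolved by the extension
)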
Web apps compatible with this extension for local inference:
Enjoy free and private execution of AI models!
Never pay to use models again, never leak private data, and never hand your API keys to third-party apps.
How To:
Easily support AI-Mask in your AI apps and bring free, private local inference to your users! Stop storing API keys, and get rid of your backend and server costs.
Quick Start
Install the package:
npm install -S @ai-mask/sdk
Run inference:
import { AIMaskClient } from '@ai-mask/sdk'

const messages = [{ role: 'user', content: 'What is the capital of France?' }]

// Connect to the AI-Mask extension; inference runs on-device.
const aiMaskClient = new AIMaskClient()
const response = await aiMaskClient.chat(
    { messages },
    { modelId: 'gemma-2b-it-q4f32_1' },
)
For full reference, see AI-Mask SDK Documentation
See the demo app code and an example pull request to see how easy it is to integrate into existing apps.
Note: App users must have the extension installed for this to work.
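Since users may not have the extension installed, it's worth failing gracefully. A minimal sketch, assuming that instantiation or chat() rejects when the extension is unavailable (the exact failure mode is an assumption, not documented behavior):

import { AIMaskClient } from '@ai-mask/sdk'

async function chatWithFallback(prompt) {
    try {
        const aiMaskClient = new AIMaskClient()
        return await aiMaskClient.chat(
            { messages: [{ role: 'user', content: prompt }] },
            { modelId: 'gemma-2b-it-q4f32_1' },
        )
    } catch (error) {
        // Assumption: this throws when the AI-Mask extension is missing.
        console.warn('AI-Mask unavailable, falling back:', error)
        return null // e.g. prompt the user to install the extension
    }
}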
AI-Mask is a Manifest V3 extension that relies heavily on third-party libraries to execute model inference:
- web-llm: inference with WASM/WebGPU via Apache TVM
- transformers.js: inference with WASM via ONNX Runtime
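For a sense of what these engines do, here is a standalone transformers.js call, independent of AI-Mask (the model name is only an example); the extension wraps engines like this behind its messaging layer:

import { pipeline } from '@xenova/transformers'

// The first call downloads the model into the (domain-partitioned) browser cache.
const generator = await pipeline('text-generation', 'Xenova/gpt2')
const output = await generator('The capital of France is')
console.log(output) // [{ generated_text: '...' }]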
Issues with service workers:
- WebGPU is not exposed to service workers
- For some reason, transformers.js can only run single-threaded in service workers
To work around these issues, the engines run in an offscreen document.
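A minimal sketch of that pattern using the standard chrome.offscreen API, which requires the "offscreen" permission in the manifest (the document URL and justification string are assumptions, not AI-Mask's actual code):

// In the service worker: ensure the offscreen document hosting the engines exists.
async function ensureOffscreenDocument() {
    // Only one offscreen document may exist at a time.
    if (await chrome.offscreen.hasDocument()) return
    await chrome.offscreen.createDocument({
        url: 'offscreen.html', // assumed page that bundles web-llm / transformers.js
        reasons: ['WORKERS'], // engines spawn workers; WebGPU is also reachable here
        justification: 'Run model inference outside the service worker',
    })
}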
Requirements:
- Node 18+
- pnpm 8+ (for monorepo workspace management)
Run in development mode:
pnpm dev
Build all packages:
pnpm build
- Documentation
- Deploy demo app
- Deploy extension
- Make the SDK work in web workers
- ReadableStream option
- Move computation back into the service worker starting with Chrome 124, thanks to WebGPU support
- Proxy OpenAI-like API requests and store user keys
- Create LangChain community libs
- Interrupts
- Include React hooks/utilities in the SDK
- Pull request in one popular AI app
- Implement more tasks
- Add more models
- Unload model from memory after being inactive for a while