I built a voice assistant using Azure Cognitive Speech Services, Azure OpenAI, and Langchain Agents. The voice assistant can perform a variety of tasks such as searching the web, reporting the weather, and controlling your Home Assistant.
The voice assistant uses Langchain Agents to perform these tasks. You can easily add more tools by adding them to load_tools(), and you can add your own tools in agents/tools.py.
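For example, here is a minimal sketch of loading two of Langchain's built-in tools (the tool names and LLM setup below are illustrative assumptions; the project's actual tool list lives in agents/tools.py, and the project itself uses Azure OpenAI):

```python
from langchain.agents import load_tools
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)  # illustrative; this project uses Azure OpenAI

# "openweathermap-api" and "google-search" are Langchain's built-in tool
# names, matching the OpenWeather and Google Custom Search prerequisites
# below; they read OPENWEATHERMAP_API_KEY, GOOGLE_API_KEY and GOOGLE_CSE_ID
# from the environment.
tools = load_tools(["openweathermap-api", "google-search"], llm=llm)
```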
Read more about Langchain and Langchain agents/tools here:
https://python.langchain.com/en/latest/modules/agents.html
https://python.langchain.com/docs/modules/tools/custom_tools/
This project is still in development and may contain bugs; please report any issues you encounter. If you use this project to control your Home Assistant or to do other important things, please be careful.
Neither this project nor its author is responsible for any damage caused by the use of this project.
- Azure account
- Azure Keyword Recognition Model: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/custom-keyword-basics?pivots=programming-language-python
- Azure Speech Service: https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/
- Azure OpenAI Service: https://azure.microsoft.com/en-us/products/cognitive-services/openai-service
- OpenWeather API: https://openweathermap.org/api
- Google Custom Search JSON API: https://developers.google.com/custom-search/v1/overview
- Home Assistant: https://www.home-assistant.io/
A system with OpenSSL < 3 installed is recommended, because the Azure Speech SDK still requires OpenSSL < 3 (as of 2024-04-05).
Refer to this issue: Azure-Samples/cognitive-services-speech-sdk#2048
and the Azure Speech Service documentation: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/quickstarts/setup-platform
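One rough way to sanity-check this (it reports the OpenSSL that Python itself links against, which on most Linux distributions is the system library):

```python
import ssl

# Prints the OpenSSL version Python is linked against, e.g.
# "OpenSSL 1.1.1f  31 Mar 2020"; the Speech SDK needs a 1.x version.
print(ssl.OPENSSL_VERSION)
```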
- Clone this repository to your local machine.
- Install the required dependencies. First install the system packages with your package manager (the following example is for Ubuntu/Debian based systems):
sudo apt install portaudio19-dev python3-pyaudio sox pulseaudio libsox-fmt-all ffmpeg wget libpcre3 libpcre3-dev libatlas-base-dev python3-dev
Then install the Python packages with pip:
pip install -r requirements.txt
- Copy the config.example.yaml to config.yaml and fill in the required information.
- Run the script:
python app.py
- Clone this repository to your local machine.
- Copy the config.example.yml to config.yml and fill in the required information.
- Build the Docker image:
docker build -t moss:latest .
- Run the Docker container:
docker run -itd \
--device /dev/snd \
--name moss \
-e PULSE_SERVER=unix:${XDG_RUNTIME_DIR}/pulse/native \
-v ${XDG_RUNTIME_DIR}/pulse/native:${XDG_RUNTIME_DIR}/pulse/native \
-v ~/.config/pulse/cookie:/root/.config/pulse/cookie \
-v ./config/config.yml:/moss/config/config.yml \
-v /etc/localtime:/etc/localtime:ro \
--restart unless-stopped \
moss:latest
Once the voice assistant is running, it continuously listens for the wake-up keyword "莫斯" (in Chinese) or "Moss" (in English). Once the keyword is detected, it starts listening for speech input, which it then passes to the OpenAI model for processing. The model's response is spoken out loud using Edge-TTS.
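For reference, a minimal sketch of such a wake-word loop with the Azure Speech SDK (the model path and credentials are placeholders; the real values come from the project's config file):

```python
import azure.cognitiveservices.speech as speechsdk

# Placeholder credentials and keyword model path; this project reads
# them from config.yml.
speech_config = speechsdk.SpeechConfig(subscription="YOUR_SPEECH_KEY",
                                       region="YOUR_REGION")
keyword_model = speechsdk.KeywordRecognitionModel("models/moss.table")

while True:
    # Block until the wake-up keyword is heard.
    keyword_recognizer = speechsdk.KeywordRecognizer()
    result = keyword_recognizer.recognize_once_async(keyword_model).get()
    if result.reason != speechsdk.ResultReason.RecognizedKeyword:
        continue

    # Capture one utterance of speech input.
    recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)
    speech = recognizer.recognize_once_async().get()
    if speech.reason == speechsdk.ResultReason.RecognizedSpeech:
        print("Heard:", speech.text)  # hand speech.text to the LLM agent here
```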
The voice assistant can be extended by adding more tools to the load_tools() function in agents/tools.py. I use StructuredTool to add tools to the voice assistant. You can also add tools as classes: each tool should be a class that inherits from BaseTool and implements the _run method, whose return value is the string that will be spoken out loud by the voice assistant. Both patterns are sketched below.
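A minimal sketch of both patterns, assuming the classic Langchain API; say_time and LightSwitchTool are hypothetical examples, not tools this project ships with:

```python
from datetime import datetime
from langchain.tools import BaseTool, StructuredTool

def say_time() -> str:
    """Return the current time as a sentence."""
    return f"It is {datetime.now().strftime('%H:%M')}."

# Pattern 1: wrap a plain function with StructuredTool.
time_tool = StructuredTool.from_function(
    func=say_time,
    name="say_time",
    description="Tell the user the current time.",
)

# Pattern 2: a class that inherits from BaseTool and implements _run;
# whatever _run returns is what the assistant speaks out loud.
class LightSwitchTool(BaseTool):
    name = "light_switch"
    description = "Turn the light on or off. Input should be 'on' or 'off'."

    def _run(self, state: str) -> str:
        # Call Home Assistant (or another backend) here.
        return f"OK, the light is now {state}."

    async def _arun(self, state: str) -> str:
        raise NotImplementedError("light_switch does not support async")
```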
Learn more about Langchain Agents and Tools here: https://python.langchain.com/en/latest/modules/agents.html
Once the voice assistant is running, you can use the following example prompts to interact with it:
- "Moss, How It's the weather in Beijing?"
- "Moss, Turn on the light"
It can also respond to any general query that the GPT model is capable of answering, and its capabilities extend beyond that through the agent's tools.
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the GPL-3.0 License - see the LICENSE file for details.
Thanks to the following repositories for inspiration: