Added Speech Recognition with Hugging Face from Node.js to Python #321

Open
wants to merge 2 commits into
base: main
117 changes: 117 additions & 0 deletions python/speech_recognition_with_huggingface/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,117 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# dotenv
.env

# virtualenv
venv/
ENV/
env/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# VS Code
.vscode/

# Appwrite
appwrite/
148 changes: 148 additions & 0 deletions python/speech_recognition_with_huggingface/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,148 @@
# Speech Recognition with Hugging Face


This function uses the Hugging Face API to perform speech recognition. It downloads an audio file from Appwrite Storage, sends it to the Hugging Face API, and stores the recognized text in the database. The function can also be triggered by file events from Appwrite Storage.


## 🧰 Usage


### POST /


**Parameters**
| Name | Description | Location | Type | Sample Value |
|------------|-------------|----------|--------|--------------|
| fileId | Appwrite File ID of audio file | Body | String | `65c6319c5f34dc9638ec` |


This function also accepts the body of a file event from Appwrite Storage.
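
A minimal sketch of how the two accepted body shapes could be reduced to a single file ID. The helper name `extract_file_id` is hypothetical, and the assumption that a Storage file event carries the file ID under `$id` comes from the shape of Appwrite's file object, not from this function's source.

```python
import json


def extract_file_id(body):
    """Return the file ID from a direct request or a Storage file event body.

    Assumption: a direct request uses "fileId" (see the parameters table),
    and a Storage file event body is the file object, with its ID in "$id".
    """
    # Accept either a raw JSON string or an already-parsed dict.
    if isinstance(body, (str, bytes)):
        try:
            body = json.loads(body or "{}")
        except json.JSONDecodeError:
            return None
    if not isinstance(body, dict):
        return None
    # Prefer the explicit parameter, then fall back to the event payload.
    return body.get("fileId") or body.get("$id")
```

Returning `None` instead of raising leaves it to the caller to decide which error response to send.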


**Response**


Sample `200` Response:


Text from the audio file is recognized and stored in the database.


```json
{
"text": " going along slushy country roads and speaking to damp audiences in draughty schoolrooms day after day for a fortnight he'll have to put in an appearance at some place of worship on sunday morning and he can come to us immediately afterwards"
}
```


Sample `404` Response:


```json
{
"error": "File not found"
}
```


## ⚙️ Configuration


| Setting | Value |
| ----------------- | ------------------------------ |
| Runtime | Python (3.12) |
| Entrypoint | `src/main.py` |
| Build Commands | `pip install -r requirements.txt` |
| Permissions | `any` |
| Timeout (Seconds) | 15 |
| Events | `buckets.*.files.*.create` |


## Prerequisites
- [Appwrite](https://appwrite.io/) account and project
- [Hugging Face](https://huggingface.co/) account and access token


## 🔒 Environment Variables


### APPWRITE_BUCKET_ID


The ID of the bucket where audio is stored.


| Question | Answer |
| ------------ | ------------------- |
| Required | No |
| Sample Value | `speech_recognition` |


### APPWRITE_DATABASE_ID


The ID of the database where the responses are stored.


| Question | Answer |
| ------------ | ------ |
| Required | No |
| Sample Value | `ai` |


### APPWRITE_COLLECTION_ID


The ID of the collection where the responses are stored.


| Question | Answer |
| ------------ | ------------------- |
| Required | No |
| Sample Value | `speech_recognition` |


### HUGGINGFACE_ACCESS_TOKEN


Secret for sending requests to the Hugging Face API.


| Question | Answer |
| ------------- | ----------------------------------------------------------------------------------------------------- |
| Required | Yes |
| Sample Value | `hf_x2a...` |
| Documentation | [Hugging Face: API tokens](https://huggingface.co/docs/api-inference/en/quicktour#get-your-api-token) |




Create a `.env` file in the `src` directory and add the following environment variables:


```properties
# Use this endpoint if you kept all the default values when running Appwrite locally through Docker
APPWRITE_ENDPOINT=http://localhost/v1
APPWRITE_API_KEY=your_appwrite_api_key
APPWRITE_PROJECT_ID=your_appwrite_project_id
HUGGINGFACE_ACCESS_TOKEN=your_huggingface_access_token
APPWRITE_DATABASE_ID=ai
APPWRITE_COLLECTION_ID=speech_recognition
APPWRITE_BUCKET_ID=speech_recognition
```
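
The sketch below shows one way these variables could be read at startup. It is an illustration, not the code in `src/main.py`: the function name `load_config`, the returned keys, and the fallback values (taken from the `.env` example above) are all assumptions. If `python-dotenv` is used, calling its `load_dotenv()` first would populate `os.environ` from the `.env` file.

```python
import os


def load_config(env=None):
    """Read the function's settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env

    token = env.get("HUGGINGFACE_ACCESS_TOKEN")
    if not token:
        # The only variable this README marks as required.
        raise RuntimeError("HUGGINGFACE_ACCESS_TOKEN is not set")

    return {
        "huggingface_token": token,
        # Assumed fallbacks: the values shown in the .env example.
        "database_id": env.get("APPWRITE_DATABASE_ID", "ai"),
        "collection_id": env.get("APPWRITE_COLLECTION_ID", "speech_recognition"),
        "bucket_id": env.get("APPWRITE_BUCKET_ID", "speech_recognition"),
    }
```

Failing fast on the missing token surfaces a misconfiguration at startup rather than on the first request to Hugging Face.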

## Running the Application

To process an audio file for speech recognition, you can use the `main.py` script. Ensure your Appwrite function is set up to handle HTTP requests and call the `process_audio` function.
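
The overall flow (download, transcribe, store, respond) can be sketched as follows. This is not the `process_audio` in `src/main.py`; the signature here is hypothetical, and the three callables stand in for the Appwrite Storage download, the Hugging Face speech recognition call, and the Appwrite Database write, so that the control flow can be read and tested without either SDK.

```python
def process_audio(file_id, download, transcribe, store):
    """Download an audio file, transcribe it, store the text, build a response.

    Hypothetical stand-ins for the real clients:
      download(file_id)    -> audio bytes, or raises FileNotFoundError
      transcribe(audio)    -> recognized text
      store(file_id, text) -> persists the text
    Returns a (body, status_code) pair shaped like the sample responses.
    """
    try:
        audio = download(file_id)
    except FileNotFoundError:
        # Mirrors the sample 404 response.
        return {"error": "File not found"}, 404

    text = transcribe(audio)
    store(file_id, text)
    return {"text": text}, 200
```

Passing the clients in as arguments keeps the function independent of how the Appwrite and Hugging Face clients are constructed.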


## Dependencies

- pip install appwrite
- pip install huggingface_hub
- pip install python-dotenv


## License


This project is licensed under the MIT License.
3 changes: 3 additions & 0 deletions python/speech_recognition_with_huggingface/requirements.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
appwrite
huggingface_hub
python-dotenv
Binary file not shown.
Binary file not shown.