This script transcribes an audio file using OpenAI's Whisper model and optionally post-processes the transcription with GPT-4o for corrections. The transcription and corrected text are saved to text files.
- Transcribes audio files to text using OpenAI's Whisper model.
- Optionally post-processes the transcription with GPT-4o to correct spelling and punctuation.
- Shows a progress bar while the audio file uploads.
- Saves the transcription and corrected text to text files.
- MP3 (.mp3)
- MP4 (.mp4)
- MPEG (.mpeg)
- MPGA (.mpga)
- M4A (.m4a)
- WAV (.wav)
- WEBM (.webm)
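A quick extension check before uploading can catch an unsupported file early. The following is a minimal sketch, not code from the script itself: `isSupportedAudioFile` and `SUPPORTED_EXTENSIONS` are hypothetical names, and the check looks only at the file name, not at the actual audio encoding.

```javascript
const path = require('path');

// Extensions copied from the supported-formats list above.
const SUPPORTED_EXTENSIONS = new Set([
  '.mp3', '.mp4', '.mpeg', '.mpga', '.m4a', '.wav', '.webm',
]);

// Hypothetical helper: true when the file name ends in a supported extension.
// The comparison is case-insensitive, so 'TALK.MP3' is accepted as well.
function isSupportedAudioFile(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  return SUPPORTED_EXTENSIONS.has(ext);
}

console.log(isSupportedAudioFile('interview.m4a')); // true
console.log(isSupportedAudioFile('notes.txt'));     // false
```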
- Node.js (v14 or later)
- npm (Node package manager)
- Clone the repository:
      git clone https://github.com/o-Oby/speech-to-text.git
      cd speech-to-text
- Install dependencies:
      npm install form-data axios readline-sync openai progress chalk
  The fs and path modules are built into Node.js and do not need to be installed.
- Update API Key:
  Open the transcribe_and_postprocess.js file and replace the placeholder API key with your actual OpenAI API key.
      const configuration = new Configuration({
        apiKey: 'your-api-key-here', // Replace with your actual API key
      });
- Update File Path:
  Ensure the file path to your audio file is correct in the transcribeFile function.
      const filePath = path.resolve('path/to/your/audio/file.m4a'); // Replace with your actual file path
- Run the script:
      node transcribe_and_postprocess.js
- Follow the prompts:
  - The script will ask if you want to post-process the transcription with GPT-4o.
  - Respond with yes or no.
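The yes/no prompt can be handled by normalizing the typed answer before acting on it. This is a sketch under stated assumptions rather than the script's actual logic: `parseYesNo` is a hypothetical helper, and accepting the short forms `y` and `n` is an assumption that goes beyond the `yes`/`no` answers described above.

```javascript
// Hypothetical helper: maps a typed answer to true (yes), false (no),
// or null when the answer is not recognized.
function parseYesNo(answer) {
  const normalized = String(answer).trim().toLowerCase();
  if (normalized === 'yes' || normalized === 'y') return true; // 'y' is an assumed shortcut
  if (normalized === 'no' || normalized === 'n') return false; // 'n' is an assumed shortcut
  return null;
}

console.log(parseYesNo(' Yes ')); // true
console.log(parseYesNo('no'));    // false
console.log(parseYesNo('maybe')); // null
```

Returning `null` for anything unrecognized lets the caller ask again instead of silently treating a typo as "no".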
- transcription.txt: Contains the initial transcription of the audio file.
- corrected_transcription.txt: Contains the corrected transcription (if post-processed with GPT-4o).
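The two output files could be written along these lines. This is only an illustrative sketch: `saveOutputs` and its `outputDir` parameter are hypothetical, and the real script may write the files differently. The file names are the ones listed above.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: always writes transcription.txt, and writes
// corrected_transcription.txt only when corrected text was produced.
// Returns the list of paths that were written.
function saveOutputs(outputDir, transcription, correctedText) {
  const written = [];

  const transcriptionPath = path.join(outputDir, 'transcription.txt');
  fs.writeFileSync(transcriptionPath, transcription, 'utf8');
  written.push(transcriptionPath);

  // Skipped when post-processing was declined (correctedText is undefined).
  if (typeof correctedText === 'string') {
    const correctedPath = path.join(outputDir, 'corrected_transcription.txt');
    fs.writeFileSync(correctedPath, correctedText, 'utf8');
    written.push(correctedPath);
  }

  return written;
}
```

For example, `saveOutputs('.', rawText)` would write only `transcription.txt`, while `saveOutputs('.', rawText, fixedText)` would write both files.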
- File uploads are currently limited to 25 MB. Ensure your audio file size does not exceed this limit.
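Checking the file size locally avoids waiting for an upload that would be rejected. The sketch below is an assumption-laden illustration, not part of the script: `checkUploadSize` and `MAX_UPLOAD_BYTES` are hypothetical names, and treating 1 MB as 1024 * 1024 bytes is an assumption about how the limit is counted.

```javascript
const fs = require('fs');

// 25 MB limit from the note above; 1 MB = 1024 * 1024 bytes is an assumption.
const MAX_UPLOAD_BYTES = 25 * 1024 * 1024;

// Hypothetical helper: reports the file size and whether it fits the limit.
// Throws if the file does not exist, because fs.statSync throws in that case.
function checkUploadSize(filePath, maxBytes = MAX_UPLOAD_BYTES) {
  const sizeBytes = fs.statSync(filePath).size;
  return { sizeBytes, withinLimit: sizeBytes <= maxBytes };
}
```

A caller might run `checkUploadSize(filePath)` before starting the upload and stop with a clear message when `withinLimit` is `false`.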
This project is licensed under the MIT License.