An enhanced Python-based video transcription tool that converts MP4 videos to markdown transcripts and automatically generates comprehensive learning articles using AI.
- Automatic Video Processing: Monitors input folder for new MP4 files
- Audio Extraction: Converts MP4 videos to MP3 audio (with fallback for direct video transcription)
- AI Transcription: Uses AssemblyAI for high-quality speech-to-text conversion
- Learning Article Generation: Uses Google Gemini AI to create structured educational articles
- Dual Output: Generates both raw transcripts and enhanced learning articles
- Smart File Management: Automatically deletes processed videos after successful completion
- Cost Tracking: Monitors and logs API usage costs for both services
- Cross-Platform: Works on Windows, macOS, and Linux
- Automated Workflow: Drop video → get transcript + learning article
- Error Handling: Graceful handling of missing dependencies and API errors
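As a rough illustration of the two AI steps listed above, the sketch below calls the AssemblyAI and Google Gemini Python SDKs in sequence. It is not the project's actual code: the function names, the prompt wording, and the default model name are assumptions made for this example, and the SDKs are imported lazily so the prompt helper can be used without them installed.

```python
def transcribe_file(media_path: str, api_key: str) -> str:
    """Return the transcript text for a local audio or video file."""
    import assemblyai as aai  # lazy import: only needed when actually transcribing

    aai.settings.api_key = api_key
    transcript = aai.Transcriber().transcribe(media_path)
    if transcript.status == aai.TranscriptStatus.error:
        raise RuntimeError(f"Transcription failed: {transcript.error}")
    return transcript.text


def build_article_prompt(transcript_text: str) -> str:
    """Hypothetical prompt; the project's real prompt is not shown in this README."""
    return (
        "Rewrite the following transcript as a structured learning article "
        "in Markdown, with headings, key takeaways, and a short summary.\n\n"
        f"Transcript:\n{transcript_text}"
    )


def generate_article(transcript_text: str, api_key: str,
                     model_name: str = "gemini-1.5-flash") -> str:
    """Ask Gemini to turn a raw transcript into a learning article."""
    import google.generativeai as genai  # lazy import, same reason as above

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel(model_name)
    response = model.generate_content(build_article_prompt(transcript_text))
    return response.text
```

Both network functions need valid API keys and an internet connection; only `build_article_prompt` runs offline.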
audio_learning_article/
├── video_transcriber/
│   ├── InputVideos/          # Place MP4 files here (auto-deleted after processing)
│   ├── OutputMarkdown/       # Raw transcripts from AssemblyAI
│   ├── LearningArticles/     # Enhanced learning articles from Gemini AI
│   ├── Scripts/              # Additional scripts (if any)
│   ├── process_videos.py     # Main Python script
│   ├── article_generator.py  # Gemini AI article generation module
│   ├── run_transcriber.bat   # Windows batch file runner
│   ├── run_transcriber.sh    # Bash script runner
│   └── file_watcher.ps1      # PowerShell file watcher for automation
├── .gitignore
└── README.md
- Python 3.7+ installed on your system
- AssemblyAI API Key (sign up at AssemblyAI)
- Google Gemini API Key (sign up at Google AI Studio)
- Required Python packages:
pip install assemblyai google-generativeai pydub
- Clone the repository:
  git clone <repository-url>
  cd audio_learning_article
- Install dependencies:
  pip install assemblyai google-generativeai pydub
- Set up your AssemblyAI API key:
  - Edit run_transcriber.bat or run_transcriber.sh
  - Replace YOUR_ASSEMBLYAI_API_KEY with your actual API key
  - Or set the environment variable ASSEMBLYAI_API_KEY
Windows:
cd video_transcriber
.\run_transcriber.bat

Linux/macOS:
cd video_transcriber
bash run_transcriber.sh

Direct Python:
cd video_transcriber
python process_videos.py

For automatic processing when files are added to the input folder, you can set up a Windows Task Scheduler task:
- Open Task Scheduler:
  - Press Win + R, type taskschd.msc, and press Enter
  - Or search for "Task Scheduler" in the Start menu
- Create a New Task:
  - In the right panel, click "Create Task..."
  - Name: Audio Learning Article Transcriber
  - Description: Automatically transcribe videos when added to input folder
  - Check "Run whether user is logged on or not"
  - Check "Run with highest privileges"
- Configure Triggers:
  - Go to the "Triggers" tab
  - Click "New..."
  - Begin the task: "On an event"
  - Settings:
    - Log: System
    - Source: Microsoft-Windows-Kernel-File
    - Event ID: 11 (file creation)
  - Click "OK"
- Configure Actions:
  - Go to the "Actions" tab
  - Click "New..."
  - Action: "Start a program"
  - Program/script: cmd.exe
  - Add arguments: /c "cd /d "D:\vscode_projects\challenges\audio_learning_article\video_transcriber" && run_transcriber.bat"
  - Note: Replace the path with your actual project path
  - Click "OK"
- Configure Conditions (Optional):
  - Go to the "Conditions" tab
  - Uncheck "Start the task only if the computer is on AC power" (for laptops)
  - Check "Wake the computer to run this task" if desired
- Configure Settings:
  - Go to the "Settings" tab
  - Check "Allow task to be run on demand"
  - Check "Run task as soon as possible after a scheduled start is missed"
  - If task fails, restart every: 1 minute
  - Attempt to restart up to: 3 times
- Save the Task:
  - Click "OK"
  - Enter your Windows password when prompted
For more precise and reliable file monitoring, use the included PowerShell file watcher script:
To use the PowerShell file watcher:
- Open PowerShell as Administrator
- Set execution policy (one-time setup):
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
- Navigate to the video_transcriber folder:
  cd "D:\vscode_projects\challenges\audio_learning_article\video_transcriber"
- Run the file watcher:
  .\file_watcher.ps1
Features of the PowerShell file watcher:
- Smart file detection: Waits for files to be fully copied before processing
- File lock checking: Ensures files aren't still being written to
- Colored output: Easy-to-read status messages
- Error handling: Graceful handling of transcription errors
- Automatic path detection: No need to edit paths in the script
The file watcher will:
- Monitor the InputVideos folder for new MP4 files
- Wait for files to be completely copied
- Automatically start transcription when ready
- Display progress and status messages
- Continue monitoring until you press Ctrl+C
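The "wait for files to be completely copied" behaviour can be approximated by polling the file size until it stops changing. The sketch below is a Python analogue written for illustration; the actual logic inside file_watcher.ps1 is not shown in this README, and the function name and default values are assumptions.

```python
import os
import time


def wait_until_stable(path, checks=3, interval=1.0, timeout=300.0, sleep=time.sleep):
    """Return True once the file size matches the previous poll `checks` times in a row.

    Returns False if that does not happen before `timeout` seconds elapse.
    """
    deadline = time.monotonic() + timeout
    last_size = -1
    stable = 0
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:
            size = -1  # file not visible yet, or temporarily inaccessible
        if size >= 0 and size == last_size:
            stable += 1
            if stable >= checks:
                return True
        else:
            stable = 0  # size changed (or file missing): start counting again
        last_size = size
        sleep(interval)
    return False
```

A size check alone cannot detect every in-progress copy, so a watcher may also try to open the file exclusively, which is what the "file lock checking" feature above suggests.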
- Drop MP4 files in video_transcriber/InputVideos/
- Choose execution method:
  - Manual: Run the transcriber using one of the methods above
  - Automatic: Files will be processed automatically if you've set up the scheduler/file watcher
- The enhanced script will:
  - Extract audio from each MP4 file
  - Send audio to AssemblyAI for transcription
  - Generate raw transcript in OutputMarkdown/
  - Send transcript to Google Gemini AI for article generation
  - Create structured learning article in LearningArticles/
  - Delete original MP4 file (only after successful completion)
  - Clean up temporary audio files
  - Log processing costs and metadata
Output Files:
- OutputMarkdown/[filename].md - Raw transcript from AssemblyAI
- LearningArticles/[filename]_article.md - Enhanced learning article from Gemini
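The naming scheme for the two output files can be expressed with `pathlib`. This is a minimal sketch; the function name and the `base_dir` default are assumptions for the example, not taken from process_videos.py.

```python
from pathlib import Path


def output_paths(video_path, base_dir="video_transcriber"):
    """Map an input video to its transcript and article paths."""
    stem = Path(video_path).stem  # "lecture01.mp4" -> "lecture01"
    base = Path(base_dir)
    transcript = base / "OutputMarkdown" / f"{stem}.md"
    article = base / "LearningArticles" / f"{stem}_article.md"
    return transcript, article
```

For an input named lecture01.mp4 this yields lecture01.md and lecture01_article.md in the respective folders.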
Safety Features:
- Original video files are only deleted after BOTH transcription AND article generation succeed
- If any step fails, the original video is preserved
- Detailed error logging and cost tracking
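The deletion rule above (remove the video only when both steps succeed) can be sketched as follows. The transcription and article steps are passed in as plain callables, so the example runs without any API access; all names here are hypothetical and the real script may be organised differently.

```python
from pathlib import Path


def process_video(video_path, transcribe, generate_article, transcript_dir, article_dir):
    """Run both steps; delete the video only if neither raised an exception."""
    video = Path(video_path)
    try:
        # Step 1: transcription, written to the raw transcript folder
        transcript_text = transcribe(video)
        transcript_file = Path(transcript_dir) / f"{video.stem}.md"
        transcript_file.write_text(transcript_text, encoding="utf-8")

        # Step 2: article generation, written to the learning article folder
        article_text = generate_article(transcript_text)
        article_file = Path(article_dir) / f"{video.stem}_article.md"
        article_file.write_text(article_text, encoding="utf-8")
    except Exception as exc:
        print(f"Processing failed for {video.name}: {exc}; keeping the original video")
        return False
    video.unlink()  # reached only when both steps succeeded
    return True
```

If transcription succeeds but article generation fails, the transcript file remains on disk and the video is kept, so a later run can retry.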
You need to configure both API keys as environment variables for security:
Method 1: Environment Variables (Recommended)
# Linux/macOS
export ASSEMBLYAI_API_KEY="your_assemblyai_key_here"
export GEMINI_API_KEY="your_gemini_key_here"
# Windows Command Prompt
set ASSEMBLYAI_API_KEY=your_assemblyai_key_here
set GEMINI_API_KEY=your_gemini_key_here
# Windows PowerShell
$env:ASSEMBLYAI_API_KEY="your_assemblyai_key_here"
$env:GEMINI_API_KEY="your_gemini_key_here"

Method 2: Add to Shell Profile (Persistent)
# Add to ~/.bashrc, ~/.zshrc, or ~/.profile
echo 'export ASSEMBLYAI_API_KEY="your_assemblyai_key_here"' >> ~/.bashrc
echo 'export GEMINI_API_KEY="your_gemini_key_here"' >> ~/.bashrc
source ~/.bashrc

Method 3: Windows System Environment Variables
- Open System Properties → Advanced → Environment Variables
- Add new user variables:
  - ASSEMBLYAI_API_KEY = your_assemblyai_key_here
  - GEMINI_API_KEY = your_gemini_key_here
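On the Python side, a script can read both variables at startup and stop early with a clear message if either is missing. This is a small sketch of that check; the function name is an assumption, and the real script may validate its keys differently.

```python
import os

REQUIRED_KEYS = ("ASSEMBLYAI_API_KEY", "GEMINI_API_KEY")


def load_api_keys(env=None):
    """Return both API keys, or raise if any is unset or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_KEYS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_KEYS}
```

Passing a dictionary as `env` makes the check easy to exercise without touching the real environment.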
- Input: MP4 video files
- Output:
- Raw transcripts: Markdown (.md) files
- Learning articles: Enhanced markdown (.md) files with structured content
- Temporary: MP3 audio files (automatically cleaned up)
The system tracks costs for both APIs:
- AssemblyAI: ~$0.37 per hour of audio
- Google Gemini 1.5 Flash: ~$0.075 per 1M input tokens, $0.30 per 1M output tokens
- Cost estimates are logged with each processing session
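Using the approximate rates listed above, a per-video estimate is simple arithmetic. The sketch below hard-codes those rates as constants; the function name and return shape are assumptions, and actual billing depends on the providers' current pricing.

```python
# Approximate rates quoted in this README (USD)
ASSEMBLYAI_USD_PER_AUDIO_HOUR = 0.37
GEMINI_USD_PER_M_INPUT_TOKENS = 0.075
GEMINI_USD_PER_M_OUTPUT_TOKENS = 0.30


def estimate_cost(audio_seconds, input_tokens, output_tokens):
    """Estimate the cost of one video from audio length and token counts."""
    transcription = audio_seconds / 3600 * ASSEMBLYAI_USD_PER_AUDIO_HOUR
    article = (
        input_tokens / 1_000_000 * GEMINI_USD_PER_M_INPUT_TOKENS
        + output_tokens / 1_000_000 * GEMINI_USD_PER_M_OUTPUT_TOKENS
    )
    return {"assemblyai": transcription, "gemini": article, "total": transcription + article}
```

For example, a 30-minute video with 6,000 input tokens and 2,000 output tokens comes to about $0.185 for transcription and about $0.001 for the article, so transcription dominates the total.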
- "No module named 'pyaudioop'":
  - This is a known issue with Python 3.13, which removed the audioop module that pydub depends on
  - The script will fall back to direct video transcription
  - AssemblyAI can handle video files directly
- "Python not found":
  - Ensure Python is installed and added to your PATH
  - Try using python3 instead of python
- "ASSEMBLYAI_API_KEY not set":
  - Make sure you've set your API key as described in the setup section
- Permission errors:
  - Ensure you have write permissions in the project directory
  - On Linux/macOS, you might need to make the shell script executable:
    chmod +x run_transcriber.sh
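The pyaudioop fallback mentioned in the first item can be sketched as an optional import: if pydub cannot be loaded, the video itself is sent for transcription. The function name and the injectable `audio_segment_cls` parameter are assumptions made so the example can be exercised without pydub or ffmpeg; the real script may handle this differently.

```python
from pathlib import Path

try:
    # On Python 3.13 this import can fail with "No module named 'pyaudioop'"
    from pydub import AudioSegment
except ImportError:
    AudioSegment = None


def prepare_upload(video_path, audio_segment_cls=AudioSegment):
    """Return the file to transcribe: an extracted MP3 if pydub loaded, else the MP4."""
    video = Path(video_path)
    if audio_segment_cls is None:
        return video  # fallback: upload the video directly
    audio_path = video.with_suffix(".mp3")
    audio = audio_segment_cls.from_file(str(video), format="mp4")
    audio.export(str(audio_path), format="mp3")
    return audio_path
```

When an MP3 is produced this way, the caller is responsible for removing it after transcription, matching the temporary-file cleanup described earlier.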
If you encounter issues with pydub, you might need additional system dependencies:
Windows:
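- No additional dependencies are listed, but pydub needs ffmpeg to decode MP4 files on every platform; if audio extraction fails, install ffmpeg and add it to your PATH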
macOS:
brew install ffmpeg
Linux:
sudo apt-get install ffmpeg

- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
This project is open source. Please check the license file for details.
For issues and questions:
- Check the troubleshooting section above
- Review the AssemblyAI documentation
- Create an issue in the repository
Note: This tool requires an active internet connection and valid AssemblyAI and Google Gemini API keys to function properly.