Voice-Powered AI Data Analysis Platform
yDE (Your Data Expert) is an innovative voice agent platform that transforms data analysis through natural conversation. Upload your datasets and have intelligent conversations with an AI analyst that can understand, analyze, and provide insights through voice interaction.
- Your personal live data voice agent
- CSV upload (datasets of any size)
- Live transcript, graph visualizations, and code/results streamed to the frontend
- Code execution for insights
- Background data enrichment with a live-updating Google Sheet streamed to the UI and agent (currently requires ACI_API_KEY to handle auth with Google Sheets; without it, enrichment still works by live-updating a pandas DataFrame and saving it to disk)
Frontend:
- Next.js 15.4.2 with React 19
- TypeScript for type safety
- Tailwind CSS for styling
- Pipecat for WebRTC audio handling
- Motion for animations
Backend:
- FastAPI for RESTful API
- Pipecat AI for voice processing pipeline
- WebRTC for real-time audio communication
- Pandas for data manipulation
- OpenAI GPT-4 for natural language processing
- ElevenLabs for text-to-speech synthesis
- aci.dev integration for Gmail and Google Sheets
Prerequisites:
- Python 3.8+
- Node.js 18+
- npm or yarn
- Modern web browser with WebRTC support
- Clone the repository

  ```bash
  git clone <repository-url>
  cd my-analyst-hack
  ```
- Run the startup script

  ```bash
  chmod +x start.sh
  ./start.sh
  ```
This script will:
- Check prerequisites
- Set up Python virtual environment
- Install dependencies
- Start both backend and frontend servers
- Access the application
  - Frontend: http://localhost:3000
  - Backend API: http://localhost:7860
  - API Documentation: http://localhost:7860/docs
- Navigate to the backend directory

  ```bash
  cd backend
  ```

- Create a virtual environment

  ```bash
  python3 -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install dependencies

  ```bash
  pip install -r requirements.txt
  ```

- Configure environment variables

  ```bash
  cp .env.example .env
  ```

  Edit the `.env` file with your API keys:

  ```bash
  ELEVENLABS_API_KEY=your_elevenlabs_api_key
  OPENAI_API_KEY=your_openai_api_key
  ELEVENLABS_VOICE_ID=your_preferred_voice_id
  ACI_API_KEY=your-aci-dev-api-key  # optional
  ```

  (Without ACI_API_KEY the app still works: enrichment live-updates a pandas DataFrame and saves it to disk.)
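The optional-ACI_API_KEY behaviour can be sketched as a simple branch at startup. This is illustrative only: `select_enrichment_backend` and the backend names are made up for the example, not the platform's actual code.

```python
import os

def select_enrichment_backend(env: dict) -> str:
    """Pick where enrichment results are written, mirroring the optional
    ACI_API_KEY behaviour described above (names are illustrative)."""
    if env.get("ACI_API_KEY"):
        # ACI handles Google Sheets auth, so results stream to a live sheet.
        return "google_sheets"
    # Without ACI, fall back to updating a pandas DataFrame saved to disk.
    return "local_dataframe"

print(select_enrichment_backend({"ACI_API_KEY": "sk-123"}))  # google_sheets
print(select_enrichment_backend(dict(os.environ) if False else {}))  # local_dataframe
```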
- Start the backend server

  ```bash
  python main.py
  ```
- Navigate to the frontend directory

  ```bash
  cd frontend
  ```

- Install dependencies

  ```bash
  npm install
  ```

- Start the development server

  ```bash
  npm run dev
  ```
- Visit the OpenAI Platform
- Create an account or sign in
- Navigate to the API Keys section
- Create a new API key
- Add it to your `.env` file
- Visit ElevenLabs
- Create an account or sign in
- Go to Profile Settings
- Copy your API key
- Add it to your `.env` file
For Google Sheets export functionality:
- Set up Google Cloud Project
- Enable Google Sheets API
- Create service account credentials
- Add credentials to backend configuration
- Upload Your Dataset
  - Click "Upload Dataset" on the home screen
  - Select a CSV file from your computer
  - The system will generate a unique session ID

- Start a Voice Conversation
  - Click "Connect" to establish the voice connection
  - Grant microphone permissions when prompted
  - Begin speaking with your AI data analyst

- Analyze Your Data
  - Ask questions like "What's in this dataset?"
  - Request specific analysis: "Show me sales trends"
  - Ask for visualizations: "Create a chart of revenue by month"
  - Request data enrichment: "Classify customer satisfaction levels"
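The upload step above mentions a unique session ID. One plausible shape for that is saving each uploaded CSV under a fresh UUID that later requests use to locate the dataset; the function name and paths here are illustrative, not the backend's actual implementation.

```python
import uuid
from pathlib import Path

def store_upload(csv_bytes: bytes, upload_dir: str = "uploads") -> str:
    """Save an uploaded CSV under a fresh session ID and return that ID.
    (Illustrative sketch; the real backend may organise storage differently.)"""
    session_id = str(uuid.uuid4())
    directory = Path(upload_dir)
    directory.mkdir(parents=True, exist_ok=True)
    (directory / f"{session_id}.csv").write_bytes(csv_bytes)
    return session_id

session = store_upload(b"date,revenue\n2023-01-01,100\n", upload_dir="/tmp/yde_uploads")
print(session)  # e.g. 'd3b07384-d9a5-...'
```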
Basic Analysis:
User: "What's in this dataset?"
AI: "I can see you have a sales dataset with 1,000 records. It contains columns for date, product, revenue, customer_id, and region. The data spans from January to December 2023."
User: "Show me the top 5 products by revenue"
AI: "Let me analyze that for you. The top 5 products by revenue are: Product A ($125,000), Product B ($98,000), Product C ($87,000), Product D ($76,000), and Product E ($65,000). I've also created a visualization showing this data."
Advanced Analysis:
User: "Can you identify any seasonal patterns in the sales data?"
AI: "I've analyzed the sales data and found clear seasonal patterns. Sales peak in Q4 (holiday season) with a 40% increase compared to Q1. There's also a summer dip in July-August. I've created a seasonal decomposition chart to visualize these patterns."
User: "Export this analysis to Google Sheets"
AI: "I've uploaded the seasonal analysis to Google Sheets. You can find the breakdown by quarter, monthly trends, and the seasonal decomposition data in separate sheets."
- "Mute/Unmute" - Toggle your microphone
- "Generate Summary" - Create a report and email summary
- "Start with another dataset" - Upload a new file for analysis
- `POST /api/offer` - Establish a WebRTC peer connection
- `GET /api/transcript-events` - Stream real-time conversation transcripts
- `GET /api/enrichment-events` - Stream data enrichment progress
- `POST /api/upload-csv` - Upload a CSV dataset
- `GET /reports/(unknown)` - Download generated PDF reports
- `POST /api/report` - Generate a PDF report and email summary
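The two event endpoints stream data the client consumes line by line. Assuming a server-sent-events style format with JSON payloads on `data:` lines (the payload shape is an assumption, not documented here), a minimal client-side parser could look like:

```python
import json

def parse_sse_data_lines(raw_stream: str) -> list:
    """Extract JSON payloads from SSE 'data:' lines, skipping comments
    and other fields. (Sketch; the real event shape is an assumption.)"""
    events = []
    for line in raw_stream.splitlines():
        if line.startswith("data:"):
            events.append(json.loads(line[len("data:"):].strip()))
    return events

sample = 'data: {"role": "user", "text": "What is in this dataset?"}\n\n'
print(parse_sse_data_lines(sample))
```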
- File Upload → CSV stored with session ID
- Voice Connection → WebRTC establishes audio stream
- Speech Recognition → OpenAI STT converts speech to text
- AI Processing → GPT-4 analyzes request and executes code
- Data Analysis → Pandas processes dataset
- Voice Response → ElevenLabs TTS converts response to speech
- Real-time Streaming → Audio and transcripts stream to frontend
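The "AI Processing → executes code" step implies running model-generated analysis code against the loaded dataset. A stripped-down sketch of that idea follows; it uses a bare `exec` namespace with a whitelist of builtins, which stands in for whatever sandboxing the real pipeline uses and is not safe for production as-is.

```python
def run_generated_code(code: str, dataset) -> dict:
    """Execute analysis code with the dataset bound to `df`, returning any
    names the snippet assigns. (Illustrative only; a production system
    needs real sandboxing, and the actual pipeline may differ entirely.)"""
    safe_globals = {"__builtins__": {"sum": sum, "len": len, "min": min, "max": max}}
    namespace = {"df": dataset}
    exec(code, safe_globals, namespace)
    namespace.pop("df", None)  # return only what the snippet produced
    return namespace

rows = [{"product": "A", "revenue": 100}, {"product": "B", "revenue": 250}]
result = run_generated_code("total = sum(r['revenue'] for r in df)", rows)
print(result)  # {'total': 350}
```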
Modify voice characteristics in the backend:

```python
tts = ElevenLabsTTSService(
    api_key=os.getenv("ELEVENLABS_API_KEY"),
    voice_id=os.getenv("ELEVENLABS_VOICE_ID"),
    model="eleven_flash_v2_5",
    voice_settings={
        "stability": 0.5,
        "similarity_boost": 0.75,
        "style": 0.0,
        "use_speaker_boost": True,
    },
)
```

Customize the AI analyst's behavior by modifying the system prompt in `backend/bot.py`:

```python
SYSTEM_PROMPT = (
    "You are an expert data analyst specializing in [your domain]. "
    "Focus on providing actionable insights and business recommendations. "
    # Add your custom instructions here
)
```

Modify the frontend styling in `frontend/app/globals.css` and component files to match your brand colors and design preferences.
- Environment Configuration

  ```bash
  # Set production environment variables
  export NODE_ENV=production
  export PYTHON_ENV=production
  ```

- Build Frontend

  ```bash
  cd frontend
  npm run build
  npm start
  ```

- Deploy Backend

  ```bash
  cd backend
  gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker
  ```
Create a Dockerfile for containerized deployment:

```dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 7860
CMD ["python", "main.py"]
```

Audio Connection Problems:
- Ensure microphone permissions are granted
- Check WebRTC support in your browser
- Verify no firewall blocking WebRTC traffic
API Key Errors:
- Verify API keys are correctly set in the `.env` file
- Check API key permissions and quotas
- Ensure keys are valid and active
File Upload Issues:
- Verify CSV file format is valid
- Check file size limits
- Ensure proper file encoding (UTF-8 recommended)
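For the encoding issue above, a common tactic is to try UTF-8 first and fall back to Latin-1, which accepts any byte sequence. A stdlib-only sketch (the real upload handler presumably reads files with pandas, so treat this as an illustration of the fallback idea, not the actual code):

```python
import csv
import io

def read_csv_rows(raw: bytes) -> list:
    """Decode CSV bytes as UTF-8, falling back to Latin-1, then parse rows.
    (Sketch only; the backend likely uses pandas for the real parsing.)"""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        text = raw.decode("latin-1")  # never fails: every byte is valid
    return list(csv.reader(io.StringIO(text)))

print(read_csv_rows(b"name,city\nR\xc3\xa9my,Paris\n"))  # UTF-8 input decodes cleanly
```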
Performance Issues:
- Monitor API usage and quotas
- Check network connectivity
- Verify sufficient system resources
Enable verbose logging:

```bash
python main.py --verbose
```

Test API endpoints:

```bash
curl http://localhost:7860/api/test
```

- Audio Quality: Adjust WebRTC settings to balance latency and quality
- API Usage: Implement caching for repeated analysis requests
- Memory Management: Monitor pandas DataFrame memory usage
- Concurrent Users: Scale backend instances for multiple simultaneous users
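The caching suggestion can be as simple as memoising analysis results keyed by session and question; `functools.lru_cache` gives this for free when the key is hashable. This is a sketch under that assumption (cache invalidation on re-upload is left out, and `analyze` is a hypothetical stand-in, not a function in this codebase):

```python
from functools import lru_cache

CALLS = {"count": 0}  # tracks how often the "expensive" path actually runs

@lru_cache(maxsize=128)
def analyze(session_id: str, question: str) -> str:
    """Stand-in for an expensive analysis call; repeated identical
    requests are served from the cache instead of re-running the pipeline."""
    CALLS["count"] += 1
    return f"answer to {question!r} for session {session_id}"

analyze("abc", "top products")
analyze("abc", "top products")  # served from cache
print(CALLS["count"])  # 1
```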
- Track API usage and costs
- Monitor WebRTC connection quality
- Log analysis performance metrics
- Monitor user session durations
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
- Follow TypeScript best practices for frontend
- Use Python type hints for backend
- Maintain consistent code formatting
- Add comprehensive documentation for new features
[Add your license information here]
For support and questions:
- Create an issue on GitHub
- Check the troubleshooting section
- Review the API documentation at `/docs`
yDE (Your Data Expert) - transforming data analysis through voice-powered AI conversations.

