A high-performance service for discovering shortest paths between Wikipedia pages using optimized graph algorithms
Iris Wikipedia Pathfinder is a web service that finds the shortest path between any two Wikipedia pages. It runs a breadth-first search (BFS) over Wikipedia's link graph and keeps the search state in Redis rather than in process memory, so searches stay memory-efficient and can be spread across multiple workers.
The project demonstrates expertise in:
- Domain-Driven Design: Clean separation between API, business logic, and infrastructure layers
- Distributed Systems: Redis-based queuing and caching for horizontal scalability
- Asynchronous Processing: Celery task queues for non-blocking pathfinding operations
- Algorithm Optimization: Memory-efficient BFS implementation using external storage
- Production-Ready Architecture: Comprehensive error handling, monitoring, and deployment automation
Key features:
- Redis-Based BFS: Memory-efficient pathfinding using external Redis queues
- Configurable Depth Limits: Bounds each search with a customizable maximum depth
- Batch Processing: Groups page titles into batched Wikipedia API requests to reduce round trips
- Asynchronous Task Processing: Non-blocking operations using Celery workers
- Distributed Caching: Redis-based caching for Wikipedia API responses
- Session Isolation: Concurrent searches with isolated Redis namespaces
- Auto-cleanup: Automatic resource cleanup to prevent memory accumulation
- Health Monitoring: Comprehensive system health checks and metrics
- Error Handling: Structured exception hierarchy with detailed error responses
- API Validation: Request/response validation using Marshmallow schemas
- Rate Limiting: Configurable API rate limiting for resource protection
- CORS Support: Cross-origin resource sharing for frontend integration
- Web-Based UI: Interactive interface for pathfinding with real-time progress
- Graph Visualization: D3.js-powered interactive graph with physics simulation
- Mobile Support: Touch-optimized interface that works on mobile devices
- State Persistence: Saves progress and resumes interrupted searches
- Dynamic Features: Drag-and-drop nodes, responsive layout, smart text truncation
- Comprehensive Testing: Unit and integration tests with 100% pass rate
- Environment Management: Separate configurations for development, testing, and production
- CI/CD Ready: GitHub Actions integration for automated testing and deployment
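The batching feature can be illustrated with a minimal sketch, assuming the service fetches outgoing links through the MediaWiki Action API's `prop=links` module, which accepts up to 50 titles per request for regular clients. `batch_titles` and `links_query_params` are hypothetical helper names, not the project's actual Wikipedia client.

```python
from typing import Iterable, Iterator

# MediaWiki's Action API accepts at most 50 titles per request for regular clients.
MAX_TITLES_PER_REQUEST = 50


def batch_titles(titles: Iterable[str], size: int = MAX_TITLES_PER_REQUEST) -> Iterator[list[str]]:
    """Yield consecutive chunks of at most `size` unique titles."""
    if size < 1:
        raise ValueError("size must be positive")
    seen: set[str] = set()
    batch: list[str] = []
    for title in titles:
        if title in seen:
            continue  # never spend request slots on duplicate titles
        seen.add(title)
        batch.append(title)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, chunk


def links_query_params(batch: list[str]) -> dict[str, str]:
    """Build query-string parameters for one prop=links request (hypothetical helper)."""
    return {
        "action": "query",
        "format": "json",
        "prop": "links",
        "plnamespace": "0",  # article namespace only
        "pllimit": "max",
        "titles": "|".join(batch),  # MediaWiki separates multiple titles with '|'
    }
```

Each dict can be passed as `params` to a GET request against `https://en.wikipedia.org/w/api.php`. A real client must also follow the `continue` object in responses, since link lists for large pages are paginated; this sketch leaves that out.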
# Clone and setup
git clone <repository-url>
cd iris-web-backend
# Create virtual environment
python3 -m venv env
source env/bin/activate # On Windows: env\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Start everything (Redis + Flask + Celery)
./dev.sh
The application will be available at:
- Interactive UI: http://localhost:9020 (default landing page)
- API Documentation: http://localhost:9020/api
# Set environment variables
export FLASK_ENV=production
export SECRET_KEY=your-secure-secret-key
export REDIS_URL=redis://localhost:6379/0
# Deploy with startup script
./start.sh
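The environment variables above could be consumed by a validated settings loader along these lines. This is a sketch under assumptions: the `Settings` dataclass, `load_settings`, and the development fallback values are hypothetical; only `FLASK_ENV`, `SECRET_KEY`, and `REDIS_URL` come from this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    env: str
    secret_key: str
    redis_url: str
    debug: bool


def load_settings(environ: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from environment variables and validate them per environment."""
    env = environ.get("FLASK_ENV", "development")
    if env not in {"development", "testing", "production"}:
        raise ValueError(f"Unknown FLASK_ENV: {env!r}")
    secret_key = environ.get("SECRET_KEY", "")
    redis_url = environ.get("REDIS_URL", "redis://localhost:6379/0")
    if env == "production" and not secret_key:
        # Refuse to boot in production without an explicit secret.
        raise ValueError("SECRET_KEY must be set when FLASK_ENV=production")
    if not redis_url.startswith(("redis://", "rediss://", "unix://")):
        raise ValueError(f"REDIS_URL has an unsupported scheme: {redis_url!r}")
    return Settings(
        env=env,
        # Hypothetical fallback, only reachable outside production.
        secret_key=secret_key or "dev-only-secret",
        redis_url=redis_url,
        debug=(env == "development"),
    )
```

Failing fast at startup means a misconfigured production deployment stops with a clear error instead of running with an insecure default key.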
Complete API documentation with examples, request/response schemas, and integration guides is available in API_DOCUMENTATION.md.
- GET /: Interactive UI (default landing page)
- GET /<any-path>: All non-API paths redirect to the main UI
- POST /getPath: Start a pathfinding task (returns a task ID for polling)
- GET /tasks/status/<task_id>: Poll task status with progress updates
- POST /explore: Discover page connections for graph visualization
- GET /health: System health monitoring endpoint
- GET /api: API documentation and information
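The start-then-poll flow of `POST /getPath` and `GET /tasks/status/<task_id>` can be sketched as a small standard-library client. The request body keys (`start`, `end`), the response keys (`task_id`, `status`), and the use of Celery's state names are assumptions made for illustration; check API_DOCUMENTATION.md for the real schemas.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Optional

BASE_URL = "http://localhost:9020"  # development address from this README

# Celery's built-in terminal task states (assumed to be reported as "status").
TERMINAL_STATES = {"SUCCESS", "FAILURE", "REVOKED"}


def request_json(method: str, url: str, payload: Optional[dict] = None) -> dict[str, Any]:
    """Send an optional JSON body and decode the JSON response."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    req = urllib.request.Request(
        url, data=data, method=method, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def wait_for_task(
    fetch_status: Callable[[], dict[str, Any]],
    *,
    interval: float = 1.0,
    timeout: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict[str, Any]:
    """Poll fetch_status until it reports a terminal state or timeout elapses."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status.get("status") in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError("pathfinding task did not finish in time")
        sleep(interval)


def find_path(start: str, end: str) -> dict[str, Any]:
    """Start a search via POST /getPath, then poll GET /tasks/status/<task_id>."""
    # ASSUMPTION: body keys "start"/"end" and response key "task_id" are hypothetical.
    created = request_json("POST", f"{BASE_URL}/getPath", {"start": start, "end": end})
    task_id = created["task_id"]
    return wait_for_task(
        lambda: request_json("GET", f"{BASE_URL}/tasks/status/{task_id}")
    )
```

Injecting `fetch_status`, `sleep`, and `clock` keeps the polling loop testable without a running server.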
The core pathfinding algorithm is designed around the following properties:
- Memory Efficiency: Uses Redis queues instead of in-memory data structures
- Horizontal Scalability: Multiple workers can process different search sessions
- Session Isolation: Unique Redis namespaces prevent search interference
- Automatic Cleanup: Resource cleanup prevents Redis memory accumulation
The surrounding codebase applies these design patterns:
- Dependency Injection: Service factory pattern with proper abstractions
- Interface Segregation: Clear contracts between components
- Error Propagation: Structured exception handling throughout the stack
- Configuration Management: Environment-specific settings with validation
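The pathfinding properties listed above (a Redis-held queue, per-session key namespaces, automatic cleanup, and a depth limit) can be illustrated with a minimal BFS sketch. It is not the project's implementation: the key layout `bfs:<session>:queue` / `bfs:<session>:parents`, the `get_links` callback, and the `InMemoryStore` stand-in (used only so the example runs without a Redis server) are all hypothetical.

```python
import uuid
from collections import defaultdict, deque
from typing import Callable, Iterable, Optional


class InMemoryStore:
    """Dict-backed stand-in exposing the redis-py calls used below
    (as if the client were created with decode_responses=True)."""

    def __init__(self) -> None:
        self.lists: dict[str, deque] = defaultdict(deque)
        self.hashes: dict[str, dict[str, str]] = defaultdict(dict)

    def rpush(self, name: str, *values: str) -> int:
        self.lists[name].extend(values)
        return len(self.lists[name])

    def lpop(self, name: str) -> Optional[str]:
        queue = self.lists.get(name)
        return queue.popleft() if queue else None

    def hsetnx(self, name: str, key: str, value: str) -> int:
        fields = self.hashes[name]
        if key in fields:
            return 0
        fields[key] = value
        return 1

    def hget(self, name: str, key: str) -> Optional[str]:
        return self.hashes.get(name, {}).get(key)

    def delete(self, *names: str) -> int:
        removed = 0
        for name in names:
            removed += self.lists.pop(name, None) is not None
            removed += self.hashes.pop(name, None) is not None
        return removed


def redis_bfs(store, start: str, goal: str,
              get_links: Callable[[str], Iterable[str]],
              max_depth: int = 6) -> Optional[list[str]]:
    """Shortest path from start to goal, or None if none exists within max_depth."""
    session = uuid.uuid4().hex  # isolates this search's keys from concurrent ones
    queue_key = f"bfs:{session}:queue"
    parents_key = f"bfs:{session}:parents"
    try:
        store.hsetnx(parents_key, start, "")  # "" marks the root
        store.rpush(queue_key, f"0:{start}")
        while (item := store.lpop(queue_key)) is not None:
            depth_text, title = item.split(":", 1)  # titles may themselves contain ':'
            if title == goal:
                # Walk parent pointers back to the root, then reverse.
                path = [title]
                while parent := store.hget(parents_key, path[-1]):
                    path.append(parent)
                return path[::-1]
            depth = int(depth_text)
            if depth >= max_depth:
                continue  # depth limit reached: do not expand this page
            for link in get_links(title):
                # hsetnx succeeds only the first time a page is discovered,
                # so each page is enqueued once, at its shallowest depth.
                if store.hsetnx(parents_key, link, title):
                    store.rpush(queue_key, f"{depth + 1}:{link}")
        return None
    finally:
        store.delete(queue_key, parents_key)  # cleanup of per-session keys
```

With a real server, a client from `redis.Redis.from_url(REDIS_URL, decode_responses=True)` could be passed in place of `InMemoryStore`, since the sketch only calls `rpush`, `lpop`, `hsetnx`, `hget`, and `delete`. Because the queue lives in Redis, its memory cost falls on the Redis server rather than on the worker process.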
# Run comprehensive test suite
pytest -v
# Run with coverage reporting (console + HTML)
pytest --cov=app --cov-report=term-missing --cov-report=html
# Test specific components
pytest tests/unit/ -v # Unit tests
pytest tests/integration/ -v # Integration tests
Current test coverage: 107 tests passing with approximately 80% line coverage across the app/ package (see htmlcov/index.html after running coverage for a browsable report).
Key areas covered by new tests:
- Cache and queue infrastructure with Redis client mocking
- ServiceFactory lifecycle and Celery task configuration helpers
- Wikipedia client parsing, batching, and request handling with a fake session
- API middleware decorators (error handling, CORS, rate limiting, size checks)
- Logging configuration, including file handler setup for non-testing environments
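The first area above, Redis client mocking, can be shown in a pytest-style sketch. `LinkCache`, its `links:<title>` key format, and the TTL are hypothetical, chosen only to demonstrate the technique: `unittest.mock.MagicMock` stands in for the redis-py client so the tests need no running server.

```python
import json
from typing import Any, Optional
from unittest.mock import MagicMock


class LinkCache:
    """Hypothetical cache for Wikipedia link lookups, backed by a redis-py client."""

    def __init__(self, client: Any, ttl_seconds: int = 3600) -> None:
        self.client = client
        self.ttl_seconds = ttl_seconds

    def get(self, title: str) -> Optional[list[str]]:
        raw = self.client.get(f"links:{title}")
        return json.loads(raw) if raw is not None else None

    def set(self, title: str, links: list[str]) -> None:
        # SETEX stores the value and its expiry in one command.
        self.client.setex(f"links:{title}", self.ttl_seconds, json.dumps(links))


def test_set_writes_json_with_ttl() -> None:
    client = MagicMock()
    LinkCache(client, ttl_seconds=60).set("Python", ["Guido van Rossum"])
    client.setex.assert_called_once_with("links:Python", 60, '["Guido van Rossum"]')


def test_get_returns_none_on_cache_miss() -> None:
    client = MagicMock()
    client.get.return_value = None
    assert LinkCache(client).get("Python") is None
    client.get.assert_called_once_with("links:Python")


def test_get_decodes_cached_json() -> None:
    client = MagicMock()
    client.get.return_value = b'["Guido van Rossum"]'  # redis-py returns bytes by default
    assert LinkCache(client).get("Python") == ["Guido van Rossum"]
```

Saved under tests/unit/, these would run with `pytest tests/unit/ -v` alongside the existing suite.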
This project was developed by:
Made with ❤️ by DSC VIT