Iris Wikipedia Pathfinder

A high-performance service for discovering shortest paths between Wikipedia pages using optimized graph algorithms

Overview

Iris Wikipedia Pathfinder is a web service that finds the shortest path between any two Wikipedia pages by following the links between them. It runs a breadth-first search (BFS) whose queue and bookkeeping live in Redis rather than in process memory, so memory use stays bounded on large searches and several searches can run concurrently across workers.

The project demonstrates expertise in:

  • Domain-Driven Design: Clean separation between API, business logic, and infrastructure layers
  • Distributed Systems: Redis-based queuing and caching for horizontal scalability
  • Asynchronous Processing: Celery task queues for non-blocking pathfinding operations
  • Algorithm Optimization: Memory-efficient BFS implementation using external storage
  • Production-Ready Architecture: Comprehensive error handling, monitoring, and deployment automation

Core Features

✅ Pathfinding Algorithms

  • Redis-Based BFS: Memory-efficient pathfinding using external Redis queues
  • Configurable Depth Limits: Prevents infinite searches with customizable depth constraints
  • Batch Processing: Optimized Wikipedia API usage through intelligent batching
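As a rough illustration of the batching idea, the sketch below groups page titles so that each Wikipedia API request covers many pages at once. It is not the project's code: `batch_titles`, `titles_param`, and the batch size of 50 (the MediaWiki query limit for regular clients) are assumptions chosen for the example.

```python
from typing import Iterable, Iterator, List

# Assumption: 50 titles per request, the MediaWiki query limit for regular clients.
MAX_TITLES_PER_REQUEST = 50


def batch_titles(
    titles: Iterable[str], size: int = MAX_TITLES_PER_REQUEST
) -> Iterator[List[str]]:
    """Yield de-duplicated titles, in order, in lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    seen = set()
    batch: List[str] = []
    for title in titles:
        if title in seen:
            continue  # never request the same page twice
        seen.add(title)
        batch.append(title)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly shorter, batch


def titles_param(batch: List[str]) -> str:
    """Join one batch into the pipe-separated form of the `titles` query parameter."""
    return "|".join(batch)
```

Each batch then maps to a single request, so a BFS level with 120 unique pages would need three requests rather than 120.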

✅ Scalable Architecture

  • Asynchronous Task Processing: Non-blocking operations using Celery workers
  • Distributed Caching: Redis-based caching for Wikipedia API responses
  • Session Isolation: Concurrent searches with isolated Redis namespaces
  • Auto-cleanup: Automatic resource cleanup to prevent memory accumulation

✅ Production Features

  • Health Monitoring: Comprehensive system health checks and metrics
  • Error Handling: Structured exception hierarchy with detailed error responses
  • API Validation: Request/response validation using Marshmallow schemas
  • Rate Limiting: Configurable API rate limiting for resource protection
  • CORS Support: Cross-origin resource sharing for frontend integration
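One way a structured exception hierarchy with detailed error responses could look is sketched below. The class names, error codes, and response shape are hypothetical and only illustrate the pattern; the project's actual exceptions may differ.

```python
from typing import Any, Dict, Optional, Tuple


class PathfinderError(Exception):
    """Hypothetical base class: each subclass fixes an HTTP status and an error code."""

    status_code = 500
    code = "internal_error"

    def __init__(self, message: str, details: Optional[Dict[str, Any]] = None) -> None:
        super().__init__(message)
        self.message = message
        self.details = details or {}

    def to_response(self) -> Tuple[Dict[str, Any], int]:
        """Return a JSON-serializable body and the HTTP status to send with it."""
        body = {
            "error": {
                "code": self.code,
                "message": self.message,
                "details": self.details,
            }
        }
        return body, self.status_code


class InvalidRequestError(PathfinderError):
    status_code = 400
    code = "invalid_request"


class PageNotFoundError(PathfinderError):
    status_code = 404
    code = "page_not_found"


class PathNotFoundError(PathfinderError):
    status_code = 404
    code = "path_not_found"
```

In a Flask app, a single handler registered with `app.register_error_handler(PathfinderError, ...)` could turn any of these into a response by returning the body and status from `to_response()`.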

✅ Interactive Visualization

  • Web-Based UI: Interactive interface for pathfinding with real-time progress
  • Graph Visualization: D3.js-powered interactive graph with physics simulation
  • Mobile Support: Touch-optimized interface that works on mobile devices
  • State Persistence: Saves progress and resumes interrupted searches
  • Dynamic Features: Drag-and-drop nodes, responsive layout, smart text truncation

✅ Development Tools

  • Comprehensive Testing: Unit and integration tests, all passing (107 tests, approximately 80% line coverage)
  • Environment Management: Separate configurations for development, testing, and production
  • CI/CD Ready: GitHub Actions integration for automated testing and deployment

Core Technologies

Python, Flask, Redis, Celery, Gunicorn

Frontend & Visualization

D3.js, JetBrains Mono, Dark Theme, Interactive

Development & Testing

pytest, Black, Coverage, Marshmallow

Project Information

License, GDSC VIT, Documentation

Infrastructure

Wikipedia API, Graph Theory

Quick Start

Development Setup (One Command)

# Clone and setup
git clone <repository-url>
cd iris-web-backend

# Create virtual environment  
python3 -m venv env
source env/bin/activate  # On Windows: env\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Start everything (Redis + Flask + Celery)
./dev.sh

The application will be available at:

  • Interactive UI: http://localhost:9020 (default landing page)
  • API Documentation: http://localhost:9020/api

Production Deployment

# Set environment variables
export FLASK_ENV=production
export SECRET_KEY=your-secure-secret-key
export REDIS_URL=redis://localhost:6379/0

# Deploy with startup script
./start.sh
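For illustration, the three variables above could be read and validated at startup roughly as follows. This is a sketch under assumptions, not the project's configuration module: `Settings`, `load_settings`, and the development defaults are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    env: str
    secret_key: str
    redis_url: str


def load_settings(environ: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from the environment and fail fast on unusable values."""
    environ = os.environ if environ is None else environ
    # Assumed defaults for local development only.
    env = environ.get("FLASK_ENV", "development")
    secret_key = environ.get("SECRET_KEY", "")
    redis_url = environ.get("REDIS_URL", "redis://localhost:6379/0")

    if env == "production" and not secret_key:
        raise ValueError("SECRET_KEY must be set when FLASK_ENV=production")
    # URL schemes accepted by redis-py: plain TCP, TLS, and Unix socket.
    if not redis_url.startswith(("redis://", "rediss://", "unix://")):
        raise ValueError(f"REDIS_URL has an unsupported scheme: {redis_url!r}")
    return Settings(env=env, secret_key=secret_key, redis_url=redis_url)
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to test.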

API Documentation

Complete API documentation with examples, request/response schemas, and integration guides is available in API_DOCUMENTATION.md.

Key Endpoints

  • GET / - Interactive UI (default landing page)
  • GET /<any-path> - All non-API paths redirect to main UI
  • POST /getPath - Start pathfinding task (returns task ID for polling)
  • GET /tasks/status/<task_id> - Poll task status with progress updates
  • POST /explore - Discover page connections for graph visualization
  • GET /health - System health monitoring endpoint
  • GET /api - API documentation and information
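Because POST /getPath returns a task ID rather than the path itself, a client has to poll GET /tasks/status/<task_id> until the task finishes. The loop below sketches that pattern. The `status` field and the Celery-style state names are assumptions; API_DOCUMENTATION.md describes the real response schema.

```python
import time
from typing import Any, Callable, Dict

# Assumption: the API reports Celery-style state names in a "status" field.
TERMINAL_STATES = {"SUCCESS", "FAILURE"}


def poll_task(
    fetch_status: Callable[[str], Dict[str, Any]],
    task_id: str,
    interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_status(task_id) until a terminal state is seen or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        payload = fetch_status(task_id)
        if payload.get("status") in TERMINAL_STATES:
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish within {timeout} seconds")
        sleep(interval)  # wait before asking again
```

Here `fetch_status` is any callable that performs the HTTP GET and returns the decoded JSON body, which keeps the loop independent of the HTTP library in use.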

Architecture Highlights

Redis-Based BFS Algorithm

The core pathfinding algorithm demonstrates advanced system design:

  • Memory Efficiency: Uses Redis queues instead of in-memory data structures
  • Horizontal Scalability: Multiple workers can process different search sessions
  • Session Isolation: Unique Redis namespaces prevent search interference
  • Automatic Cleanup: Resource cleanup prevents Redis memory accumulation
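The four points above can be sketched in a few dozen lines. The example is a simplified illustration, not the project's implementation: `InMemoryRedis` is a stand-in that mimics a handful of redis-py commands so the code runs without a server, and the `bfs:<session>:...` key layout, `find_path`, and `get_links` are hypothetical names.

```python
import uuid
from collections import deque
from typing import Callable, Dict, Iterable, List, Optional


class InMemoryRedis:
    """Stand-in exposing the few redis-py style commands the sketch needs."""

    def __init__(self) -> None:
        self.lists: Dict[str, deque] = {}
        self.hashes: Dict[str, Dict[str, str]] = {}

    def rpush(self, key: str, value: str) -> None:
        self.lists.setdefault(key, deque()).append(value)

    def lpop(self, key: str) -> Optional[str]:
        items = self.lists.get(key)
        return items.popleft() if items else None

    def hsetnx(self, key: str, field: str, value: str) -> int:
        bucket = self.hashes.setdefault(key, {})
        if field in bucket:
            return 0
        bucket[field] = value
        return 1

    def hget(self, key: str, field: str) -> Optional[str]:
        return self.hashes.get(key, {}).get(field)

    def delete(self, *keys: str) -> None:
        for key in keys:
            self.lists.pop(key, None)
            self.hashes.pop(key, None)


def find_path(store, get_links: Callable[[str], Iterable[str]],
              start: str, end: str, max_depth: int = 6) -> Optional[List[str]]:
    """BFS whose queue, parent map, and depth map all live in the store."""
    session = uuid.uuid4().hex  # unique namespace, so searches cannot interfere
    queue, parents, depths = (
        f"bfs:{session}:{name}" for name in ("queue", "parents", "depths")
    )
    try:
        store.hsetnx(parents, start, "")  # empty parent marks the start page
        store.hsetnx(depths, start, "0")
        store.rpush(queue, start)
        while (page := store.lpop(queue)) is not None:
            if page == end:
                path = [page]
                while (parent := store.hget(parents, path[-1])):
                    path.append(parent)
                return path[::-1]
            depth = int(store.hget(depths, page))
            if depth >= max_depth:
                continue  # depth limit reached, do not expand this page
            for link in get_links(page):
                if store.hsetnx(parents, link, page):  # true only on first visit
                    store.hsetnx(depths, link, str(depth + 1))
                    store.rpush(queue, link)
        return None
    finally:
        store.delete(queue, parents, depths)  # cleanup runs even on errors
```

With a real redis-py client, creating it with `decode_responses=True` would make `lpop` and `hget` return strings as the stand-in does. The visited check and the parent pointer share one `hsetnx` call, which is what lets the whole search state stay outside the worker process.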

Service Layer Architecture

  • Dependency Injection: Service factory pattern with proper abstractions
  • Interface Segregation: Clear contracts between components
  • Error Propagation: Structured exception handling throughout the stack
  • Configuration Management: Environment-specific settings with validation
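A minimal sketch of the factory-plus-abstraction idea follows. Only the name `ServiceFactory` appears in this README; its interface here, along with `LinkSource`, `PathfindingService`, and `StaticLinks`, is assumed for illustration.

```python
from typing import Callable, Dict, List, Optional, Protocol


class LinkSource(Protocol):
    """Contract the service depends on, instead of a concrete Wikipedia client."""

    def get_links(self, title: str) -> List[str]: ...


class PathfindingService:
    def __init__(self, links: LinkSource) -> None:
        self._links = links  # dependency is injected, never constructed here

    def neighbors(self, title: str) -> List[str]:
        """Return the page's outgoing links, de-duplicated and sorted."""
        return sorted(set(self._links.get_links(title)))


class ServiceFactory:
    """Builds the service lazily and hands out the same instance afterwards."""

    def __init__(self, link_source_provider: Callable[[], LinkSource]) -> None:
        self._provider = link_source_provider
        self._service: Optional[PathfindingService] = None

    def pathfinding_service(self) -> PathfindingService:
        if self._service is None:
            self._service = PathfindingService(self._provider())
        return self._service


class StaticLinks:
    """Test double that satisfies LinkSource without any network access."""

    def __init__(self, graph: Dict[str, List[str]]) -> None:
        self._graph = graph

    def get_links(self, title: str) -> List[str]:
        return list(self._graph.get(title, []))
```

Swapping `StaticLinks` for a real client changes only the provider passed to the factory, which is the main payoff of depending on the `LinkSource` contract.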

Testing & Quality Assurance

# Run comprehensive test suite
pytest -v

# Run with coverage reporting (console + HTML)
pytest --cov=app --cov-report=term-missing --cov-report=html

# Test specific components
pytest tests/unit/ -v      # Unit tests
pytest tests/integration/ -v  # Integration tests

Current test coverage: 107 tests passing with approximately 80% line coverage across the app/ package (see htmlcov/index.html after running coverage for a browsable report).

Key areas covered by new tests:

  • Cache and queue infrastructure with Redis client mocking
  • ServiceFactory lifecycle and Celery task configuration helpers
  • Wikipedia client parsing, batching, and request handling with a fake session
  • API middleware decorators (error handling, CORS, rate limiting, size checks)
  • Logging configuration, including file handler setup for non-testing environments
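As an example of testing cache code with a mocked Redis client, the sketch below exercises a small read-through cache helper. `get_cached_links` and the `links:<title>` key format are hypothetical; `get` and `setex` are standard redis-py client methods, replaced here by a `MagicMock`.

```python
import json
from unittest.mock import MagicMock


def get_cached_links(redis_client, title, fetch, ttl_seconds=3600):
    """Return cached links for a page, calling fetch(title) only on a cache miss."""
    key = f"links:{title}"
    cached = redis_client.get(key)
    if cached is not None:
        return json.loads(cached)
    links = fetch(title)
    redis_client.setex(key, ttl_seconds, json.dumps(links))
    return links


def test_cache_miss_fetches_and_stores():
    client = MagicMock()
    client.get.return_value = None  # simulate a miss
    fetch = MagicMock(return_value=["Graph theory"])

    assert get_cached_links(client, "Redis", fetch) == ["Graph theory"]
    fetch.assert_called_once_with("Redis")
    client.setex.assert_called_once_with("links:Redis", 3600, '["Graph theory"]')


def test_cache_hit_skips_fetch():
    client = MagicMock()
    client.get.return_value = '["Graph theory"]'  # simulate a hit
    fetch = MagicMock()

    assert get_cached_links(client, "Redis", fetch) == ["Graph theory"]
    fetch.assert_not_called()
    client.setex.assert_not_called()
```

Saved in a file under tests/, these functions would be collected by `pytest -v` like any other test and need no running Redis server.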

Contributors

Made with ❤️ by DSC VIT
