Quick Reference Guide

🚀 Getting Started

Installation

pip install -r requirements.txt
python contract-processor.py

Environment Setup

# Required for OpenAI
export OPENAI_API_KEY="your-api-key"

# Required for Azure OpenAI
export AZURE_ENDPOINT="your-endpoint"
export AZURE_DEPLOYMENT="your-deployment"
export AZURE_API_KEY="your-api-key"
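
A startup check can report missing variables before any API call is made. The following is a minimal sketch, not part of the project: `detect_provider` and `missing_vars` are hypothetical helpers, and preferring Azure when both sets of variables are present is an arbitrary choice made for illustration.

```python
import os

# Variable names taken from the exports above.
OPENAI_VARS = ("OPENAI_API_KEY",)
AZURE_VARS = ("AZURE_ENDPOINT", "AZURE_DEPLOYMENT", "AZURE_API_KEY")


def detect_provider(env=None):
    """Return "azure", "openai", or None depending on which variables are set."""
    env = os.environ if env is None else env
    if all(env.get(name) for name in AZURE_VARS):
        return "azure"
    if all(env.get(name) for name in OPENAI_VARS):
        return "openai"
    return None


def missing_vars(names, env=None):
    """List the variables in `names` that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    provider = detect_provider()
    if provider is None:
        print("Missing OpenAI variables:", missing_vars(OPENAI_VARS))
        print("Missing Azure variables:", missing_vars(AZURE_VARS))
    else:
        print("Using provider:", provider)
```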

📁 File Structure

contracts-processor-with-OpenAI/
├── contract-processor.py          # Main application
├── database_manager.py            # Database operations
├── settings_manager.py            # Configuration management
├── memory_manager.py              # Memory optimization
├── extraction_templates.py        # AI extraction templates
├── test_system.py                 # System testing
├── sql_schema.sql                 # Database schema
├── requirements.txt               # Dependencies
└── README.md                      # Main documentation

🔧 Configuration

Settings File Location

  • Default: contract_processor_settings.json
  • Custom: pass the file path to SettingsManager()

Key Settings

{
  "max_workers": 4,
  "batch_size": 10,
  "memory_limit_mb": 4096,
  "db_type": "postgresql",
  "skip_processed_files": true
}
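
For scripts that need these values without going through SettingsManager, the file can be read directly. This is a sketch under assumptions: `read_settings` is a hypothetical helper, and the defaults simply mirror the example values above, which may not match the application's real defaults.

```python
import json
from pathlib import Path

# Defaults mirror the example values above; the real defaults may differ.
DEFAULTS = {
    "max_workers": 4,
    "batch_size": 10,
    "memory_limit_mb": 4096,
    "db_type": "postgresql",
    "skip_processed_files": True,
}


def read_settings(path="contract_processor_settings.json"):
    """Load settings from JSON, falling back to DEFAULTS for missing keys."""
    settings = dict(DEFAULTS)
    file = Path(path)
    if file.exists():
        settings.update(json.loads(file.read_text(encoding="utf-8")))
    return settings
```

Keys present in the file override the defaults; a missing file yields the defaults unchanged.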

🗄️ Database

Supported Databases

  • PostgreSQL (recommended for production)
  • MySQL
  • SQLite (default for development)

Quick Database Setup

# PostgreSQL
createdb contract_processor
psql contract_processor < sql_schema.sql

# SQLite (automatic)
python contract-processor.py

Core Tables

  • documents - Main document storage
  • companies - Company information
  • contract_ontology - Categorization system
  • document_relationships - Document links
  • processing_logs - Audit trail
  • file_hashes - Version control
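
The file_hashes table suggests that documents are identified by content hash so unchanged files can be skipped. A rough sketch of that idea follows. The choice of SHA-256 and both function names are assumptions; this guide does not state which algorithm the project uses.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large documents are never fully loaded."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_processing(path, known_hashes):
    """True when the file's current hash is not among the recorded hashes."""
    return file_sha256(path) not in known_hashes
```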

🤖 AI Processing

Supported File Types

  • PDF (.pdf)
  • Word (.docx, .doc)
  • Excel (.xlsx, .xls)
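
Before queuing a directory, it can help to filter it down to the extensions listed above. The helpers below are a hypothetical sketch (`is_supported` and `find_documents` are not names from the project) and check only the extension, not the file contents.

```python
from pathlib import Path

# Extensions listed above.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".doc", ".xlsx", ".xls"}


def is_supported(path):
    """Case-insensitive extension check."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def find_documents(directory):
    """Recursively collect supported files, sorted for stable ordering."""
    return sorted(
        p for p in Path(directory).rglob("*") if p.is_file() and is_supported(p)
    )
```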

Analysis Templates

  1. Contract Details - Dates, duration, deliverables
  2. Vendor Assessment - Capabilities, risks, compliance
  3. Technical Specifications - Requirements, metrics, timeline

API Configuration

from openai import OpenAI, AzureOpenAI

# OpenAI
client = OpenAI(api_key="your-key")

# Azure OpenAI
client = AzureOpenAI(
    azure_endpoint="your-endpoint",
    api_key="your-key",
    api_version="2024-02-01"
)

💾 Memory Management

Configuration

from memory_manager import MemoryConfig

MemoryConfig(
    memory_limit_mb=4096,
    batch_size_adjustment=True,
    min_batch_size=1,
    max_batch_size=50
)

Monitoring

  • Real-time memory usage display
  • Automatic garbage collection
  • Dynamic batch size adjustment
  • Process pool isolation
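
Dynamic batch size adjustment can be pictured as a small feedback rule: shrink the batch under memory pressure and grow it when there is headroom. The sketch below is illustrative only. The 90% and 50% thresholds, the halving step, and the function name are assumptions, not the behavior of memory_manager.py.

```python
def adjust_batch_size(current, used_mb, limit_mb, min_size=1, max_size=50):
    """Halve the batch under memory pressure, grow it slowly when there is headroom."""
    usage = used_mb / limit_mb
    if usage > 0.9:
        proposed = current // 2   # back off quickly near the limit
    elif usage < 0.5:
        proposed = current + 1    # grow cautiously when memory is plentiful
    else:
        proposed = current        # stay put in the comfortable middle range
    # Clamp to the configured bounds (min_batch_size / max_batch_size above).
    return max(min_size, min(max_size, proposed))
```

Shrinking multiplicatively and growing additively keeps the batch from oscillating sharply around the limit.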

🔍 Usage Examples

Basic Processing

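import importlib

# The script name contains a hyphen, so it cannot be imported with a plain
# `import` statement; load it by name instead.
module = importlib.import_module("contract-processor")

app = module.EnhancedDocumentProcessorApp()
app.mainloop()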

Database Operations

import asyncio
from pathlib import Path

from database_manager import DatabaseManager, DatabaseConfig

config = DatabaseConfig(db_type="sqlite", database="contracts")
db_manager = DatabaseManager(config)
db_manager.initialize()

async def main():
    # Create (or update) a document record; `await` must run inside a coroutine
    doc, is_new = await db_manager.create_or_update_document(
        file_path=Path("contract.pdf"),
        file_hash="abc123...",
        cw_number="CW-001"
    )

asyncio.run(main())

Settings Management

from settings_manager import SettingsManager

manager = SettingsManager()
settings = manager.settings

# Update settings
settings.batch_size = 20
manager.save_settings()

Memory Monitoring

from memory_manager import MemoryMonitor, MemoryConfig

config = MemoryConfig(memory_limit_mb=2048)
monitor = MemoryMonitor(config)
monitor.start_monitoring()

info = monitor.get_memory_info()
print(f"Memory: {info['process_rss_mb']:.1f}MB")

🧪 Testing

Run System Tests

python test_system.py

Test Coverage

  • Database connectivity
  • Settings management
  • Memory monitoring
  • Import validation

📊 Output Formats

Excel Export

  • Processed documents summary
  • Contract details extraction
  • Company relationships
  • Processing statistics

Database Queries

-- Get all processed documents
SELECT * FROM documents WHERE processed = true;

-- Get documents by company
SELECT d.* FROM documents d
JOIN document_companies dc ON d.id = dc.document_id
WHERE dc.company_id = 'COMPANY-001';

-- Get processing statistics
SELECT * FROM processing_statistics
WHERE date = CURRENT_DATE;
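
When running the company query from Python, the company ID is best bound as a parameter rather than formatted into the SQL string. The sketch below uses an in-memory SQLite database with simplified stand-in tables; the column names are assumptions and the real definitions live in sql_schema.sql.

```python
import sqlite3

# Simplified stand-in tables; the real definitions live in sql_schema.sql.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE documents (id INTEGER PRIMARY KEY, file_name TEXT, processed INTEGER);
    CREATE TABLE document_companies (document_id INTEGER, company_id TEXT);
    INSERT INTO documents VALUES (1, 'a.pdf', 1), (2, 'b.pdf', 0);
    INSERT INTO document_companies VALUES (1, 'COMPANY-001'), (2, 'COMPANY-002');
    """
)


def documents_for_company(connection, company_id):
    """Bind company_id as a parameter instead of formatting it into the SQL."""
    rows = connection.execute(
        "SELECT d.id, d.file_name FROM documents d "
        "JOIN document_companies dc ON d.id = dc.document_id "
        "WHERE dc.company_id = ? ORDER BY d.id",
        (company_id,),
    )
    return rows.fetchall()


print(documents_for_company(conn, "COMPANY-001"))
```

The `?` placeholder is SQLite's parameter style; PostgreSQL and MySQL drivers typically use `%s` instead.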

Knowledge Graph

  • Interactive visualization
  • Document relationships
  • Company connections
  • Ontology categories

🚨 Troubleshooting

Common Issues

Memory Errors

# Reduce batch size and memory limit in settings
"batch_size": 5
"memory_limit_mb": 2048

API Timeouts

# Increase timeout in settings
"processing_timeout": 600

Database Connection

# Check connection parameters
"db_host": "localhost"
"db_port": 5432
"db_username": "postgres"

File Permissions

# Ensure read access to document directory
chmod 755 /path/to/documents

Debug Mode

import logging
logging.basicConfig(level=logging.DEBUG)

📈 Performance Tips

Optimization

  • Use PostgreSQL for large datasets
  • Raise memory_limit_mb so larger batches fit in memory
  • Use SSD storage for better I/O
  • Monitor API rate limits
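
One common way to stay within API rate limits is to retry failed calls with exponential backoff. The helper below is a generic sketch, not code from the project: `call_with_backoff` and its parameters are hypothetical, and in practice `retry_on` would be narrowed to the specific rate-limit or timeout exceptions raised by the client in use.

```python
import random
import time


def call_with_backoff(func, retries=5, base_delay=1.0, max_delay=30.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call func(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(retries):
        try:
            return func()
        except retry_on:
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt (1s, 2s, 4s, ...) up to max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 10))  # small jitter
```

The `sleep` argument exists so the waiting can be replaced in tests. The openai Python client also accepts `timeout` and `max_retries` arguments, which may be enough on their own for simple cases.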

Scaling

  • Multiple processing instances
  • Database connection pooling
  • Redis caching (optional)
  • Load balancing

🔐 Security

Best Practices

  • Use environment variables for API keys
  • Regular database backups
  • Monitor access logs
  • Validate file inputs

Configuration

# Secure API key storage
export OPENAI_API_KEY="your-secure-key"

# Database security
"db_password": "strong-password"

📚 API Reference

Main Classes

EnhancedDocumentProcessorApp

  • Main GUI application
  • Document processing workflow
  • Progress tracking
  • Results visualization

DatabaseManager

  • Database operations
  • Document CRUD
  • Relationship management
  • Statistics tracking

SettingsManager

  • Configuration persistence
  • Settings validation
  • Recent directories
  • API configuration

MemoryMonitor

  • System memory tracking
  • Garbage collection
  • Batch optimization
  • Performance monitoring

Key Methods

Document Processing

async def process_document(file_path: Path) -> Dict[str, Any]
async def process_directory(directory: Path) -> List[Dict[str, Any]]
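
To illustrate how the two signatures above could fit together, the sketch below fans a directory out over `process_document` while capping concurrency with a semaphore. The function bodies are placeholders written for this example, and the `max_workers` parameter on `process_directory` is an assumption borrowed from the settings file; the real implementations may be structured differently.

```python
import asyncio
from pathlib import Path
from typing import Any, Dict, List


async def process_document(file_path: Path) -> Dict[str, Any]:
    """Placeholder: the real method extracts text and calls the AI model."""
    await asyncio.sleep(0)  # stands in for I/O-bound work
    return {"file": file_path.name, "status": "processed"}


async def process_directory(directory: Path, max_workers: int = 4) -> List[Dict[str, Any]]:
    """Process every file in `directory`, at most `max_workers` at a time."""
    semaphore = asyncio.Semaphore(max_workers)

    async def bounded(path: Path) -> Dict[str, Any]:
        async with semaphore:
            return await process_document(path)

    files = sorted(p for p in directory.iterdir() if p.is_file())
    # gather() returns results in the same order as the input files.
    return await asyncio.gather(*(bounded(p) for p in files))
```

A caller would run it with `asyncio.run(process_directory(Path("contracts")))`, where `contracts` is a placeholder directory name.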

Database Operations

async def create_or_update_document(file_path, file_hash, cw_number)
async def update_document_processing(doc_id, status, results)
async def get_ontology_tree() -> List[Dict[str, Any]]

Settings Management

def load_settings() -> AppSettings
def save_settings()
def get_database_config() -> DatabaseConfig

🆘 Support

Documentation

  • README.md - Main documentation
  • SYSTEM_DOCUMENTATION.md - Complete system guide
  • CONSISTENCY_ANALYSIS.md - Code analysis
  • SQL_SCHEMA_DOCUMENTATION.md - Database schema

Testing

  • test_system.py - System verification
  • Run tests before reporting issues

Logs

  • Check application logs for errors
  • Enable debug logging for troubleshooting
  • Monitor processing statistics

This quick reference provides essential information for using the Contract Processing System. For detailed information, refer to the complete documentation.