```
pip install -r requirements.txt
python contract-processor.py
```

Environment variables:

```
# Required for OpenAI
export OPENAI_API_KEY="your-api-key"

# Required for Azure OpenAI
export AZURE_ENDPOINT="your-endpoint"
export AZURE_DEPLOYMENT="your-deployment"
export AZURE_API_KEY="your-api-key"
```

Project layout:

```
contracts-processor-with-OpenAI/
├── contract-processor.py      # Main application
├── database_manager.py        # Database operations
├── settings_manager.py        # Configuration management
├── memory_manager.py          # Memory optimization
├── extraction_templates.py    # AI extraction templates
├── test_system.py             # System testing
├── sql_schema.sql             # Database schema
├── requirements.txt           # Dependencies
└── README.md                  # Main documentation
```
- Default: `contract_processor_settings.json`
- Custom: pass a path to `SettingsManager()`
```json
{
  "max_workers": 4,
  "batch_size": 10,
  "memory_limit_mb": 4096,
  "db_type": "postgresql",
  "skip_processed_files": true
}
```

Supported databases:
- PostgreSQL (recommended for production)
- MySQL
- SQLite (default for development)
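Loading the settings file and falling back to defaults for missing keys could look roughly like this (a minimal sketch; `SettingsManager` in `settings_manager.py` is the real implementation, and the field set below mirrors the JSON keys above):

```python
import json
from dataclasses import dataclass, fields
from pathlib import Path

@dataclass
class AppSettings:
    max_workers: int = 4
    batch_size: int = 10
    memory_limit_mb: int = 4096
    db_type: str = "sqlite"
    skip_processed_files: bool = True

def load_settings(path: str = "contract_processor_settings.json") -> AppSettings:
    """Load settings from JSON, keeping defaults for any missing keys."""
    settings = AppSettings()
    p = Path(path)
    if p.exists():
        data = json.loads(p.read_text())
        valid = {f.name for f in fields(AppSettings)}
        for key, value in data.items():
            if key in valid:              # ignore unknown keys
                setattr(settings, key, value)
    return settings

settings = load_settings()  # falls back to defaults if the file is absent
```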
```
# PostgreSQL
createdb contract_processor
psql contract_processor < sql_schema.sql

# SQLite (automatic)
python contract-processor.py
```

Core tables:
- `documents` - Main document storage
- `companies` - Company information
- `contract_ontology` - Categorization system
- `document_relationships` - Document links
- `processing_logs` - Audit trail
- `file_hashes` - Version control
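The `file_hashes` table and the `skip_processed_files` setting imply a content-hash check before reprocessing. A sketch of the idea (the helper names are illustrative, not the actual API):

```python
import hashlib
from pathlib import Path

def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large PDFs are never loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def should_skip(path: Path, known_hashes: set) -> bool:
    """Skip files whose content hash has already been processed."""
    return file_sha256(path) in known_hashes
```

A changed file gets a new hash, so edits to an already-processed document are picked up automatically.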
- PDF (.pdf)
- Word (.docx, .doc)
- Excel (.xlsx, .xls)
- Contract Details - Dates, duration, deliverables
- Vendor Assessment - Capabilities, risks, compliance
- Technical Specifications - Requirements, metrics, timeline
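The three templates above map naturally onto structured prompts. A sketch of what one entry in `extraction_templates.py` might look like (the field names and helper are illustrative, not the module's actual contents):

```python
# Illustrative template shape -- the real definitions live in extraction_templates.py.
CONTRACT_DETAILS_TEMPLATE = {
    "name": "contract_details",
    "system_prompt": "Extract contract facts as JSON. Use null for missing fields.",
    "fields": {
        "start_date": "Contract start date (ISO 8601)",
        "end_date": "Contract end date (ISO 8601)",
        "duration_months": "Contract duration in months",
        "deliverables": "List of contracted deliverables",
    },
}

def build_prompt(template: dict, document_text: str) -> str:
    """Render a template and a document into one extraction prompt."""
    field_lines = "\n".join(f"- {k}: {v}" for k, v in template["fields"].items())
    return (
        f"{template['system_prompt']}\n\nFields:\n{field_lines}\n\n"
        f"Document:\n{document_text}"
    )
```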
```python
# OpenAI
client = OpenAI(api_key="your-key")

# Azure OpenAI
client = AzureOpenAI(
    azure_endpoint="your-endpoint",
    api_key="your-key",
    api_version="2024-02-01"
)
```

Memory configuration:

```python
MemoryConfig(
    memory_limit_mb=4096,
    batch_size_adjustment=True,
    min_batch_size=1,
    max_batch_size=50
)
```

Memory management features:
- Real-time memory usage display
- Automatic garbage collection
- Dynamic batch size adjustment
- Process pool isolation
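Dynamic batch size adjustment could work roughly like this (a sketch of the idea, not the `memory_manager.py` internals; the thresholds are illustrative):

```python
def adjust_batch_size(current: int, used_mb: float, limit_mb: float,
                      min_size: int = 1, max_size: int = 50) -> int:
    """Shrink the batch when memory is tight, grow it when there is headroom."""
    usage = used_mb / limit_mb
    if usage > 0.9:                          # near the limit: halve the batch
        return max(min_size, current // 2)
    if usage < 0.5:                          # plenty of headroom: grow gently
        return min(max_size, current + 1)
    return current                           # otherwise leave it alone
```

Halving on pressure but growing by one keeps the batch size stable: it backs off quickly from memory spikes and only creeps back up when usage stays low.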
```python
from contract_processor import EnhancedDocumentProcessorApp

app = EnhancedDocumentProcessorApp()
app.mainloop()
```

```python
from pathlib import Path
from database_manager import DatabaseManager, DatabaseConfig

config = DatabaseConfig(db_type="sqlite", database="contracts")
db_manager = DatabaseManager(config)
db_manager.initialize()

# Create or update a document record (call from an async context)
doc, is_new = await db_manager.create_or_update_document(
    file_path=Path("contract.pdf"),
    file_hash="abc123...",
    cw_number="CW-001"
)
```

```python
from settings_manager import SettingsManager

manager = SettingsManager()
settings = manager.settings

# Update settings
settings.batch_size = 20
manager.save_settings()
```

```python
from memory_manager import MemoryMonitor, MemoryConfig

config = MemoryConfig(memory_limit_mb=2048)
monitor = MemoryMonitor(config)
monitor.start_monitoring()

info = monitor.get_memory_info()
print(f"Memory: {info['process_rss_mb']:.1f}MB")
```

Run the test suite:

```
python test_system.py
```

Tests cover:
- Database connectivity
- Settings management
- Memory monitoring
- Import validation

Reports include:
- Processed documents summary
- Contract details extraction
- Company relationships
- Processing statistics
```sql
-- Get all processed documents
SELECT * FROM documents WHERE processed = true;

-- Get documents by company
SELECT d.* FROM documents d
JOIN document_companies dc ON d.id = dc.document_id
WHERE dc.company_id = 'COMPANY-001';

-- Get processing statistics
SELECT * FROM processing_statistics
WHERE date = CURRENT_DATE;
```

Visualization features:
- Interactive visualization
- Document relationships
- Company connections
- Ontology categories
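With the SQLite backend, such queries can be run directly from the standard library. A self-contained sketch (the in-memory table here is trimmed to just the columns the query needs; the full schema lives in `sql_schema.sql`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (id TEXT PRIMARY KEY, processed INTEGER);
    INSERT INTO documents VALUES ('CW-001', 1), ('CW-002', 0);
""")

# Equivalent to: SELECT * FROM documents WHERE processed = true;
processed = conn.execute(
    "SELECT id FROM documents WHERE processed = 1"
).fetchall()
print(processed)  # → [('CW-001',)]
```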
Memory Errors - reduce the batch size in settings:

```json
"batch_size": 5,
"memory_limit_mb": 2048
```

API Timeouts - increase the timeout in settings:

```json
"processing_timeout": 600
```

Database Connection - check the connection parameters:

```json
"db_host": "localhost",
"db_port": 5432,
"db_username": "postgres"
```

File Permissions - ensure read access to the document directory:

```
chmod 755 /path/to/documents
```

Debug logging:

```python
import logging
logging.basicConfig(level=logging.DEBUG)
```

Performance tips:
- Use PostgreSQL for large datasets
- Increase memory limit for faster processing
- Use SSD storage for better I/O
- Monitor API rate limits
- Multiple processing instances
- Database connection pooling
- Redis caching (optional)
- Load balancing
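Connection pooling can be sketched with the standard library alone (illustrative only; a production deployment would use the database driver's own pool rather than this hand-rolled one):

```python
import sqlite3
from contextlib import contextmanager
from queue import Queue

class ConnectionPool:
    """A tiny fixed-size pool: borrow a connection, return it when done."""

    def __init__(self, database: str, size: int = 5):
        self._pool = Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(sqlite3.connect(database, check_same_thread=False))

    @contextmanager
    def connection(self):
        conn = self._pool.get()       # blocks if every connection is in use
        try:
            yield conn
        finally:
            self._pool.put(conn)      # always return it to the pool

pool = ConnectionPool(":memory:", size=2)
with pool.connection() as conn:
    conn.execute("CREATE TABLE t (x INTEGER)")
```

Borrowing through a context manager guarantees the connection goes back to the pool even if processing raises.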
- Use environment variables for API keys
- Regular database backups
- Monitor access logs
- Validate file inputs
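"Validate file inputs" can be as simple as checking type and size before a file reaches the parsers (a sketch; the size cap is illustrative, and the suffix list matches the supported formats above):

```python
from pathlib import Path

ALLOWED_SUFFIXES = {".pdf", ".docx", ".doc", ".xlsx", ".xls"}
MAX_SIZE_MB = 100  # illustrative cap

def validate_input(path: Path) -> None:
    """Reject missing files, unsupported types, and oversized inputs."""
    if not path.is_file():
        raise ValueError(f"Not a file: {path}")
    if path.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"Unsupported file type: {path.suffix}")
    if path.stat().st_size > MAX_SIZE_MB * 1024 * 1024:
        raise ValueError(f"File too large: {path}")
```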
Secure API key storage:

```
export OPENAI_API_KEY="your-secure-key"
```

Database security:

```json
"db_password": "strong-password"
```

EnhancedDocumentProcessorApp
- Main GUI application
- Document processing workflow
- Progress tracking
- Results visualization
DatabaseManager
- Database operations
- Document CRUD
- Relationship management
- Statistics tracking
SettingsManager
- Configuration persistence
- Settings validation
- Recent directories
- API configuration
MemoryMonitor
- System memory tracking
- Garbage collection
- Batch optimization
- Performance monitoring
Document Processing

```python
async def process_document(file_path: Path) -> Dict[str, Any]
async def process_directory(directory: Path) -> List[Dict[str, Any]]
```

Database Operations

```python
async def create_or_update_document(file_path, file_hash, cw_number)
async def update_document_processing(doc_id, status, results)
async def get_ontology_tree() -> List[Dict[str, Any]]
```

Settings Management

```python
def load_settings() -> AppSettings
def save_settings()
def get_database_config() -> DatabaseConfig
```

Documentation:
- `README.md` - Main documentation
- `SYSTEM_DOCUMENTATION.md` - Complete system guide
- `CONSISTENCY_ANALYSIS.md` - Code analysis
- `SQL_SCHEMA_DOCUMENTATION.md` - Database schema
- `test_system.py` - System verification

Support:
- Run tests before reporting issues
- Check application logs for errors
- Enable debug logging for troubleshooting
- Monitor processing statistics
This quick reference provides essential information for using the Contract Processing System. For detailed information, refer to the complete documentation.