This system exports human-reviewed entity and service edge visualization data to Excel files with embedded progress tracking.
- Excel export with separate sheets for Organizations and Services
- Progress Overview tab with comprehensive dashboard data embedded in Excel workbooks
- User opinion-based clustering that respects human review decisions
- Parallel processing for multiple users
- Timestamped exports for audit trails
- Real-time progress statistics for edge visualization reviews
- Per-user breakdown showing pending vs. completed reviews
- Visual completion percentages and review counts
- Overall progress summary across all users and record types
- Rust 1.70+ installed
- PostgreSQL database with edge visualization tables
- Environment variables configured (see
.env.example)
# Clone the repository
git clone <repository-url>
cd export-opinion
# Build the application
cargo build --releaseCreate a .env file with your database connection details:
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_DB=dataplatform
POSTGRES_USER=postgres
POSTGRES_PASSWORD=your_password_here
RUST_LOG=infoTo run the complete export process:
cargo run --bin exportThis will:
- π Run re-clustering based on user opinions
- π Export data to Excel files for each user
- π Include Progress Overview tab with dashboard data
Each user gets a timestamped Excel file: {user_prefix}_export_{timestamp}.xlsx
The system tracks human reviews in tables like:
{user_prefix}_entity_edge_visualization{user_prefix}_service_edge_visualization
PENDING_REVIEW: Requires human reviewCONFIRMED_MATCH: Human confirmed these are duplicatesCONFIRMED_NON_MATCH: Human confirmed these are NOT duplicates
The final cluster_confirmed_status in exports follows this priority:
- PENDING_REVIEW: Any edge in cluster is pending β entire cluster pending
- CONFIRMED: All edges confirmed OR multi-entity cluster with no edges
- NO_MATCH: Single entity with no edges or no cluster assigned
Each export contains three sheets:
- Overall Progress Summary:
- Total pending and reviewed counts across entity and service records
- Overall completion percentage
- Cross-user statistics
- User Breakdown:
- Per-user statistics for both entity and service reviews
- Individual completion percentages
- Detailed pending/confirmed/non-match counts
- Timestamp: When the export was generated
Entity-level data including:
- Contributor information
- Entity IDs and names
- Cluster assignments and confirmation status
- Duplicate detection flags
Service-level data including:
- Service and organization details
- Location and address information
- Taxonomy term classifications
- Cluster assignments and confirmation status
- Duplicate detection flags
- Pending Review: Edges awaiting human review (
PENDING_REVIEW) - Confirmed Match: Edges marked as true duplicates (
CONFIRMED_MATCH) - Confirmed Non-Match: Edges marked as not duplicates (
CONFIRMED_NON_MATCH)
- Review Percentage: (Confirmed Match + Confirmed Non-Match) / Total Records
- Total Records: All edges requiring review
- Reviewed Count: Edges with human decisions (excluding pending)
Update the user list in main.rs and dashboard.rs:
let users = vec![
("Hannah", "hannah"),
("DrewW", "dreww"),
("NewUser", "newuser"), // Add here
];The system expects these table patterns:
{schema}.{user_prefix}_entity_edge_visualization{schema}.{user_prefix}_service_edge_visualization
With columns:
confirmed_status(VARCHAR)was_reviewed(BOOLEAN)- Entity/service ID columns
The system:
- Fetches user opinions from edge visualization tables
- Filters edges based on review status:
- Keeps:
CONFIRMED_MATCHandPENDING_REVIEWedges - Removes:
CONFIRMED_NON_MATCHedges (breaks connections)
- Keeps:
- Creates new clusters using connected components
- Handles isolated entities with self-referencing cluster records
- Exports timestamped tables with user-opinion-based clustering
Enable detailed logging:
RUST_LOG=debug cargo run --bin exportEmpty Progress Overview: Check database connection and table names
Zero review counts: Verify confirmed_status column has expected values
Export fails: Check database permissions and disk space for Excel files
Missing users: Verify user prefixes match database table naming
- Export process scales with data size (parallel user processing)
- Progress Overview generation typically takes < 5 seconds
- Re-clustering performance depends on edge count and cluster size
- Excel file generation is optimized for large datasets
{user_prefix}_export_{timestamp}.xlsxfiles per user- Progress Overview sheet: Comprehensive review statistics and completion tracking
- Organizations sheet: Entity-level data with cluster assignments
- Services sheet: Service-level data with taxonomy terms and addresses
The system creates export schema tables with timestamps:
{user_prefix}_entity_group_cluster_export_{timestamp}{user_prefix}_service_group_cluster_export_{timestamp}{user_prefix}_entity_edge_visualization_export_{timestamp}{user_prefix}_service_edge_visualization_export_{timestamp}
[Add your license information here]