MCP-based Knowledge Graph Construction System

A fully automated knowledge graph construction system built on the Model Context Protocol (MCP), implementing a sophisticated 3-stage data processing pipeline for intelligent knowledge extraction and graph generation.

Overview

This project implements an advanced knowledge graph construction system that automatically processes raw text data through three intelligent stages:

Data Quality Assessment - Evaluates completeness, consistency, and relevance
Knowledge Completion - Enhances low-quality data using LLM and external knowledge bases
Knowledge Graph Construction - Builds structured knowledge graphs with confidence scoring

The system is built on the MCP (Model Context Protocol) architecture, providing a clean client-server interface for seamless integration and scalability.

Key Features

Fully Automated Processing

Zero Manual Intervention: Automatically detects data quality and processing needs
Intelligent Pipeline: Adapts processing strategy based on input data characteristics
Real-time Processing: Immediate knowledge graph generation from raw text

3-Stage Processing Pipeline

Stage 1: Data Quality Assessment

Completeness Analysis: Evaluates entity and relationship coverage
Consistency Checking: Detects semantic conflicts and contradictions
Relevance Scoring: Assesses information relevance and meaningfulness
Quality Threshold: Automatically determines if data needs enhancement

Stage 2: Knowledge Completion (for low-quality data)

Entity Enhancement: Completes missing entity information
Relationship Inference: Adds missing relationships between entities
Conflict Resolution: Corrects semantic inconsistencies
Format Normalization: Standardizes data format and structure
Implicit Knowledge Inference: Extracts hidden knowledge patterns

Stage 3: Knowledge Graph Construction

Rule-based Extraction: Fast, deterministic triple generation
LLM-enhanced Processing: Advanced semantic understanding and relationship inference
Confidence Scoring: Assigns reliability scores to extracted knowledge
Interactive Visualization: Generates beautiful HTML visualizations
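
As a rough illustration of the rule-based path in Stage 3, the sketch below matches two fixed patterns and attaches a fixed confidence to each hit. The `Triple` class, the `extract_triples` function, the two patterns, and the confidence values are all assumptions made for this example; they are not taken from kg_utils.py.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    object: str
    confidence: float


# Hypothetical rules: (compiled pattern, predicate label, fixed confidence).
RULES = [
    (re.compile(r"(?P<s>\w+?)位于(?P<o>\w+)"), "位于", 0.9),
    (re.compile(r"(?P<s>\w+?)的CEO是(?P<o>[\w·]+)"), "CEO", 0.9),
]


def extract_triples(text: str) -> list[Triple]:
    """Apply every rule to every clause and collect the matches in order."""
    triples = []
    # Split on Chinese and Western clause/sentence punctuation.
    for clause in re.split(r"[，。,.;；]", text):
        clause = clause.strip()
        for pattern, predicate, confidence in RULES:
            match = pattern.search(clause)
            if match:
                triples.append(
                    Triple(match["s"], predicate, match["o"], confidence)
                )
    return triples


if __name__ == "__main__":
    for triple in extract_triples("北京大学位于北京市海淀区。苹果公司的CEO是蒂姆·库克。"):
        print(triple)
```

Rules like these are deterministic and cheap, but a clause with no explicit subject (for example "位于北京市海淀区" on its own) yields nothing, which is the kind of gap the LLM-enhanced path is described as covering.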

MCP Architecture

Client-Server Design: Clean separation of concerns
Standardized Protocol: Built on MCP for interoperability
Tool-based Interface: Modular, extensible tool system
Async Processing: High-performance asynchronous operations

Requirements

Python: 3.11 or higher
UV Package Manager: For dependency management
OpenAI-compatible API: For LLM integration (DeepSeek, OpenAI, etc.)

Quick Start

1. Clone and Setup

git clone https://github.com/turambar928/MCP_based_KG_construction.git
cd MCP_based_KG_construction

# Install dependencies
uv sync

2. Environment Configuration

Create a .env file with your API configuration:

OPENAI_API_KEY=your_api_key_here
OPENAI_BASE_URL=https://api.siliconflow.cn/v1  # or your preferred endpoint
OPENAI_MODEL=Qwen/QwQ-32B                      # or your preferred model

Supported API Providers:

OpenAI: https://api.openai.com/v1
DeepSeek: https://api.deepseek.com
SiliconFlow: https://api.siliconflow.cn/v1
Any OpenAI-compatible endpoint

3. Start the MCP Server

uv run kg_server.py

The server will start and listen for MCP client connections.

4. Running Tests

There are three ways to test the system:

a. Using MCP Inspector

npx -y @modelcontextprotocol/inspector uv run kg_server.py

After running this command, click the link that appears after "MCP Inspector is up and running at" to open the MCP Inspector in your browser. Once opened:

Click "Connect"
Select "Tools" from the top menu
Choose "build_knowledge_graph" from the list of tools
Enter your text in the left panel to generate the knowledge graph

b. Using Client Code

uv run kg_client.py

After the connection is successful, enter your text to view the results.

c. Using Mainstream MCP Tools (Cursor, Cherry Studio, etc.)

Example: Running in Cherry Studio

In Settings, open MCP Servers and click "Add Server" (import from JSON). Use the configuration below, replacing the path after --directory with the local path of your clone:

{
  "mcpServers": {
    "kg_server": {
      "command": "uv",
      "args": [
        "--directory",
        "D:/mcp_getting_started",
        "run",
        "kg_server.py"
      ],
      "env": {},
      "disabled": false,
      "autoApprove": []
    }
  }
}

After enabling this MCP server, you can use it in Cherry Studio.

🛠️ Usage Guide

Interactive Client Commands

Once the client is running, you can use these commands:

# Build knowledge graph from text
build <your_text_here>

# Example usage
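# (The text reads: "Peking University is a renowned institution of higher education in China, located in Haidian District, Beijing")
build 北京大学是中国著名的高等教育机构，位于北京市海淀区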

# Run demonstration examples
demo

# Exit the client
quit

Programmatic Usage

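import asyncio

from kg_client import KnowledgeGraphClient

async def main():
    client = KnowledgeGraphClient()
    await client.connect_to_server()
    try:
        # Build a knowledge graph; the input reads "Apple's CEO is Tim Cook"
        result = await client.build_knowledge_graph(
            "苹果公司的CEO是蒂姆·库克",
            output_file="my_graph.html"
        )
        print(f"Generated graph: {result}")
    finally:
        # Release the server connection even if the build fails
        await client.cleanup()

# Run the coroutine; without this call main() is never executed
asyncio.run(main())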

Example Outputs

High-Quality Input

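Input: "北京大学是中国著名的高等教育机构，位于北京市海淀区。" ("Peking University is a renowned institution of higher education in China, located in Haidian District, Beijing.")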
Processing: Direct Stage 3 (high quality detected)
Output:
- Entities: [北京大学, 中国, 高等教育机构, 北京市, 海淀区]
- Triples: [(北京大学, 是, 高等教育机构), (北京大学, 位于, 海淀区), ...]
- Visualization: Interactive HTML graph

Low-Quality Input (Incomplete)

Input: "李华去巴黎" ("Li Hua goes to Paris")
Processing:
- Stage 1: Detects incomplete information
- Stage 2: Enhances with "巴黎位于法国" ("Paris is located in France"), "李华是人" ("Li Hua is a person")
- Stage 3: Builds enhanced knowledge graph
Output: Enriched knowledge graph with inferred relationships

Low-Quality Input (Conflicting)

Input: "巴黎市是德国城市。" ("Paris is a German city.")
Processing:
- Stage 1: Detects semantic conflict
- Stage 2: Corrects to "巴黎是法国城市" ("Paris is a French city")
- Stage 3: Builds corrected knowledge graph
Output: Corrected and enhanced knowledge graph

MCP Tools API

The system exposes the following MCP tools for integration:

`build_knowledge_graph`

Description: Complete pipeline for knowledge graph construction with automatic quality assessment and enhancement.

Parameters:

text (string): Input text to process
output_file (string, optional): HTML visualization output filename (default: "knowledge_graph.html")

Returns: JSON object containing:

success (boolean): Processing success status
entities (array): Extracted entities
triples (array): Generated knowledge triples
confidence_scores (array): Confidence scores for each triple
visualization_file (string): Path to generated HTML visualization
processing_stages (object): Details of each processing stage

Example:

{
  "success": true,
  "entities": ["北京大学", "中国", "高等教育机构"],
  "triples": [
    {
      "subject": "北京大学",
      "predicate": "是",
      "object": "高等教育机构",
      "confidence": 0.95
    }
  ],
  "visualization_file": "knowledge_graph.html"
}
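
A caller will usually want to post-process this response. The following is a minimal sketch, assuming the response has exactly the shape of the example above (a `success` flag and a `triples` array whose items carry a `confidence` field). The function name `high_confidence_triples` and the 0.8 cutoff are hypothetical.

```python
import json


def high_confidence_triples(
    response_json: str, min_confidence: float = 0.8
) -> list[tuple[str, str, str]]:
    """Return (subject, predicate, object) for triples at or above the cutoff."""
    response = json.loads(response_json)
    # A failed run is treated as "no usable triples" rather than an error.
    if not response.get("success"):
        return []
    return [
        (t["subject"], t["predicate"], t["object"])
        for t in response.get("triples", [])
        if t.get("confidence", 0.0) >= min_confidence
    ]


# Sample payload copied from the example response above.
SAMPLE = """
{
  "success": true,
  "entities": ["北京大学", "中国", "高等教育机构"],
  "triples": [
    {"subject": "北京大学", "predicate": "是",
     "object": "高等教育机构", "confidence": 0.95}
  ],
  "visualization_file": "knowledge_graph.html"
}
"""

if __name__ == "__main__":
    print(high_confidence_triples(SAMPLE))
```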

Project Structure

├── kg_server.py              # Main MCP server implementation
├── kg_client.py              # Interactive client for testing
├── kg_utils.py               # Core knowledge graph construction utilities
├── kg_visualizer.py          # HTML visualization generator
├── data_quality.py           # Stage 1: Data quality assessment
├── knowledge_completion.py   # Stage 2: Knowledge completion and enhancement
├── pyproject.toml            # Project dependencies and configuration
├── .env                      # Environment variables (API keys)
└── README.md                 # This file

Core Components

kg_server.py: MCP server that orchestrates the 3-stage pipeline
kg_client.py: Command-line client for interactive testing and batch processing
kg_utils.py: Knowledge graph construction engine with rule-based and LLM-enhanced extraction
kg_visualizer.py: Generates interactive HTML visualizations using Plotly
data_quality.py: Implements quality assessment algorithms for completeness, consistency, and relevance
knowledge_completion.py: Handles knowledge enhancement and conflict resolution

Advanced Features

Quality Assessment Metrics

Completeness Score: Based on entity coverage and relationship density
Consistency Score: Detects semantic conflicts and contradictions
Relevance Score: Evaluates information meaningfulness
Composite Quality Score: Weighted combination of all metrics
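
The composite score can be pictured as a normalized weighted sum of the three metrics. The sketch below assumes each metric is a float in [0, 1]; the `QualityScores` class, the `composite_quality` function, and the weights are placeholders chosen for illustration, not the values used in data_quality.py.

```python
from dataclasses import dataclass


@dataclass
class QualityScores:
    completeness: float
    consistency: float
    relevance: float


# Hypothetical weights; the project's real weighting is not documented here.
WEIGHTS = {"completeness": 0.4, "consistency": 0.4, "relevance": 0.2}


def composite_quality(scores: QualityScores, weights: dict = WEIGHTS) -> float:
    """Weighted average of the metric scores, normalized by the weight total."""
    total_weight = sum(weights.values())
    weighted_sum = sum(
        getattr(scores, name) * weight for name, weight in weights.items()
    )
    return weighted_sum / total_weight


if __name__ == "__main__":
    print(composite_quality(QualityScores(1.0, 0.5, 0.0)))
```

Dividing by the weight total keeps the result in [0, 1] even if the weights do not sum to one, so it can be compared directly against a threshold such as QUALITY_THRESHOLD.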

Knowledge Enhancement Strategies

Entity Completion: Adds missing entity attributes and types
Relationship Inference: Discovers implicit relationships
Conflict Resolution: Corrects factual inconsistencies
Format Normalization: Standardizes entity and relationship representations

Visualization Features

Interactive Network Graph: Clickable nodes and edges
Entity Clustering: Groups related entities by type
Confidence Visualization: Color-coded confidence levels
Export Options: HTML, PNG, SVG formats

Technical Details

Processing Pipeline

Input Validation: Checks text format and encoding
Quality Assessment: Multi-dimensional quality scoring
Conditional Enhancement: Applies enhancement only when needed
Graph Construction: Rule-based + LLM hybrid approach
Confidence Calculation: Bayesian confidence scoring
Visualization Generation: Interactive HTML output
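
The control flow of these steps, in particular the conditional enhancement, can be sketched as follows. This is an illustration only: `run_pipeline` and its callable parameters (`assess`, `complete`, `construct`) are hypothetical stand-ins for the three stages, and the default threshold mirrors the QUALITY_THRESHOLD default of 0.5 listed under Configuration Options.

```python
from typing import Any, Callable


def run_pipeline(
    text: str,
    assess: Callable[[str], float],
    complete: Callable[[str], str],
    construct: Callable[[str], Any],
    threshold: float = 0.5,
) -> dict:
    """Run Stage 1, optionally Stage 2, then Stage 3, recording which ran."""
    # Input validation: reject empty or whitespace-only text.
    if not text or not text.strip():
        raise ValueError("input text is empty")

    stages = ["quality_assessment"]
    score = assess(text)

    # Conditional enhancement: only low-scoring text goes through Stage 2.
    if score < threshold:
        text = complete(text)
        stages.append("knowledge_completion")

    graph = construct(text)
    stages.append("graph_construction")
    return {"quality_score": score, "stages": stages, "graph": graph}


if __name__ == "__main__":
    # Stub stages: a fixed low score, a completion that appends a fact,
    # and a "graph" that is just the final text.
    result = run_pipeline(
        "李华去巴黎",
        assess=lambda t: 0.2,
        complete=lambda t: t + "。巴黎位于法国",
        construct=lambda t: t,
    )
    print(result["stages"])
```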

Performance Characteristics

Processing Speed: ~1-3 seconds per text input
Memory Usage: ~50-100MB for typical workloads
Scalability: Async architecture supports concurrent processing
Accuracy: 85-95% entity extraction, 80-90% relationship accuracy

Development

Running Tests

Refer to the "Running Tests" section above for three different testing methods:

MCP Inspector (recommended for visual testing)
Client code (for programmatic testing)
Mainstream MCP tools (for integration testing)

# Quick test with demonstration examples
uv run kg_client.py
# Then type: demo

# Test with custom input
uv run kg_client.py "Your test text here"

Adding New Features

Custom Quality Metrics: Extend data_quality.py
New Enhancement Strategies: Modify knowledge_completion.py
Additional Visualization: Enhance kg_visualizer.py
New MCP Tools: Add tools to kg_server.py

Configuration Options

Environment variables in .env:

# Required
OPENAI_API_KEY=your_api_key
OPENAI_BASE_URL=your_api_endpoint
OPENAI_MODEL=your_model_name

# Optional
QUALITY_THRESHOLD=0.5          # Quality threshold for enhancement
MAX_ENTITIES=50                # Maximum entities per graph
VISUALIZATION_WIDTH=1200       # HTML visualization width
VISUALIZATION_HEIGHT=800       # HTML visualization height
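
One way to read these variables with the defaults listed above is sketched below. It assumes the .env file has already been loaded into the process environment (for instance by a dotenv loader); the `Settings` class and `load_settings` function are hypothetical and do not describe how kg_server.py actually reads its configuration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED = ("OPENAI_API_KEY", "OPENAI_BASE_URL", "OPENAI_MODEL")


@dataclass(frozen=True)
class Settings:
    api_key: str
    base_url: str
    model: str
    quality_threshold: float = 0.5
    max_entities: int = 50
    visualization_width: int = 1200
    visualization_height: int = 800


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables, failing fast on missing ones."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required settings: {', '.join(missing)}")
    return Settings(
        api_key=env["OPENAI_API_KEY"],
        base_url=env["OPENAI_BASE_URL"],
        model=env["OPENAI_MODEL"],
        # Optional values fall back to the documented defaults.
        quality_threshold=float(env.get("QUALITY_THRESHOLD", "0.5")),
        max_entities=int(env.get("MAX_ENTITIES", "50")),
        visualization_width=int(env.get("VISUALIZATION_WIDTH", "1200")),
        visualization_height=int(env.get("VISUALIZATION_HEIGHT", "800")),
    )
```

Passing a plain dict as `env` makes the function easy to exercise without touching the real environment.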

Contributing

Fork the repository
Create a feature branch: git checkout -b feature-name
Make your changes and test thoroughly
Submit a pull request with a detailed description

Troubleshooting

Common Issues

Port Already in Use

# Find the process using the port (Windows commands)
netstat -ano | findstr :6277
# Kill the process
taskkill /PID <process_id> /F

API Balance Insufficient

- Check the API configuration in the .env file
- Ensure the API account has sufficient balance

Dependency Installation Issues

```
uv sync --reinstall
```

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Built on the Model Context Protocol (MCP)
Visualization powered by Plotly
Graph algorithms using NetworkX
LLM integration via OpenAI API

Support

For questions, issues, or contributions:

📧 Email: tzf9282003@163.com
🐛 Issues: GitHub Issues
📖 Documentation: See KNOWLEDGE_GRAPH_README.md for detailed technical documentation
