A fully automated knowledge graph construction system built on the Model Context Protocol (MCP), implementing a sophisticated 3-stage data processing pipeline for intelligent knowledge extraction and graph generation.
This project implements an advanced knowledge graph construction system that automatically processes raw text data through three intelligent stages:
- Data Quality Assessment - Evaluates completeness, consistency, and relevance
- Knowledge Completion - Enhances low-quality data using LLM and external knowledge bases
- Knowledge Graph Construction - Builds structured knowledge graphs with confidence scoring
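The way the three stages fit together can be sketched as follows. This is an illustration under stated assumptions, not the project's code. The stage functions are passed in as parameters and stand in for the logic in data_quality.py, knowledge_completion.py, and kg_utils.py. `run_pipeline` and `PipelineResult` are hypothetical names. The rule "enhance only when the score is below 0.5" assumes the default QUALITY_THRESHOLD from the configuration section.

```python
from collections.abc import Callable
from dataclasses import dataclass

Triple = tuple[str, str, str]


# Hypothetical result type; the real server returns a JSON object instead.
@dataclass
class PipelineResult:
    quality: float          # Stage 1 score, assumed to lie in [0, 1]
    enhanced: bool          # True if Stage 2 ran
    text: str               # the text that was fed into Stage 3
    triples: list[Triple]   # Stage 3 output


def run_pipeline(
    text: str,
    assess: Callable[[str], float],
    complete: Callable[[str], str],
    build: Callable[[str], list[Triple]],
    threshold: float = 0.5,  # assumed to mirror QUALITY_THRESHOLD
) -> PipelineResult:
    quality = assess(text)          # Stage 1: data quality assessment
    enhanced = quality < threshold  # only low-quality input is enhanced
    if enhanced:
        text = complete(text)       # Stage 2: knowledge completion
    return PipelineResult(quality, enhanced, text, build(text))  # Stage 3


# Toy stand-ins for the real stages, for demonstration only.
result = run_pipeline(
    "李华去巴黎",
    assess=lambda t: 0.3,                       # pretend Stage 1 found gaps
    complete=lambda t: t + ",巴黎位于法国",      # pretend Stage 2 added a fact
    build=lambda t: [("巴黎", "位于", "法国")],  # pretend Stage 3 output
)
print(result.enhanced, result.triples)
```

Passing the stages in as functions keeps the orchestration logic separate from how each stage is implemented.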
The system is built on the MCP (Model Context Protocol) architecture, providing a clean client-server interface for seamless integration and scalability.
Key Features
- Zero Manual Intervention: Automatically detects data quality and processing needs
- Intelligent Pipeline: Adapts processing strategy based on input data characteristics
- Real-time Processing: Immediate knowledge graph generation from raw text
Stage 1: Data Quality Assessment
- Completeness Analysis: Evaluates entity and relationship coverage
- Consistency Checking: Detects semantic conflicts and contradictions
- Relevance Scoring: Assesses information relevance and meaningfulness
- Quality Threshold: Automatically determines if data needs enhancement
Stage 2: Knowledge Completion
- Entity Enhancement: Completes missing entity information
- Relationship Inference: Adds missing relationships between entities
- Conflict Resolution: Corrects semantic inconsistencies
- Format Normalization: Standardizes data format and structure
- Implicit Knowledge Inference: Extracts hidden knowledge patterns
Stage 3: Knowledge Graph Construction
- Rule-based Extraction: Fast, deterministic triple generation
- LLM-enhanced Processing: Advanced semantic understanding and relationship inference
- Confidence Scoring: Assigns reliability scores to extracted knowledge
- Interactive Visualization: Generates beautiful HTML visualizations
MCP Architecture
- Client-Server Design: Clean separation of concerns
- Standardized Protocol: Built on MCP for interoperability
- Tool-based Interface: Modular, extensible tool system
- Async Processing: High-performance asynchronous operations
Prerequisites
- Python: 3.11 or higher
- UV Package Manager: For dependency management
- OpenAI-compatible API: For LLM integration (DeepSeek, OpenAI, etc.)
Installation

```bash
git clone https://github.com/turambar928/MCP_based_KG_construction.git
cd MCP_based_KG_construction
# Install dependencies
uv sync
```

Create a .env file with your API configuration:

```
OPENAI_API_KEY=your_api_key_here
OPENAI_BASE_URL=https://api.siliconflow.cn/v1 # or your preferred endpoint
OPENAI_MODEL=Qwen/QwQ-32B # or your preferred model
```

Supported API providers:
- OpenAI: https://api.openai.com/v1
- DeepSeek: https://api.deepseek.com
- SiliconFlow: https://api.siliconflow.cn/v1
- Any OpenAI-compatible endpoint
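Here is a minimal sketch of how the three required variables might be read and validated at startup. It assumes the .env file has already been loaded into the process environment, for example with python-dotenv's `load_dotenv()`. `LLMConfig` and `load_llm_config` are hypothetical names, not part of this project.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass

REQUIRED = ("OPENAI_API_KEY", "OPENAI_BASE_URL", "OPENAI_MODEL")


# Hypothetical config holder; the project may read these variables differently.
@dataclass(frozen=True)
class LLMConfig:
    api_key: str
    base_url: str
    model: str


def load_llm_config(env: Mapping[str, str] = os.environ) -> LLMConfig:
    """Read the required OPENAI_* settings, failing fast if any is missing or blank."""
    missing = [name for name in REQUIRED if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError(f"Missing required settings in .env: {', '.join(missing)}")
    return LLMConfig(
        api_key=env["OPENAI_API_KEY"].strip(),
        base_url=env["OPENAI_BASE_URL"].strip().rstrip("/"),  # normalize trailing slash
        model=env["OPENAI_MODEL"].strip(),
    )
```

With the official openai Python package, these values map onto `OpenAI(api_key=cfg.api_key, base_url=cfg.base_url)` and `client.chat.completions.create(model=cfg.model, messages=...)`. That is how any of the providers listed above can be reached through the same code path.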
Running the Server

```bash
uv run kg_server.py
```

The server will start and listen for MCP client connections.
Running Tests
There are three ways to test the system:
Method 1: MCP Inspector

```bash
npx -y @modelcontextprotocol/inspector uv run kg_server.py
```

After running this command, click the link printed after "MCP Inspector is up and running at" to open the MCP Inspector in your browser. Once it is open:
- Click "Connect"
- Select "Tools" from the top menu
- Click "List Tools", then choose "build_knowledge_graph" from the list
- Enter your text in the left panel to generate the knowledge graph
Method 2: Client Code

```bash
uv run kg_client.py
```

After the connection succeeds, enter your text to view the results.
Method 3: Mainstream MCP Tools
Example: Running in Cherry Studio
In Settings, open MCP Servers, click "Add Server", and import from JSON. Use the following configuration, changing the directory path to your local project path:
```json
{
  "mcpServers": {
    "kg_server": {
      "command": "uv",
      "args": [
        "--directory",
        "D:/mcp_getting_started",
        "run",
        "kg_server.py"
      ],
      "env": {},
      "disabled": false,
      "autoApprove": []
    }
  }
}
```

After enabling this MCP server, you can use it in Cherry Studio.
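MCP hosts such as Cherry Studio typically launch a command-based entry like this one as a child process and exchange messages with it over standard input/output. The sketch below uses a hypothetical `to_command` helper to show the command line that the entry above corresponds to.

```python
import json
import shlex


# Hypothetical helper for illustration; MCP hosts do this internally.
def to_command(config_json: str, server: str) -> list[str]:
    """Return the argv an MCP host would spawn for one `mcpServers` entry."""
    entry = json.loads(config_json)["mcpServers"][server]
    if entry.get("disabled", False):
        raise ValueError(f"server {server!r} is disabled")
    return [entry["command"], *entry.get("args", [])]


config = json.dumps({
    "mcpServers": {
        "kg_server": {
            "command": "uv",
            "args": ["--directory", "D:/mcp_getting_started", "run", "kg_server.py"],
            "env": {},
            "disabled": False,
            "autoApprove": [],
        }
    }
})
print(shlex.join(to_command(config, "kg_server")))
# prints: uv --directory D:/mcp_getting_started run kg_server.py
```

Running the printed command yourself from a terminal is a quick way to check that the directory path is correct before enabling the server.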
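Once the client is running, you can use these commands:

```text
# Build knowledge graph from text
build <your_text_here>
# Example usage ("Peking University is a famous higher-education institution in China, located in Haidian District, Beijing")
build 北京大学是中国著名的高等教育机构,位于北京市海淀区
# Run demonstration examples
demo
# Exit the client
quit
```

Programmatic Usage

```python
import asyncio

from kg_client import KnowledgeGraphClient


async def main():
    client = KnowledgeGraphClient()
    await client.connect_to_server()
    try:
        # Build a knowledge graph ("Apple's CEO is Tim Cook")
        result = await client.build_knowledge_graph(
            "苹果公司的CEO是蒂姆·库克",
            output_file="my_graph.html",
        )
        print(f"Generated graph: {result}")
    finally:
        # Release the MCP connection even if the call fails
        await client.cleanup()


asyncio.run(main())
```

Example 1: High-Quality Input
Input: "北京大学是中国著名的高等教育机构,位于北京市海淀区。" ("Peking University is a famous higher-education institution in China, located in Haidian District, Beijing.")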
Processing: High quality detected, so the text goes directly to Stage 3
Output:
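- Entities: [北京大学, 中国, 高等教育机构, 北京市, 海淀区] (Peking University, China, higher-education institution, Beijing, Haidian District)
- Triples: [(北京大学, 是, 高等教育机构), (北京大学, 位于, 海淀区), ...] (是 = "is a", 位于 = "located in")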
- Visualization: Interactive HTML graph
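To illustrate the rule-based half of Stage 3, the following sketch recovers the two triples listed above using two hand-written patterns: 是 ("is a") and 位于 ("located in"). The patterns, the stripping of modifiers up to the last 的, and the rule that keeps only the most specific location are illustrative assumptions. The rules in kg_utils.py may differ.

```python
import re

# Split on ASCII and full-width commas, periods, and semicolons.
CLAUSE_SPLIT = re.compile(r"[,，。;；]")
# "<subject>是<object>" or "<subject>位于<object>"; the subject may be empty
# when a clause continues the previous one (assumed rule, for illustration).
PATTERN = re.compile(r"^(.*?)(是|位于)(.+)$")


def extract_triples(text: str) -> list[tuple[str, str, str]]:
    triples = []
    last_subject = None
    for clause in CLAUSE_SPLIT.split(text):
        match = PATTERN.match(clause.strip())
        if not match:
            continue
        subject, predicate, obj = match.groups()
        subject = subject or last_subject  # reuse the previous clause's subject
        if not subject:
            continue
        obj = obj.rsplit("的", 1)[-1]  # drop modifiers: 中国著名的高等教育机构 -> 高等教育机构
        if predicate == "位于":
            # keep the most specific place: 北京市海淀区 -> 海淀区
            obj = re.split(r"[省市]", obj)[-1] or obj
        triples.append((subject, predicate, obj))
        last_subject = subject
    return triples


print(extract_triples("北京大学是中国著名的高等教育机构,位于北京市海淀区。"))
```

The same rules return an empty list for "李华去巴黎" ("Li Hua goes to Paris"). This shows why hand-written patterns are paired with LLM-enhanced processing.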
Example 2: Incomplete Input
Input: "李华去巴黎" ("Li Hua goes to Paris")
Processing:
- Stage 1: Detects incomplete information
- Stage 2: Enhances with "巴黎位于法国" ("Paris is located in France") and "李华是人" ("Li Hua is a person")
- Stage 3: Builds enhanced knowledge graph
Output: Enriched knowledge graph with inferred relationships
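Stage 2's use of an LLM to fill gaps like these can be pictured as "ask for missing facts, then parse them into triples". The prompt wording and the `subject|predicate|object` reply format below are assumptions for illustration. `complete_with_llm` is a hypothetical name. `ask_llm` stands for any function that sends a prompt to the configured model, such as a wrapper around `client.chat.completions.create`.

```python
from collections.abc import Callable

# Hypothetical prompt; the project's real prompt is not documented here.
PROMPT = (
    "The following text may be missing background facts.\n"
    "Text: {text}\n"
    "List the missing facts, one per line, as subject|predicate|object."
)


def complete_with_llm(text: str, ask_llm: Callable[[str], str]) -> list[tuple[str, str, str]]:
    """Ask an LLM for missing facts and parse its 'subject|predicate|object' lines."""
    reply = ask_llm(PROMPT.format(text=text))
    facts = []
    for line in reply.splitlines():
        parts = [part.strip() for part in line.split("|")]
        if len(parts) == 3 and all(parts):  # ignore malformed or partial lines
            fact = (parts[0], parts[1], parts[2])
            if fact not in facts:           # drop duplicates, keep order
                facts.append(fact)
    return facts


# A stand-in for a real model call that returns a canned reply.
def fake_llm(prompt: str) -> str:
    return "巴黎|位于|法国\n李华|是|人\nthis line is not a fact"


print(complete_with_llm("李华去巴黎", fake_llm))
```

Asking for a rigid line format makes the reply easy to validate, so malformed output is dropped instead of being added to the graph.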
Example 3: Conflicting Input
Input: "巴黎市是德国城市。" ("Paris is a German city.")
Processing:
- Stage 1: Detects semantic conflict
- Stage 2: Corrects it to "巴黎是法国城市" ("Paris is a French city")
- Stage 3: Builds corrected knowledge graph
Output: Corrected and enhanced knowledge graph
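Conflict correction can be pictured as checking an extracted fact against trusted reference knowledge. The sketch below assumes a tiny hand-written city-to-country table (`CITY_COUNTRY`, hypothetical) and treats "city belongs to country" as single-valued. The project itself relies on an LLM and external knowledge bases, so treat this only as an illustration of the idea.

```python
from typing import NamedTuple

# Hypothetical reference knowledge: each city belongs to exactly one country.
CITY_COUNTRY = {"巴黎": "法国", "柏林": "德国"}


class Check(NamedTuple):
    city: str
    country: str
    conflict: bool


def normalize_city(name: str) -> str:
    """Drop a trailing 市 ("city") so 巴黎市 and 巴黎 match the same entry."""
    return name[:-1] if name.endswith("市") and len(name) > 1 else name


def check_city_country(city: str, country: str,
                       reference: dict[str, str] = CITY_COUNTRY) -> Check:
    city = normalize_city(city)
    expected = reference.get(city)
    if expected is not None and expected != country:
        return Check(city, expected, True)  # conflict: fall back to the reference value
    return Check(city, country, False)      # consistent, or unknown to the reference


result = check_city_country("巴黎市", "德国")
print(result, f"{result.city}是{result.country}城市")
```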
The system exposes the following MCP tools for integration:
Tool: build_knowledge_graph
Description: Complete pipeline for knowledge graph construction with automatic quality assessment and enhancement.
Parameters:
- text (string): Input text to process
- output_file (string, optional): HTML visualization output filename (default: "knowledge_graph.html")
Returns: JSON object containing:
- success (boolean): Processing success status
- entities (array): Extracted entities
- triples (array): Generated knowledge triples
- confidence_scores (array): Confidence scores for each triple
- visualization_file (string): Path to generated HTML visualization
- processing_stages (object): Details of each processing stage
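A client that receives this JSON can post-process it, for example by keeping only high-confidence triples. The sketch assumes each triple carries its own `confidence` field, as in the example response that follows. `high_confidence_triples` and the 0.8 cutoff are hypothetical.

```python
import json


# Hypothetical client-side helper; the cutoff value is illustrative.
def high_confidence_triples(response: str, min_confidence: float = 0.8) -> list[tuple[str, str, str]]:
    """Return (subject, predicate, object) for triples at or above the cutoff."""
    data = json.loads(response)
    if not data.get("success", False):
        raise RuntimeError("knowledge graph construction failed")
    return [
        (t["subject"], t["predicate"], t["object"])
        for t in data.get("triples", [])
        if t.get("confidence", 0.0) >= min_confidence  # a missing score counts as 0
    ]


response = (
    '{"success": true, "triples": [{"subject": "北京大学", "predicate": "是", '
    '"object": "高等教育机构", "confidence": 0.95}]}'
)
print(high_confidence_triples(response))
```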
Example:
```json
{
  "success": true,
  "entities": ["北京大学", "中国", "高等教育机构"],
  "triples": [
    {
      "subject": "北京大学",
      "predicate": "是",
      "object": "高等教育机构",
      "confidence": 0.95
    }
  ],
  "visualization_file": "knowledge_graph.html"
}
```

Project Structure

```text
├── kg_server.py              # Main MCP server implementation
├── kg_client.py              # Interactive client for testing
├── kg_utils.py               # Core knowledge graph construction utilities
├── kg_visualizer.py          # HTML visualization generator
├── data_quality.py           # Stage 1: Data quality assessment
├── knowledge_completion.py   # Stage 2: Knowledge completion and enhancement
├── pyproject.toml            # Project dependencies and configuration
├── .env                      # Environment variables (API keys)
└── README.md                 # This file
```
- kg_server.py: MCP server that orchestrates the 3-stage pipeline
- kg_client.py: Command-line client for interactive testing and batch processing
- kg_utils.py: Knowledge graph construction engine with rule-based and LLM-enhanced extraction
- kg_visualizer.py: Generates interactive HTML visualizations using Plotly
- data_quality.py: Implements quality assessment algorithms for completeness, consistency, and relevance
- knowledge_completion.py: Handles knowledge enhancement and conflict resolution
Data Quality Metrics
- Completeness Score: Based on entity coverage and relationship density
- Consistency Score: Detects semantic conflicts and contradictions
- Relevance Score: Evaluates information meaningfulness
- Composite Quality Score: Weighted combination of all metrics
Knowledge Completion Strategies
- Entity Completion: Adds missing entity attributes and types
- Relationship Inference: Discovers implicit relationships
- Conflict Resolution: Corrects factual inconsistencies
- Format Normalization: Standardizes entity and relationship representations
Visualization Features
- Interactive Network Graph: Clickable nodes and edges
- Entity Clustering: Groups related entities by type
- Confidence Visualization: Color-coded confidence levels
- Export Options: HTML, PNG, SVG formats
Processing Pipeline
- Input Validation: Checks text format and encoding
- Quality Assessment: Multi-dimensional quality scoring
- Conditional Enhancement: Applies enhancement only when needed
- Graph Construction: Rule-based + LLM hybrid approach
- Confidence Calculation: Bayesian confidence scoring
- Visualization Generation: Interactive HTML output
Performance
- Processing Speed: ~1-3 seconds per text input
- Memory Usage: ~50-100MB for typical workloads
- Scalability: Async architecture supports concurrent processing
- Accuracy: 85-95% entity extraction, 80-90% relationship accuracy
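The Composite Quality Score listed above can be sketched as a weighted average of the per-dimension scores, compared against the enhancement threshold. The weights are illustrative assumptions, because the actual weighting in data_quality.py is not documented here. The 0.5 threshold mirrors the default QUALITY_THRESHOLD.

```python
# Illustrative weights only; the project's actual weights are not documented here.
WEIGHTS = {"completeness": 0.4, "consistency": 0.4, "relevance": 0.2}


def composite_score(scores: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted average of per-dimension scores, each expected in [0, 1]."""
    missing = weights.keys() - scores.keys()
    if missing:
        raise KeyError(f"missing scores: {sorted(missing)}")
    total = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total


def needs_enhancement(scores: dict[str, float], threshold: float = 0.5) -> bool:
    """Assumed rule: Stage 2 runs only when the composite score is below the threshold."""
    return composite_score(scores) < threshold


low = {"completeness": 0.3, "consistency": 0.2, "relevance": 0.9}
print(round(composite_score(low), 2), needs_enhancement(low))
```

Dividing by the sum of the weights keeps the result in [0, 1] even if the weights are changed and no longer add up to 1.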
Testing
Refer to the three testing methods described above:
- MCP Inspector (recommended for visual testing)
- Client code (for programmatic testing)
- Mainstream MCP tools (for integration testing)
```bash
# Quick test with demonstration examples
uv run kg_client.py
# Then type: demo

# Test with custom input
uv run kg_client.py "Your test text here"
```

Extending the System
- Custom Quality Metrics: Extend data_quality.py
- New Enhancement Strategies: Modify knowledge_completion.py
- Additional Visualization: Enhance kg_visualizer.py
- New MCP Tools: Add tools to kg_server.py
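One way to keep data_quality.py open to custom metrics is a small registry that new metric functions add themselves to. The sketch below shows a hypothetical pattern, not the project's existing API. `register_metric`, `METRICS`, `evaluate`, and the toy `length_adequacy` metric are illustrative names.

```python
from collections.abc import Callable

Metric = Callable[[str], float]

# Hypothetical registry mapping metric names to text -> score functions.
METRICS: dict[str, Metric] = {}


def register_metric(name: str) -> Callable[[Metric], Metric]:
    """Decorator that adds a text -> score function to the metric registry."""
    def decorator(fn: Metric) -> Metric:
        METRICS[name] = fn
        return fn
    return decorator


@register_metric("length_adequacy")
def length_adequacy(text: str) -> float:
    """Toy metric: 20+ characters score 1.0, shorter texts proportionally less."""
    return min(len(text.strip()) / 20, 1.0)


def evaluate(text: str) -> dict[str, float]:
    """Run every registered metric on the text."""
    return {name: fn(text) for name, fn in METRICS.items()}


print(evaluate("李华去巴黎"))
```

With this pattern, adding a metric means writing one decorated function. The code that runs the metrics does not change.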
Configuration
Environment variables in .env:

```
# Required
OPENAI_API_KEY=your_api_key
OPENAI_BASE_URL=your_api_endpoint
OPENAI_MODEL=your_model_name
# Optional
QUALITY_THRESHOLD=0.5 # Quality threshold for enhancement
MAX_ENTITIES=50 # Maximum entities per graph
VISUALIZATION_WIDTH=1200 # HTML visualization width
VISUALIZATION_HEIGHT=800 # HTML visualization height
```

Contributing
- Fork the repository
- Create a feature branch: git checkout -b feature-name
- Make your changes and test thoroughly
- Submit a pull request with a detailed description
Troubleshooting

Port Occupation Error (Windows; 6277 is the default port of the MCP Inspector's proxy):

```bat
REM Find the process using the port
netstat -ano | findstr :6277
REM Kill the process
taskkill /PID <process_id> /F
```

Insufficient API Balance:
- Check the API configuration in the .env file
- Ensure the API account has sufficient balance

Dependency Installation Issues:

```bash
uv sync --reinstall
```
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Built on the Model Context Protocol (MCP)
- Visualization powered by Plotly
- Graph algorithms using NetworkX
- LLM integration via OpenAI API
For questions, issues, or contributions:
- 📧 Email: tzf9282003@163.com
- 🐛 Issues: GitHub Issues
- 📖 Documentation: See KNOWLEDGE_GRAPH_README.md for detailed technical documentation


