Automatically extracts data points from graph images using computer vision, OCR, and signal processing. The extracted data can be analyzed and exported to CSV for further use.
- Detect X and Y axes from graphs
- Extract axis values using OCR (Tesseract)
- Detect grid lines and map pixel positions to actual values
- Digitize graph data points
- Highlight deviation points (local maxima/minima)
- Compute statistics (mean, min, max, RMS, peak-to-valley, skewness, kurtosis)
- Export data to CSV for analysis
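As a rough illustration of the deviation-point feature, the sketch below marks strict local maxima and minima in a sequence of Y values using NumPy only. The function name `local_extrema` is hypothetical, and the neighbour-comparison rule is an assumption; the script itself may use a different method (for example a SciPy routine).

```python
import numpy as np

def local_extrema(y):
    """Return (maxima_idx, minima_idx) for strict local extrema of a 1-D sequence.

    A point counts as an extremum only if it is strictly greater (or smaller)
    than both immediate neighbours, so endpoints and flat plateaus are skipped.
    """
    y = np.asarray(y, dtype=float)
    interior = y[1:-1]
    is_max = (interior > y[:-2]) & (interior > y[2:])
    is_min = (interior < y[:-2]) & (interior < y[2:])
    # +1 converts positions in the interior slice back to indices of y
    return np.flatnonzero(is_max) + 1, np.flatnonzero(is_min) + 1

maxima, minima = local_extrema([4.5, 5.2, 3.8, 4.0, 6.1, 2.0])
print("maxima at indices:", maxima.tolist())
print("minima at indices:", minima.tolist())
```

On noisy digitized curves a strict one-neighbour comparison tends to flag many small wiggles, so smoothing the data first or comparing against a wider window may be worthwhile.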
- Clone the repository:
git clone https://github.com/meekhumor/Graph-Extractor.git
cd Graph-Extractor
- Install dependencies:
pip install opencv-python numpy pytesseract scipy pandas
- Make sure Tesseract OCR is installed and added to your PATH.
- Place your graph image in the graph/ directory (e.g., graph/your_graph.png). The script expects a simple line graph with grid lines for best results.
- Run the main script:
python extractor.py
- By default, it processes graph/graph3.png.
- To use a different image, modify the line:
preprocess_image('graph/your_image.png')
- OpenCV windows will display visualizations of axes detection, grid lines, data points, and deviations.
- Extracted data is saved as csv/graph3.csv (columns: X, Y).
- Statistics (mean, min, max, RMS, peak-to-valley, skewness, kurtosis) are printed to the console.
- Press any key in the OpenCV windows to close them.
Console Stats:
Mean: 5.2341
Min: 1.0000
Max: 10.0000
RMS: 5.6789
Peak-to-Valley: 9.0000
Skewness: 0.1234
Kurtosis: -0.5678
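For reference, the statistics above could be computed along the following lines. This is a minimal sketch, not the script's actual code: the function name `summarize` is hypothetical, and it assumes biased (population) moments and excess kurtosis, which should match the defaults of `scipy.stats.skew` and `scipy.stats.kurtosis`.

```python
import numpy as np

def summarize(y):
    """Return the summary statistics for a 1-D sequence of Y values."""
    y = np.asarray(y, dtype=float)
    mean = y.mean()
    centered = y - mean
    m2 = np.mean(centered ** 2)  # biased (population) variance
    m3 = np.mean(centered ** 3)  # third central moment
    m4 = np.mean(centered ** 4)  # fourth central moment
    return {
        "Mean": mean,
        "Min": y.min(),
        "Max": y.max(),
        "RMS": np.sqrt(np.mean(y ** 2)),
        "Peak-to-Valley": y.max() - y.min(),
        "Skewness": m3 / m2 ** 1.5,
        "Kurtosis": m4 / m2 ** 2 - 3.0,  # excess kurtosis: 0 for a normal distribution
    }

for name, value in summarize([4.5, 5.2, 3.8]).items():
    print(f"{name}: {value:.4f}")
```

Note that a constant signal has zero variance, so skewness and kurtosis are undefined for it and this sketch would divide by zero.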
CSV File (csv/graph3.csv):
X,Y
0.0,4.5
1.0,5.2
2.0,3.8
...
- Preprocessing: Loads the image, converts to grayscale, and applies thresholding for edge detection.
- Axis Detection: Uses Hough Line Transform to identify horizontal (X-axis) and vertical (Y-axis) lines.
- Value Extraction: Crops text regions from axes and uses Tesseract OCR to parse numerical values.
- Grid Detection: Scans for vertical/horizontal lines, clusters them, and maps to interpolated axis values.
- Digitization: Finds non-zero pixels (data points) and interpolates their pixel coordinates to real values.
- Analysis: Detects local extrema, computes statistics, and exports to CSV.
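The grid-detection and digitization steps both rest on mapping pixel positions to axis values. A minimal sketch of that mapping with `numpy.interp` follows, assuming linear axes and that the pixel position and OCR'd value of each grid line are already known. The function name `pixels_to_values` and the calibration numbers are hypothetical, chosen only for illustration.

```python
import numpy as np

def pixels_to_values(pixels, grid_pixels, grid_values):
    """Linearly interpolate pixel coordinates between known grid lines."""
    grid_pixels = np.asarray(grid_pixels, dtype=float)
    grid_values = np.asarray(grid_values, dtype=float)
    order = np.argsort(grid_pixels)  # np.interp needs increasing x-coordinates
    return np.interp(pixels, grid_pixels[order], grid_values[order])

# Hypothetical calibration: pixel position of each grid line and its axis value.
x_grid_px, x_grid_val = [100, 200, 300], [0.0, 1.0, 2.0]
y_grid_px, y_grid_val = [400, 250, 100], [0.0, 5.0, 10.0]  # image rows grow downward

# Pixel coordinates of three detected data points (column, row).
xs = pixels_to_values([100, 200, 300], x_grid_px, x_grid_val)
ys = pixels_to_values([265, 244, 286], y_grid_px, y_grid_val)

for x, y in zip(xs, ys):
    print(f"{x:.1f},{y:.1f}")
```

Because image rows increase downward, the Y calibration pairs larger pixel rows with smaller values; sorting by pixel position handles that. `numpy.interp` clamps to the end values outside the calibrated range, so points beyond the outermost grid lines would need explicit extrapolation.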
meekhumor-graph-extractor/
├── README.md
├── extractor.py
├── graph/ # Input graph images (e.g., graph3.png)
└── csv/ # Output CSV files (auto-created)
- Best for simple 2D line plots with visible grid lines.
- OCR accuracy depends on text clarity; may fail on handwritten or stylized labels.
- No support for logarithmic scales or multi-line graphs, though the code could be extended to handle them.
- Requires manual image path updates; consider adding command-line arguments for production use.
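One possible way to add the command-line arguments mentioned above is sketched here with the standard-library `argparse` module. The option names (`image`, `--csv-dir`) and the `parse_args` helper are hypothetical and not part of the current script; the defaults mirror the paths described in this README.

```python
import argparse
from pathlib import Path

def parse_args(argv=None):
    """Parse the image path and output directory (hypothetical options)."""
    parser = argparse.ArgumentParser(
        description="Extract data points from a graph image."
    )
    parser.add_argument(
        "image", nargs="?", default="graph/graph3.png",
        help="path to the input graph image",
    )
    parser.add_argument(
        "--csv-dir", default="csv",
        help="directory where the output CSV is written",
    )
    args = parser.parse_args(argv)
    args.image = Path(args.image)
    # Name the CSV after the image, e.g. graph/graph3.png -> csv/graph3.csv
    args.output = Path(args.csv_dir) / (args.image.stem + ".csv")
    return args

# Example with an explicit argument list; pass None to read sys.argv instead.
args = parse_args(["graph/your_graph.png"])
print(f"Would process {args.image} and write {args.output}")
# The parsed path could then be handed to the existing entry point, e.g.
# preprocess_image(str(args.image))  # assumed to accept a path string
```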
Contributions are welcome! Fork the repo, make changes, and submit a pull request. For major changes, please open an issue first.


