This repository contains scripts for integrating the UIED (UI Element Detection) framework with EasyOCR for robust text and component recognition from screenshots. The primary goal is to provide a toolkit for analyzing UI structures from images.
The core technology combines computer vision techniques to identify graphical UI elements and an OCR engine to extract textual information.
The classification models are not stored in this repository. You need to download them manually.
- Download Link: Google Drive
- Installation: Place the downloaded `.h5` files into the `models/` directory at the root of this project.
This repository includes three interactive example scripts that demonstrate different capabilities of the integration. To use any of them, run the script and press F3 to capture the screen and initiate the analysis. Press Esc to exit.
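The capture loop might look something like the minimal sketch below, assuming the `keyboard` and `Pillow` packages for hotkeys and screenshots; the actual example scripts may use different libraries.

```python
# Minimal sketch of the F3-capture / Esc-exit loop. The keyboard and Pillow
# packages are assumptions here, not necessarily what the scripts use.
import keyboard                # pip install keyboard
from PIL import ImageGrab      # pip install Pillow

def capture_and_analyze():
    screenshot = ImageGrab.grab()      # grab the full screen
    screenshot.save("screenshot.png")  # hand this path to the analysis step
    print("Captured screenshot.png")

keyboard.add_hotkey("f3", capture_and_analyze)  # F3 triggers a capture
keyboard.wait("esc")                            # block until Esc is pressed
```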
This script showcases the complete, integrated UIED pipeline. It performs the following steps:
- Captures a screenshot.
- Detects UI components using the `detect_components` function.
- Recognizes text using EasyOCR via the `text_detection` function.
- Classifies the detected non-text components (e.g., buttons, icons) using a CNN classifier.
- Merges the results from component detection, classification, and OCR.
- Saves the final annotated image, showing bounding boxes around all detected UI elements and text.
This is the main example for getting a comprehensive analysis of a user interface.
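A hedged sketch of how these steps fit together is shown below. The names `detect_components`, `text_detection`, and `TestClassifier` come from this repository, but the import path of the first two and the `predict` method signature are assumptions; check the actual scripts for the real calls.

```python
# Hypothetical import path -- detect_components and text_detection are named
# in the scripts, but their module location is not documented here.
from detect import detect_components, text_detection
from classifier import TestClassifier  # classifier.py ships with this repo

IMG = "screenshot.png"

# Step 2: detect graphical UI components (assumed to return (x1, y1, x2, y2) boxes)
components = detect_components(IMG)

# Step 3: recognize text. Internally this wraps EasyOCR, whose real API is
# easyocr.Reader(["en"]).readtext(IMG) -> [(bbox, text, confidence), ...]
texts = text_detection(IMG)

# Step 4: classify each non-text component with the CNN (method name assumed)
classifier = TestClassifier()
labels = [classifier.predict(IMG, box) for box in components]

# Steps 5-6 (merging results and saving the annotated image) are script-specific.
```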
This script demonstrates how to isolate graphical, non-text UI elements. It's useful when you only care about the layout of interactive elements and want to ignore text.
- Captures a screenshot.
- Detects all potential UI components.
- Detects all text blocks using OCR.
- Filters out components that have a high degree of overlap (90% or more) with the detected text blocks (sketched below).
- Saves an annotated image showing bounding boxes only around the filtered, non-text components.
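The overlap filter can be expressed compactly. The sketch below uses the 90% threshold described above and `(x1, y1, x2, y2)` boxes; the helper names are illustrative, not taken from the script.

```python
def overlap_ratio(comp, text):
    """Fraction of the component box covered by a text box; boxes are (x1, y1, x2, y2)."""
    ix1, iy1 = max(comp[0], text[0]), max(comp[1], text[1])
    ix2, iy2 = min(comp[2], text[2]), min(comp[3], text[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)   # intersection area
    area = (comp[2] - comp[0]) * (comp[3] - comp[1])
    return inter / area if area else 0.0

def filter_non_text(components, text_boxes, threshold=0.9):
    """Drop any component covered 90% or more by a detected text block."""
    return [c for c in components
            if all(overlap_ratio(c, t) < threshold for t in text_boxes)]
```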
This script builds upon `example1.py` to further refine the component detection, aiming to isolate small, icon-like elements.
- Performs all steps from `example1.py` to get a set of non-text components.
- Applies a second filter to this set based on element size and aspect ratio, designed to keep elements that are likely to be icons or small buttons (see the sketch after this list).
- Saves an annotated image showing bounding boxes only around the final filtered components that match the size and shape criteria.
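A size-and-aspect-ratio filter along these lines is sketched below; the concrete thresholds are placeholder assumptions, since the script's actual values are not documented here.

```python
def is_icon_like(box, min_side=10, max_side=80, max_aspect=2.0):
    """Return True if an (x1, y1, x2, y2) box has an icon-like size and shape.

    The min_side / max_side / max_aspect values are illustrative defaults,
    not the thresholds used by the actual script.
    """
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    if w <= 0 or h <= 0:
        return False
    aspect = max(w, h) / min(w, h)  # 1.0 means square; larger means elongated
    return min_side <= w <= max_side and min_side <= h <= max_side \
        and aspect <= max_aspect

# Example: a 32x32 box passes, a 400x60 banner is rejected
icons = [box for box in [(5, 5, 37, 37), (0, 0, 400, 60)] if is_icon_like(box)]
```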
The `example.py` script uses a `TestClassifier` class (defined in `classifier.py`) to categorize the detected UI components.
- Model: It loads a pre-trained Keras model (`cnn-generalized.h5`), a convolutional neural network (CNN).
- Input: It takes cropped images of detected components, resizes them to 64x64 pixels, and preprocesses them (see the sketch below).
- Output: The model predicts one of the 13 trained classes for each component: `Button`, `CheckBox`, `Chronometer`, `EditText`, `Image`, `ImageButton`, `NumberPicker`, `RadioButton`, `RatingBar`, `SeekBar`, `Spinner`, `Switch`, `TextView`.
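The per-component classification likely looks something like the following sketch. `TestClassifier` and `cnn-generalized.h5` exist in this project, but the preprocessing shown here (BGR crop, 64x64 resize, scaling to [0, 1]) and the class ordering follow the description above rather than the actual code.

```python
import cv2
import numpy as np
from tensorflow.keras.models import load_model

# The 13 classes listed above, in the (assumed) order of the model's outputs.
CLASSES = ["Button", "CheckBox", "Chronometer", "EditText", "Image",
           "ImageButton", "NumberPicker", "RadioButton", "RatingBar",
           "SeekBar", "Spinner", "Switch", "TextView"]

model = load_model("models/cnn-generalized.h5")

def classify_component(img, box):
    """Crop one component from a BGR image and predict its class label."""
    x1, y1, x2, y2 = box
    crop = cv2.resize(img[y1:y2, x1:x2], (64, 64))   # model expects 64x64 input
    batch = np.expand_dims(crop / 255.0, axis=0)     # [0, 1] scaling is assumed
    probs = model.predict(batch, verbose=0)[0]
    return CLASSES[int(np.argmax(probs))]

# Usage: label = classify_component(cv2.imread("screenshot.png"), (10, 10, 74, 42))
```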
As the name `TestClassifier` suggests, this is a proof-of-concept implementation. Its accuracy may vary, but it serves as a good starting point for integrating more advanced classification models.