A comprehensive, CLI-based URL phishing analyzer built for cybersecurity education.
Made by Monish Paramasivam
The Phishing Detection Tool is a Python-based command-line application designed to analyze URLs for phishing indicators using multiple detection techniques, including URL structure analysis, SSL validation, WHOIS domain intelligence, blacklist matching, and a machine learning classifier.
⚠️ Disclaimer: This tool is built strictly for educational and ethical use only. It is a cybersecurity portfolio project intended to demonstrate phishing detection concepts. Do NOT use it for any unauthorized or illegal activities.
| Feature | Description |
|---|---|
| URL Structure Analysis | Detects typosquatting, homograph attacks, keyword stuffing, suspicious TLDs, IP addresses, @-symbol abuse, and more |
| SSL / HTTPS Validation | Checks for HTTPS, validates certificate, detects self-signed or expired certs |
| WHOIS Domain Intelligence | Retrieves domain age, registrar, registrant info; newly registered domains raise red flags |
| Blacklist / Whitelist | Custom lists for known phishing domains and trusted sites, persisted as JSON |
| ML Classifier | Random Forest model trained on 15 URL features to predict phishing probability |
| Risk Scoring | Aggregated 0-100 risk score with Low / Medium / High classification |
| Report Saving | Save full analysis reports to timestamped .txt files |
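
To make the URL structure checks more concrete, here is a minimal sketch of three of them (IP-address host, @-symbol, suspicious TLD) plus the HTTPS check. The function name `structural_flags` and the contents of `SUSPICIOUS_TLDS` are assumptions for illustration; the project's own checks in `core/analyzer.py` are not shown in this README and may differ.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical example set; the tool's real TLD list is not documented here.
SUSPICIOUS_TLDS = {"tk", "ml", "ga", "cf", "gq", "xyz", "top"}


def structural_flags(url: str) -> dict:
    """Return a few boolean phishing indicators derived from the URL alone."""
    parsed = urlparse(url)
    host = parsed.hostname or ""

    # A host that parses as an IPv4/IPv6 address is a common phishing signal.
    try:
        ipaddress.ip_address(host)
        is_ip = True
    except ValueError:
        is_ip = False

    # The last dot-separated label is the TLD (only meaningful for names).
    tld = host.rsplit(".", 1)[-1]

    return {
        "ip_address_host": is_ip,
        "at_symbol": "@" in url,
        "suspicious_tld": (not is_ip) and tld in SUSPICIOUS_TLDS,
        "no_https": parsed.scheme != "https",
    }


if __name__ == "__main__":
    print(structural_flags("http://192.168.1.1/bank/login"))
```

Each flag is independent, so a scoring layer could weight them separately; the weights themselves are not specified in this README.
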
- Batch URL analysis: analyze multiple URLs in one session
- Probability bar: visual indicator of phishing likelihood
- Feature breakdown: see which ML features contributed most
- Color-coded terminal UI: intuitive, severity-highlighted output
- Persistent custom lists: blacklist/whitelist saved between sessions
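
As an illustration of how persistent custom lists could work, the sketch below normalizes a domain and stores the list as JSON. The helper names (`normalize_domain`, `load_list`, `save_list`) and the on-disk shape (a sorted JSON array of domains) are assumptions; the actual format used by `core/blacklist.py` is not described here.

```python
import json
from pathlib import Path
from urllib.parse import urlparse


def normalize_domain(value: str) -> str:
    """Reduce a URL or bare domain to a lowercase host without 'www.'."""
    value = value.strip().lower()
    if "://" not in value:
        # Prefix '//' so urlparse treats bare input as a network location.
        value = "//" + value
    host = urlparse(value).hostname or ""
    return host[4:] if host.startswith("www.") else host


def load_list(path: Path) -> set:
    """Load a set of domains, or an empty set if the file does not exist."""
    if not path.exists():
        return set()
    return set(json.loads(path.read_text(encoding="utf-8")))


def save_list(path: Path, domains: set) -> None:
    """Write the domains as a sorted JSON array, creating folders as needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(sorted(domains), indent=2), encoding="utf-8")
```

A typical round trip would load the set, add `normalize_domain(user_input)`, and save it back, so that `http://WWW.Example.com/login` and `example.com` end up as the same entry.
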
phishing-detection-tool/
│
├── main.py                    # CLI entry point, menu system
│
├── core/                      # Core analysis modules
│   ├── __init__.py
│   ├── analyzer.py            # URL structure analysis (12+ checks)
│   ├── ssl_checker.py         # SSL/HTTPS certificate validation
│   ├── whois_lookup.py        # WHOIS / domain age lookup
│   └── blacklist.py           # Blacklist & whitelist management
│
├── ml/                        # Machine learning module
│   ├── __init__.py
│   ├── classifier.py          # Random Forest phishing classifier
│   ├── phishing_model.pkl     # Saved trained model (auto-generated)
│   └── scaler.pkl             # Feature scaler (auto-generated)
│
├── utils/                     # Utility modules
│   ├── __init__.py
│   ├── display.py             # Colored terminal report renderer
│   └── report.py              # Report file generator
│
├── data/                      # Persistent data
│   ├── blacklist.json         # Custom blacklist (auto-created)
│   └── whitelist.json         # Custom whitelist (auto-created)
│
├── tests/                     # Unit tests
│   ├── __init__.py
│   └── test_analyzer.py       # Test suite (pytest-compatible)
│
├── reports/                   # Saved analysis reports (auto-created)
│
├── sample_urls.txt            # Test URLs for demonstration
├── requirements.txt           # Python dependencies
└── README.md                  # This file
- Python 3.8 or higher
- pip (Python package manager)
# 1. Clone the repository
git clone https://github.com/monishparamasivam/phishing-detection-tool.git
cd phishing-detection-tool
# 2. (Recommended) Create a virtual environment
python -m venv venv
source venv/bin/activate # Linux/macOS
venv\Scripts\activate # Windows
# 3. Install dependencies
pip install -r requirements.txt
# 4. Run the tool
python main.py
┌─────────────────────────────────────────┐
│                MAIN MENU                │
├─────────────────────────────────────────┤
│  [1] Analyze a Single URL               │
│  [2] Analyze Multiple URLs (Batch)      │
│  [3] Manage Blacklist / Whitelist       │
│  [4] View Sample Test URLs              │
│  [5] About This Tool                    │
│  [0] Exit                               │
└─────────────────────────────────────────┘
[?] Enter URL to analyze: http://paypa1-secure-login.com/verify
[1/5] Analyzing URL structure...
[2/5] Checking SSL/HTTPS...
[3/5] Fetching domain information...
[4/5] Checking blacklist/whitelist...
[5/5] Running ML classifier...
═══════════════════════════════════════════════════════════════════
  PHISHING ANALYSIS REPORT
═══════════════════════════════════════════════════════════════════
[TARGET URL]
http://paypa1-secure-login.com/verify
┌──────────────────────────────────────────────────────────────┐
│                                                              │
│   🚨 RISK LEVEL: HIGH | SCORE: 87/100                        │
│                                                              │
└──────────────────────────────────────────────────────────────┘
[SCORE BREAKDOWN]
• URL structure issues: +35 pts
• No HTTPS: +15 pts
• ML classifier risk: +22 pts
• Domain is only 12 days old: +10 pts
...
# Using pytest
pytest tests/ -v
# Or directly
python tests/test_analyzer.py

URL Input
│
├──▶ [1] URL Analyzer → 12+ structural checks
│       • Length, IP, subdomains, keywords, homographs,
│         special chars, TLD, brand impersonation...
│
├──▶ [2] SSL Checker → HTTPS & certificate validation
│       • Protocol check, cert validity, expiry, issuer,
│         self-signed detection...
│
├──▶ [3] WHOIS Lookup → Domain intelligence
│       • Registration date, domain age, registrar,
│         privacy protection...
│
├──▶ [4] Blacklist Check → List-based matching
│       • Custom blacklist, custom whitelist,
│         domain normalization...
│
├──▶ [5] ML Classifier → Random Forest prediction
│       • 15 engineered features, probability score,
│         confidence level...
│
└──▶ Final Aggregation → Risk Score (0-100) → LOW / MEDIUM / HIGH
| Score Range | Risk Level | Action |
|---|---|---|
| 0-30 | ✅ LOW | Generally safe |
| 31-65 | MEDIUM | Exercise caution |
| 66-100 | 🚨 HIGH | Likely phishing |
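
The score ranges above translate directly into a small lookup. This is a sketch only: the function name `risk_level` is hypothetical, and how the tool itself combines module scores before this step is not described in this README.

```python
def risk_level(score: int) -> str:
    """Map an aggregated 0-100 risk score to LOW, MEDIUM, or HIGH."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score <= 30:
        return "LOW"
    if score <= 65:
        return "MEDIUM"
    return "HIGH"


if __name__ == "__main__":
    # The sample report earlier in this README shows a score of 87.
    print(risk_level(87))
```

Keeping the thresholds in one function means the report renderer and any batch summary would classify scores the same way.
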
The classifier uses 15 engineered features:
- URL total length
- IP address in URL
- Subdomain depth count
- Suspicious keyword count
- HTTPS presence
- @ symbol presence
- Hyphen count in domain
- Dot count in domain
- URL shortener presence
- Suspicious TLD usage
- Digit count in domain
- URL path depth
- Brand name spoofing indicator
- Special character count
- Domain Shannon entropy
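
Several of these features can be computed with the standard library alone. The sketch below covers a subset (length, HTTPS, @ symbol, hyphen/dot/digit counts, path depth, and Shannon entropy of the domain). The function names and dictionary keys are assumptions for illustration, not the names used in `ml/classifier.py`.

```python
import math
from collections import Counter
from urllib.parse import urlparse


def shannon_entropy(text: str) -> float:
    """Shannon entropy in bits per character; 0.0 for an empty string."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def basic_features(url: str) -> dict:
    """Compute a subset of the listed URL features (hypothetical key names)."""
    parsed = urlparse(url)
    domain = parsed.hostname or ""
    return {
        "url_length": len(url),
        "uses_https": int(parsed.scheme == "https"),
        "has_at_symbol": int("@" in url),
        "hyphen_count": domain.count("-"),
        "dot_count": domain.count("."),
        "digit_count": sum(ch.isdigit() for ch in domain),
        # Non-empty path segments, e.g. "/a/b" has depth 2.
        "path_depth": len([seg for seg in parsed.path.split("/") if seg]),
        "domain_entropy": shannon_entropy(domain),
    }


if __name__ == "__main__":
    print(basic_features("http://paypa1-secure-login.com/verify"))
```

Higher domain entropy tends to indicate random-looking names, which is why it is a plausible input for the classifier alongside the simple counts.
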
| Package | Purpose |
|---|---|
| `requests` | HTTP connection for SSL checks |
| `python-whois` | WHOIS domain data retrieval |
| `scikit-learn` | Random Forest ML classifier |
| `numpy` | Numerical feature processing |
| `joblib` | Model serialization |
| `colorama` | Cross-platform colored CLI output |
| `pytest` | Unit test framework |
| URL | Expected Result |
|---|---|
| `https://google.com` | ✅ LOW risk |
| `https://github.com` | ✅ LOW risk |
| `http://paypa1-secure-login.com` | 🚨 HIGH risk |
| `http://192.168.1.1/bank/login` | 🚨 HIGH risk |
| `https://microsoft.com.login.evil.ru` | 🚨 HIGH risk |
| `http://bit.ly/free-offer` | |

See `sample_urls.txt` for the full test list.
This tool was developed for:
- Learning phishing detection techniques
- Cybersecurity portfolio demonstration
- Understanding URL analysis and threat intelligence
- Educational research into social engineering defenses
This tool must NOT be used to:
- Target real individuals or organizations
- Conduct unauthorized security testing
- Facilitate any form of cybercrime
Contributions are welcome for educational improvements:
- Fork the repo
- Create a feature branch (`git checkout -b feature/add-virustotal-api`)
- Commit changes (`git commit -m 'Add VirusTotal API integration'`)
- Push to branch (`git push origin feature/add-virustotal-api`)
- Open a Pull Request
- VirusTotal API integration
- Google Safe Browsing API check
- Web scraping for page content analysis
- Browser extension version
- REST API wrapper (Flask/FastAPI)
- GUI interface (Tkinter or PyQt)
- Train on larger datasets (PhishTank, UCI)
Monish Paramasivam
- Cybersecurity Enthusiast & Python Developer
- Portfolio Project: Phishing Detection Tool v1.0
"Security is not a product, but a process." - Bruce Schneier
⭐ Star this repo if it helped your learning journey!