A high-performance, stateless spam detection service running on Cloudflare Workers. Inspired by Rspamd, but designed to run entirely at the edge without databases or external dependencies.
Current Performance (Enron Spam Dataset):
- Spam Detection Rate: 70.92%
- False Positive Rate: 2.67%
Note: The project history and logs may not reflect the most recent algorithmic adjustments and tuning represented in these statistics.
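The two headline figures can be computed from confusion-matrix counts. This sketch assumes the usual definitions: detection rate is the fraction of spam messages flagged as spam, and false positive rate is the fraction of legitimate messages flagged as spam. The `rates` function and the counts in the example are illustrative only and are not taken from SpamGuard's evaluation.

```typescript
// Illustrative helper, not part of SpamGuard. Assumes "detection rate" means the
// fraction of spam messages flagged as spam and "false positive rate" means the
// fraction of legitimate (ham) messages flagged as spam.
interface ConfusionCounts {
  truePositives: number; // spam classified as spam
  falseNegatives: number; // spam classified as ham
  falsePositives: number; // ham classified as spam
  trueNegatives: number; // ham classified as ham
}

function rates(c: ConfusionCounts): { detectionRate: number; falsePositiveRate: number } {
  const spamTotal = c.truePositives + c.falseNegatives;
  const hamTotal = c.falsePositives + c.trueNegatives;
  return {
    // Return 0 for an empty class instead of dividing by zero.
    detectionRate: spamTotal === 0 ? 0 : c.truePositives / spamTotal,
    falsePositiveRate: hamTotal === 0 ? 0 : c.falsePositives / hamTotal,
  };
}

// Made-up counts for illustration only.
const r = rates({ truePositives: 70, falseNegatives: 30, falsePositives: 3, trueNegatives: 97 });
console.log(r.detectionRate, r.falsePositiveRate); // 0.7 0.03
```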
- Multi-layer Analysis: 6 independent analyzers work together for accurate detection
- Stateless Design: No database required - runs entirely in a Cloudflare Worker
- Fast: Typical analysis completes in <4ms
- Accurate: Tuned for low false positives suitable for business environments
- Configurable: Adjustable thresholds and debug options
- Batch Support: Analyze up to 100 emails in one request
- Header Analyzer - SPF/DKIM/DMARC validation, header anomalies, authentication failures
- Content Analyzer - Spam phrases, word patterns, text statistics
- URL Analyzer - Suspicious domains, URL shorteners, phishing patterns
- HTML Analyzer - Hidden text, tracking pixels, malicious elements
- Pattern Analyzer - Obfuscation detection, encoding tricks, structural patterns
- Bayesian Classifier - Statistical token analysis using pre-trained probabilities
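The Bayesian Classifier is described only as statistical token analysis over pre-trained probabilities. As a rough illustration of that general technique, here is the standard naive-Bayes combining rule (the product form popularized by Paul Graham), written in log-odds form. It is not taken from `bayesian.ts`: the `spamProbability` function, the 0.5 default for unseen tokens, the 0.01/0.99 clamp, and the example token table are all assumptions.

```typescript
// Illustrative naive-Bayes combination; not SpamGuard's bayesian.ts.
// The token table, the 0.5 default for unseen tokens, and the clamp are assumptions.
type TokenProbabilities = Map<string, number>; // token -> P(spam | token)

function spamProbability(
  tokens: string[],
  table: TokenProbabilities,
  unknown = 0.5,
): number {
  // Summing log-odds avoids the underflow that multiplying many small
  // probabilities would cause on long messages.
  let logOdds = 0;
  for (const token of new Set(tokens)) { // each distinct token counts once
    const p = table.get(token) ?? unknown;
    // Clamp so no single token can force the result to exactly 0 or 1.
    const clamped = Math.min(0.99, Math.max(0.01, p));
    logOdds += Math.log(clamped / (1 - clamped));
  }
  // The logistic function converts the combined log-odds back to a probability.
  return 1 / (1 + Math.exp(-logOdds));
}

const table: TokenProbabilities = new Map([
  ["viagra", 0.99],
  ["meeting", 0.2],
]);
console.log(spamProbability(["viagra", "meeting"], table).toFixed(3)); // 0.961
```

Summing log-odds is algebraically the same as the product rule P = Πp / (Πp + Π(1 − p)), but it stays numerically stable when many tokens are combined.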
```bash
bun install
```

```bash
# Run development server
bun run dev

# Run tests
bun test

# Run tests with coverage
bun run test:coverage

# Type check
bun run typecheck
```

```bash
bun run deploy
```

Returns API information and available endpoints.
Returns service health status and timestamp.
Returns the current default configuration values.
Performs full spam analysis and returns detailed results including individual analyzer scores and matched rules.
Request Body Parameters:
| Parameter | Type | Description |
|---|---|---|
| `from` | string | Sender email address |
| `to` | string/array | Recipient email address(es) |
| `subject` | string | Email subject line |
| `textBody` (or `text_body`, `text`, `body`) | string | Plain text body content |
| `htmlBody` (or `html_body`, `html`) | string | HTML body content |
| `raw` | string | Raw MIME email string |
| `headers` | object | Key-value pairs of email headers |
| `receivedSpf` (or `received_spf`) | string | Value of the Received-SPF header |
| `dkimSignature` (or `dkim_signature`) | string | Value of the DKIM-Signature header |
| `authenticationResults` (or `authentication_results`) | string | Value of the Authentication-Results header |
| `clientIp` (or `client_ip`) | string | IP address of the sender |
| `helo` | string | HELO/EHLO string from the SMTP session |
| `debug` | boolean | Set to `true` to include detailed debug info |
| `config` | object | Override default configuration (see Configuration section) |
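Several fields accept alternative spellings. A client that builds or logs requests may want to collapse them to one spelling, as in the sketch below. The `normalizeRequest` helper and its precedence rule (the camelCase name wins, then the aliases in the order listed in the table) are illustrative assumptions; they do not necessarily match how the service itself resolves conflicting fields.

```typescript
// Hypothetical helper (not part of SpamGuard): collapses the alternative field
// names from the parameter table onto their camelCase spelling. The precedence
// (camelCase first, then aliases in table order) is an assumption.
type RequestBody = Record<string, unknown>;

const ALIASES: Record<string, string[]> = {
  textBody: ["textBody", "text_body", "text", "body"],
  htmlBody: ["htmlBody", "html_body", "html"],
  receivedSpf: ["receivedSpf", "received_spf"],
  dkimSignature: ["dkimSignature", "dkim_signature"],
  authenticationResults: ["authenticationResults", "authentication_results"],
  clientIp: ["clientIp", "client_ip"],
};

function normalizeRequest(input: RequestBody): RequestBody {
  const aliased = new Set(Object.values(ALIASES).flat());
  const out: RequestBody = {};
  // Copy fields that have no aliases (from, to, subject, headers, ...) as-is.
  for (const [key, value] of Object.entries(input)) {
    if (!aliased.has(key)) out[key] = value;
  }
  // For each canonical name, keep the first spelling that is present.
  for (const [canonical, names] of Object.entries(ALIASES)) {
    const hit = names.find((name) => input[name] !== undefined);
    if (hit !== undefined) out[canonical] = input[hit];
  }
  return out;
}

console.log(JSON.stringify(normalizeRequest({ subject: "Hi", text_body: "Hello", client_ip: "203.0.113.5" })));
// {"subject":"Hi","textBody":"Hello","clientIp":"203.0.113.5"}
```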
Example Request:
```json
{
  "from": "sender@example.com",
  "to": "recipient@example.com",
  "subject": "Email Subject",
  "textBody": "Plain text content",
  "headers": {
    "received-spf": "pass"
  },
  "debug": true
}
```

Example Response:
```json
{
  "score": 0.2,
  "threshold": 3.5,
  "confidence": 0.37,
  "classification": "ham",
  "analyzers": [...],
  "topReasons": ["..."],
  "processingTimeMs": 1,
  "debug": {...}
}
```

Quick spam check returning a simple boolean verdict. Uses the same input parameters as `/analyze`.
Response:
```json
{
  "isSpam": false
}
```

Returns only the calculated score, threshold, and classification. Uses the same input parameters as `/analyze`.
Response:
```json
{
  "score": 2.5,
  "threshold": 3.5,
  "classification": "probable_spam"
}
```

Analyze multiple emails in a single request (maximum 100).
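Because a single batch request is limited to 100 emails, larger sets have to be split on the client. A minimal sketch that assumes nothing beyond the documented limit; `chunk` and `MAX_BATCH_SIZE` are hypothetical names:

```typescript
// Hypothetical helper: splits a list of emails into batches that respect the
// documented limit of 100 emails per batch request.
const MAX_BATCH_SIZE = 100;

function chunk<T>(items: T[], size: number = MAX_BATCH_SIZE): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Example: 250 emails become batches of 100, 100 and 50.
const emails = Array.from({ length: 250 }, (_, i) => ({ subject: `Email ${i + 1}`, textBody: "..." }));
console.log(chunk(emails).map((b) => b.length).join(", ")); // 100, 100, 50
```

Each resulting batch can then be sent as its own `{ "emails": [...], "config": {...} }` request body.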
Request:
```json
{
  "emails": [
    { "subject": "Email 1", "textBody": "..." },
    { "subject": "Email 2", "textBody": "..." }
  ],
  "config": {
    "spamThreshold": 4.0
  }
}
```

Response:
```json
{
  "summary": {
    "total": 2,
    "spam": 1,
    "ham": 1,
    "errors": 0
  },
  "results": [...]
}
```

Analyze a raw MIME email string.
Option 1: JSON
```json
{
  "raw": "From: sender@example.com\r\nSubject: Test\r\n\r\nBody..."
}
```

Option 2: Text/Plain
Send the raw email content directly as the request body with Content-Type: text/plain or message/rfc822.
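When a message does not already exist as raw MIME (for example, in tests), one can be assembled with CRLF line endings, matching the `raw` string in Option 1. This hypothetical `buildRawEmail` helper performs no header folding, MIME encoding, or validation, so it only suits simple ASCII test messages.

```typescript
// Hypothetical helper (not part of SpamGuard): builds a minimal raw message as
// header lines, a blank line, then the body, all joined with CRLF. No header
// folding, MIME encoding, or validation is done.
function buildRawEmail(headers: Record<string, string>, body: string): string {
  const headerLines = Object.entries(headers).map(([name, value]) => `${name}: ${value}`);
  // Convert bare LF (and existing CRLF) line breaks in the body to CRLF.
  const crlfBody = body.replace(/\r?\n/g, "\r\n");
  return headerLines.join("\r\n") + "\r\n\r\n" + crlfBody;
}

const raw = buildRawEmail({ From: "sender@example.com", Subject: "Test" }, "Body...");
// Same string as the Option 1 example; send it as { "raw": raw } or as a text/plain body.
console.log(JSON.stringify(raw)); // "From: sender@example.com\r\nSubject: Test\r\n\r\nBody..."
```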
These values can be passed in the config object to override defaults.
| Option | Default | Description |
|---|---|---|
| `spamThreshold` | 3.5 | Score at or above which an email is classified as spam |
| `probableSpamThreshold` | 2.0 | Minimum score for the "probable spam" classification |
| `enableDebug` | false | Include debug information in the response |
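The documented defaults and the `config` override can be modeled on the client side roughly as follows. This is a sketch only: the `SpamGuardConfig` interface, the `resolveConfig` function, and the check that `probableSpamThreshold` does not exceed `spamThreshold` are assumptions, not confirmed SpamGuard behavior.

```typescript
// Client-side sketch (assumption, not confirmed SpamGuard behavior) of how values
// in `config` could override the documented defaults.
interface SpamGuardConfig {
  spamThreshold: number;
  probableSpamThreshold: number;
  enableDebug: boolean;
}

const DEFAULT_CONFIG: SpamGuardConfig = {
  spamThreshold: 3.5,
  probableSpamThreshold: 2.0,
  enableDebug: false,
};

function resolveConfig(overrides: Partial<SpamGuardConfig> = {}): SpamGuardConfig {
  // `??` keeps the default when an override is missing or undefined.
  const merged: SpamGuardConfig = {
    spamThreshold: overrides.spamThreshold ?? DEFAULT_CONFIG.spamThreshold,
    probableSpamThreshold: overrides.probableSpamThreshold ?? DEFAULT_CONFIG.probableSpamThreshold,
    enableDebug: overrides.enableDebug ?? DEFAULT_CONFIG.enableDebug,
  };
  // Assumed sanity check: the "probable spam" band must not start above the spam cutoff.
  if (merged.probableSpamThreshold > merged.spamThreshold) {
    throw new RangeError("probableSpamThreshold must not exceed spamThreshold");
  }
  return merged;
}

console.log(JSON.stringify(resolveConfig({ spamThreshold: 4.0 })));
// {"spamThreshold":4,"probableSpamThreshold":2,"enableDebug":false}
```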
- ham - Score ≤ 1.0, definitely legitimate
- probable_ham - Score 1.0-2.0, likely legitimate
- probable_spam - Score 2.0-3.5, likely spam
- spam - Score ≥ 3.5, definitely spam
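The score bands above can be expressed as a small function. This is a sketch, not SpamGuard's engine code: the 1.0 ham cutoff appears only in the list and is treated here as fixed, the other two cutoffs default to the configurable thresholds, and the handling of exact boundary values such as 2.0 is an assumption.

```typescript
// Sketch of the documented score bands (not SpamGuard's engine code). The 1.0
// ham cutoff is treated as fixed; the other two cutoffs default to the
// configurable thresholds. Exact boundary handling is an assumption.
type Classification = "ham" | "probable_ham" | "probable_spam" | "spam";

function classify(
  score: number,
  spamThreshold = 3.5,
  probableSpamThreshold = 2.0,
): Classification {
  if (score >= spamThreshold) return "spam";
  if (score >= probableSpamThreshold) return "probable_spam";
  if (score > 1.0) return "probable_ham";
  return "ham";
}

console.log(classify(0.2)); // ham
console.log(classify(2.5)); // probable_spam
console.log(classify(3.8, 4.0)); // probable_spam (spamThreshold overridden to 4.0)
```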
```bash
# Quick check
curl -X POST https://your-worker.workers.dev/check \
  -H "Content-Type: application/json" \
  -d '{"subject": "Test", "textBody": "Hello world"}'

# Full analysis
curl -X POST https://your-worker.workers.dev/analyze \
  -H "Content-Type: application/json" \
  -d '{
    "from": "sender@example.com",
    "to": "recipient@example.com",
    "subject": "Important Message",
    "textBody": "This is the email content",
    "debug": true
  }'
```

```javascript
const response = await fetch("https://your-worker.workers.dev/analyze", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    from: "sender@example.com",
    subject: "Test Email",
    textBody: "Email content here",
  }),
});

if (!response.ok) {
  throw new Error(`Request failed with status ${response.status}`);
}

// /analyze returns a classification and score; /check returns the boolean isSpam verdict.
const result = await response.json();
console.log(`Classification: ${result.classification}, Score: ${result.score}`);
```

```text
spamguard/
├── src/
│   ├── index.ts                # Hono API entry point
│   ├── engine.ts               # Main SpamGuard engine
│   ├── types.ts                # TypeScript types
│   ├── analyzers/
│   │   ├── header.ts           # Header analysis
│   │   ├── content.ts          # Content analysis
│   │   ├── url.ts              # URL analysis
│   │   ├── html.ts             # HTML analysis
│   │   ├── pattern.ts          # Pattern detection
│   │   └── bayesian.ts         # Bayesian classifier
│   ├── data/
│   │   ├── spam-words.ts       # Spam word database
│   │   ├── url-blacklist.ts    # URL/domain data
│   │   ├── patterns.ts         # Detection patterns
│   │   └── bayesian-tokens.ts  # Token probabilities
│   ├── parser/
│   │   └── email.ts            # Email parser
│   └── utils/
│       └── text.ts             # Text utilities
├── tests/
│   ├── fixtures/
│   │   └── emails.ts           # Test email samples
│   ├── unit/
│   │   ├── parser.test.ts
│   │   ├── text.test.ts
│   │   └── analyzers.test.ts
│   └── integration/
│       ├── engine.test.ts
│       └── api.test.ts
├── package.json
├── tsconfig.json
└── wrangler.toml
```
MIT