🕷️ A comprehensive scraping and analysis suite for Firecrawl (firecrawl.dev)
Target: Mendable's Firecrawl - YC S22 company converting websites to LLM-friendly Markdown
Firecrawl is a fast-growing API service that converts websites into clean, LLM-friendly Markdown. It is positioned as infrastructure for AI agents, which fits the current demand for LLM tooling, and its paid plans start at $16/month.
| Aspect | Details |
|---|---|
| Company | Mendable (YC S22) |
| Product | Firecrawl - Website-to-Markdown API |
| Pricing | $16/month+ |
| Positioning | Infrastructure for AI agents |
| Market | LLM developers, AI companies |
- Perfect Timing: Launched right when every AI company needs clean training data
- Infrastructure Play: Not just a tool, but infrastructure developers build on
- Simple API: One endpoint → clean Markdown
- YC Backing: Instant credibility with developer community
pip install playwright
playwright install chromium

# Run full scraping suite
python firecrawl_scraper.py
# Run analysis on collected data
python firecrawl_analyzer.py

firecrawl_scraper.py # Main scraper - extracts all data
firecrawl_analyzer.py # Business analysis and insights
FIRECRAWL_README.md # This file
| File Pattern | Description |
|---|---|
| firecrawl_data_*.json | Raw scraped data |
| firecrawl_report_*.md | Markdown report |
| firecrawl_analysis_*.json | Business analysis |
| firecrawl_homepage.png | Homepage screenshot |
| firecrawl_pricing.png | Pricing page screenshot |
| firecrawl_docs.png | Documentation screenshot |
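To work with these outputs programmatically, the most recent raw data file can be located by its filename pattern. This is a minimal sketch, assuming the `*` in `firecrawl_data_*.json` is a timestamp that sorts lexicographically and that the files sit in one directory; `latest_snapshot` is a hypothetical helper, not part of the suite.

```python
import json
from pathlib import Path
from typing import Optional


def latest_snapshot(directory: str = ".",
                    pattern: str = "firecrawl_data_*.json") -> Optional[dict]:
    """Return the parsed contents of the newest matching file, or None.

    Assumption: the wildcard part of the filename is a timestamp such as
    20240115_103000, so the last name in sorted order is the newest file.
    """
    candidates = sorted(Path(directory).glob(pattern))
    if not candidates:
        return None
    with candidates[-1].open(encoding="utf-8") as fh:
        return json.load(fh)
```

If the filenames do not sort by time, sorting on each file's modification time would be a reasonable alternative.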
- Company tagline and description
- Key value propositions
- Feature highlights
- All pricing tiers
- Feature comparisons
- Price points ($16/month+)
- API endpoints
- Usage examples
- Integration patterns
- Target audience signals
- Application scenarios
- Integration patterns
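As one illustration of the price point extraction listed above, prices can be pulled out of page text with a regular expression. This is a sketch under assumptions: the pattern and the `extract_price_points` name are hypothetical and only cover prices written like `$16/month`; the actual scraper may work differently.

```python
import re
from typing import List

# Matches a dollar amount followed by a billing period,
# e.g. "$16/month" or "$1,200 / year".
PRICE_PATTERN = re.compile(
    r"\$\s?(\d[\d,]*(?:\.\d+)?)\s*/\s*(month|mo|year|yr)",
    re.IGNORECASE,
)


def extract_price_points(text: str) -> List[dict]:
    """Return each price found in `text` as {"amount": float, "period": str}."""
    results = []
    for amount, period in PRICE_PATTERN.findall(text):
        results.append({
            "amount": float(amount.replace(",", "")),  # drop thousands separators
            "period": period.lower(),
        })
    return results


print(extract_price_points("Hobby plan: $16/month"))
```

Prices expressed in other currencies or without a billing period would need additional patterns.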
The analyzer provides:
- Pricing tier structure
- Strategy classification (land & expand, good-better-best, etc.)
- Free tier detection
- Enterprise offering detection
- Target audience identification
- Key benefit extraction
- Market category classification
- Infrastructure vs. tool positioning
- Feature categorization (extraction, formatting, AI/ML, etc.)
- Core capability identification
- Feature richness scoring
- Direct competitor identification
- Differentiation factors
- Moat analysis
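To make a few of these checks concrete, here is a keyword-based sketch of free tier detection, enterprise offering detection, and feature categorization. The function names, the keyword lists, and the tier shape (dictionaries with `name` and `price` keys) are assumptions for illustration, not the analyzer's actual logic.

```python
import re
from typing import Dict, List

# Hypothetical keyword map; category names follow the list above.
FEATURE_KEYWORDS: Dict[str, List[str]] = {
    "extraction": ["scrape", "crawl", "extract"],
    "formatting": ["markdown", "json", "html"],
    "ai_ml": ["llm", "ai", "embedding"],
}


def has_free_tier(tiers: List[dict]) -> bool:
    """True if any tier is named "free" or priced at zero dollars."""
    for tier in tiers:
        name = tier.get("name", "").lower()
        price = tier.get("price", "").lower()
        if "free" in name or "free" in price:
            return True
        if re.fullmatch(r"\$0(\.0+)?(/\w+)?", price):
            return True
    return False


def has_enterprise_tier(tiers: List[dict]) -> bool:
    """True if any tier is named "enterprise" or has custom pricing."""
    return any(
        "enterprise" in tier.get("name", "").lower()
        or "custom" in tier.get("price", "").lower()
        for tier in tiers
    )


def categorize_feature(feature: str) -> List[str]:
    """Return every category whose keywords appear as whole words in the text."""
    words = set(re.findall(r"[a-z]+", feature.lower()))
    return [cat for cat, kws in FEATURE_KEYWORDS.items() if words & set(kws)]
```

Matching whole words keeps short keywords such as "ai" from firing inside unrelated words, at the cost of missing inflected forms like "scraping".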
- Riding the AI Wave 🌊
  - Every LLM company needs clean training data
  - Markdown is the lingua franca of LLMs
  - Perfect timing with the AI boom
- Infrastructure Positioning 🏗️
  - Higher perceived value than tools
  - Stickier product (developers build on it)
  - Better unit economics
- Developer-First 👨‍💻
  - Simple API design
  - Clear documentation
  - YC credibility
- Pricing Strategy 💰
  - Free tier for acquisition
  - Usage-based scaling
  - Enterprise tier for big customers
TAM: All companies building AI agents
SAM: LLM developers needing web data
SOM: Developers who prefer APIs over DIY scraping
FirecrawlScraper
├── setup() # Browser initialization
├── scrape_homepage() # Main page extraction
├── scrape_pricing() # Pricing data extraction
├── scrape_docs() # API documentation
├── scrape_use_cases() # Use case extraction
└── generate_report() # Report generation

FirecrawlData
├── company_name # "Firecrawl"
├── parent_company # "Mendable"
├── yc_batch # "S22"
├── tagline # Main value prop
├── description # Detailed description
├── pricing[] # Pricing tiers
├── features[] # Product features
├── use_cases[] # Application scenarios
└── api_endpoints[] # API documentation
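The `FirecrawlData` outline above maps naturally onto a Python dataclass. The following is a sketch, assuming field types inferred from the outline; the `PricingTier` class, the choice of plain strings for list elements, and the `to_json` helper are illustrative assumptions rather than the suite's actual definitions.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class PricingTier:
    """One pricing tier (hypothetical shape: name, price, feature list)."""
    name: str
    price: str
    features: List[str] = field(default_factory=list)


@dataclass
class FirecrawlData:
    company_name: str = "Firecrawl"
    parent_company: str = "Mendable"
    yc_batch: str = "S22"
    tagline: str = ""
    description: str = ""
    pricing: List[PricingTier] = field(default_factory=list)
    # Element types below are assumed to be plain strings.
    features: List[str] = field(default_factory=list)
    use_cases: List[str] = field(default_factory=list)
    api_endpoints: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the record, including nested tiers, to a JSON string."""
        return json.dumps(asdict(self), indent=2)
```

Using `default_factory` for the list fields gives each instance its own list instead of sharing one mutable default.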
- Competitive Intelligence
  - Track Firecrawl's evolution
  - Monitor pricing changes
  - Feature comparison
- Market Research
  - Understand positioning
  - Analyze go-to-market strategy
  - Identify trends
- Investment Analysis
  - Business model validation
  - Competitive landscape mapping
  - Growth signal detection
- Product Inspiration
  - Feature ideas
  - Pricing strategies
  - Positioning tactics
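For the pricing-monitoring use case above, two scraped snapshots can be compared tier by tier. This is a minimal sketch, assuming each snapshot is a dictionary with a `pricing` list whose entries carry `name` and `price` keys; `diff_pricing` is a hypothetical helper, not something the suite is known to provide.

```python
from typing import Dict, List


def diff_pricing(old: dict, new: dict) -> List[str]:
    """Describe tiers that were added, removed, or repriced between snapshots."""
    old_prices: Dict[str, str] = {t["name"]: t["price"] for t in old.get("pricing", [])}
    new_prices: Dict[str, str] = {t["name"]: t["price"] for t in new.get("pricing", [])}

    changes = []
    # Walk the union of tier names in a stable (alphabetical) order.
    for name in sorted(old_prices.keys() | new_prices.keys()):
        before, after = old_prices.get(name), new_prices.get(name)
        if before is None:
            changes.append(f"added {name} at {after}")
        elif after is None:
            changes.append(f"removed {name} (was {before})")
        elif before != after:
            changes.append(f"{name}: {before} -> {after}")
    return changes
```

An empty result means no tier was added, removed, or repriced between the two runs.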
# Change browser mode
scraper = FirecrawlScraper(headless=True) # Headless mode
scraper = FirecrawlScraper(headless=False) # Visible browser
# Adjust speed
scraper = FirecrawlScraper(slow_mo=200) # Slower, more human-like

Edit firecrawl_scraper.py and add new extraction methods:

async def scrape_new_section(self):
    # Your custom extraction logic
    pass

{
  "company_name": "Firecrawl",
  "parent_company": "Mendable",
  "yc_batch": "S22",
  "tagline": "Turn any website into LLM-ready data",
  "pricing": [
    {
      "name": "Hobby",
      "price": "$16/month",
      "features": ["5,000 credits", "API access", "..."]
    }
  ],
  "features": [...],
  "use_cases": [...],
  "scraped_at": "2024-01-15T10:30:00"
}

# Firecrawl Analysis Report
## Company Overview
- **Company**: Firecrawl
- **Parent**: Mendable
- **YC Batch**: S22
## Pricing Tiers
### Hobby
- **Price**: $16/month
- **Features**: 5,000 credits, API access, ...
## Key Features
- **Website to Markdown**: Converts any URL to clean Markdown
- **API Access**: Simple REST API
...

This is a research project for competitive intelligence. Use responsibly and in accordance with Firecrawl's Terms of Service.
MIT License - For educational and research purposes.
💡 Pro Tip: Run the scraper regularly to track how Firecrawl evolves their positioning and pricing as the AI market matures.