Turn a raw patent spreadsheet into structured, analysis-ready intelligence, automatically.
Quick Start · Features · AI Summaries · Output · Configuration · Contributing
Patent teams, tech transfer offices, and IP researchers deal with the same pain:
You have a spreadsheet of patent applications. You need current status, dates, assignees, and family relationships, and it's all scattered across Google Patents, WIPO, and EPO. Updating it manually takes hours and goes stale immediately. And even when you have the data, you still need to explain each invention in plain English for licensing conversations.
This tool automates all of it.
Give it a spreadsheet of patent application numbers. It returns a clean, formatted Excel workbook with everything fetched automatically from public databases, plus AI-generated summaries for every invention with published text.
Input: patents.xlsx (your raw list of application numbers)
↓
patent_lookup.py (~5 seconds per patent, runs unattended)
↓
Output: patent_results.xlsx → statuses, dates, links, family summaries
        patent_summary.txt → portfolio breakdown + AI invention summaries
Most patent tools stop at the data. This one doesn't.
For every patent family in your portfolio, the tool reads the abstract, claims, and description, then writes you two summaries you can actually use:
A technical summary: precise enough for IP counsel, patent agents, and licensing due diligence. One sentence capturing the core claim scope in language an IP-literate reader expects.
A plain-English summary: written for business development, executives, and industry partners who need to understand the invention without a law degree.
For example:
Technical: A portable toilet transfer device featuring a clip mechanism with interior
and exterior engagement members that secures to a toilet bowl rim and
connects to a weight-bearing handle for wheelchair-to-toilet transfers.
Plain: A removable handle that clamps onto a toilet bowl to help wheelchair users
safely transfer onto and off of the toilet seat.
No prompting. No copy-pasting. No reading claims. Just run the tool.
These summaries show up in the Family Summary sheet and in a dedicated section of the text report β ready to drop into a licensing one-pager, an annual report, or a board presentation.
What it handles automatically:
- Fetches full patent text from Google Patents where available
- Falls back to EPO's API for PCT families with no GP text
- Handles design patents correctly, explaining that they protect appearance, not function
- Skips families with no published text rather than making something up
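The fallback order in the list above can be sketched in a few lines. This is a minimal illustration, not the tool's actual code: the function name `pick_summary_text` and its two parameters are hypothetical, and it assumes the fetched texts arrive as plain strings (or `None` when nothing was found).

```python
def pick_summary_text(gp_text, epo_abstract):
    """Choose the text to summarize, or return None to skip the family.

    Hypothetical helper: prefers Google Patents text, then the EPO
    abstract, and never invents text when both are missing.
    """
    for candidate in (gp_text, epo_abstract):
        if candidate and candidate.strip():
            return candidate.strip()
    return None  # no published text: the caller should skip this family


print(pick_summary_text(None, "A removable handle for toilet transfers."))
```

Returning `None` rather than an empty string makes the "skip, don't guess" case explicit for whatever code calls the summarizer.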
Who uses these summaries:
- Tech transfer managers briefing leadership on the portfolio
- Licensing teams preparing outreach to industry partners
- Researchers documenting their innovation outputs
- Anyone who has ever spent 20 minutes reading a patent just to explain it in one sentence
Cost: roughly $0.02 per full run on a portfolio of 20 to 30 patents. A $5 credit covers hundreds of runs.
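As a rough check on the "hundreds of runs" figure, the arithmetic can be spelled out using only the two numbers quoted above. The per-run cost is approximate, so treat the result as an order-of-magnitude estimate.

```python
# Back-of-the-envelope estimate from the figures quoted above.
COST_PER_RUN_USD = 0.02  # approximate cost of one full run
CREDIT_USD = 5.00        # size of the prepaid credit

runs_covered = round(CREDIT_USD / COST_PER_RUN_USD)
print(f"A ${CREDIT_USD:.2f} credit covers roughly {runs_covered} runs")
```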
| Feature | Description |
|---|---|
| Plain-English summaries | Automatically generates technical + plain-English invention descriptions, no prompting required |
| Google Patents scraping | Direct URL, XHR endpoint, and JSON-LD: multiple fallback strategies |
| PCT / WO lookup | EPO OPS API resolves PCT serials to WO publication numbers + abstracts |
| Design patent support | Playwright headless browser types 29/xxxxxx into the GP search box |
| Patent family grouping | Aggregates all related filings under a shared family ID |
| Data quality flags | Detects title/inventor mismatches, wrong patents, provisional-only families |
| Color-coded Excel output | Two sheets (all applications + family summary) with conditional formatting |
| Smart caching | Results saved after each scrape; re-runs skip already-fetched records |
| Zero mandatory API keys | Works out of the box; EPO + Anthropic keys unlock additional features |
- Tech transfer offices: portfolio tracking, licensing diligence, annual reporting, and ready-to-use invention descriptions
- IP counsel & patent agents: quickly audit and summarize a client's filing portfolio without reading every claim
- Research institutions: map innovation outputs to public patent records with plain-English context for non-technical stakeholders
- Data scientists & developers: structured patent data + auto-generated descriptions as a foundation for analysis
Option A: pip (recommended)
git clone https://github.com/elichter/patent-lookup-tool
cd patent-lookup-tool
pip install -r requirements.txt
playwright install chromium # only needed for design patents (29/xxx serials)
Option B: conda
git clone https://github.com/elichter/patent-lookup-tool
cd patent-lookup-tool
conda create -n patent-lookup python=3.8
conda activate patent-lookup
conda install -c conda-forge requests beautifulsoup4 lxml pandas openpyxl python-dotenv
pip install playwright # playwright not on conda-forge; install via pip
playwright install chromium
Option C: manual (if you prefer to control your environment)
Core requirements (all pip-installable):
requests>=2.31.0
beautifulsoup4>=4.12.0
lxml>=5.0.0
pandas>=2.0.0
openpyxl>=3.1.0
python-dotenv>=1.0.0
Optional (only needed for design patents):
playwright>=1.40.0 → pip install playwright && playwright install chromium
Don't want to install manually? Just run python patent_lookup.py: it detects missing packages and asks whether you'd like to install them automatically.
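One way such a missing-package check could work is sketched below. This is an assumption about the approach, not the script's actual code: it uses the standard library's `importlib.util.find_spec`, and the `REQUIRED` mapping and `find_missing` name are illustrative.

```python
import importlib.util

# Maps import name -> pip package name (the two differ for some packages).
REQUIRED = {
    "requests": "requests",
    "bs4": "beautifulsoup4",
    "lxml": "lxml",
    "pandas": "pandas",
    "openpyxl": "openpyxl",
    "dotenv": "python-dotenv",
}


def find_missing(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for module, pip_name in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = find_missing()
if missing:
    print("Missing packages:", ", ".join(missing))
    print("Install with: pip install " + " ".join(missing))
else:
    print("All core dependencies are installed.")
```

Checking with `find_spec` avoids importing heavy packages such as pandas just to see whether they exist.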
Drop your patent list as patents.xlsx in the project folder.
The tool auto-detects column names β no configuration required for standard formats.
See Input Format for details.
cp .env.example .env
# Edit .env with your EPO and/or Anthropic keys

| Key | Where to get it | What it unlocks |
|---|---|---|
| EPO_KEY + EPO_SECRET | developers.epo.org (free) | PCT → WO lookup, EPO abstract fetching |
| ANTHROPIC_API_KEY | console.anthropic.com (pay-as-you-go) | AI invention summaries (~$0.02/run) |
The tool works without any API keys, but the AI summaries are the most valuable output. A $5 credit will last hundreds of runs.
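The key-to-feature relationship in the table can be expressed as a small check. This is a sketch under assumptions: the environment variable names come from the table above, while the function name `enabled_features` and the feature labels are hypothetical.

```python
import os


def enabled_features(env=os.environ):
    """Report which optional features the configured keys would unlock."""
    # The EPO lookup needs both halves of the credential pair.
    has_epo = bool(env.get("EPO_KEY")) and bool(env.get("EPO_SECRET"))
    has_anthropic = bool(env.get("ANTHROPIC_API_KEY"))
    return {
        "pct_wo_lookup": has_epo,
        "ai_summaries": has_anthropic,
    }


# Example with only EPO credentials set (placeholder values).
print(enabled_features({"EPO_KEY": "k", "EPO_SECRET": "s"}))
```

Passing the mapping as a parameter keeps the check easy to exercise without touching the real environment.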
python patent_lookup.py
Output files appear in the same folder: patent_results.xlsx and patent_summary.txt.
The tool auto-detects column names using fuzzy matching: Application Number, App No, and Serial Number all work automatically.
One row per patent application, with a header row:
| Column | Example | Notes |
|---|---|---|
| Family ID | SMITH_JON.001 | Groups related filings together |
| Title | Widget for reducing friction | Invention title |
| Serial Number | 17/072,674 or PCT/US23/75015 | US, design, or PCT format |
| Patent Number | 12,121,656 or D874,011 | Leave blank if pending |
| Publication Number | WO 2023/064756 | Optional |
| File Date | 2020-10-16 | Any standard date format |
| Status | Pending, Issued, Expired | Your internal status |
| Inventors | Smith, John, Doe, Jane | Comma-separated |
| Country | United States | Filing country |
Column name aliases are fully configurable in config.py.
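To illustrate how alias-based fuzzy matching might work, here is a minimal sketch using the standard library's `difflib`. The README does not say which matching method the tool uses, so the `get_close_matches` approach, the 0.8 cutoff, and the helper names `normalize` and `detect_columns` are all assumptions; only the alias lists come from the config.py excerpt.

```python
import difflib

# Abbreviated alias table; key names follow the config.py excerpt.
COL_ALIASES = {
    "family_id": ["Tech ID", "Family ID", "Case ID", "Docket"],
    "serial": ["Serial Number", "Application Number", "App No"],
}


def normalize(name):
    """Lowercase and drop punctuation so 'App. No' compares like 'App No'."""
    kept = (ch for ch in name.lower() if ch.isalnum() or ch == " ")
    return "".join(kept).strip()


def detect_columns(headers, aliases=COL_ALIASES, cutoff=0.8):
    """Map each logical field to the best-matching spreadsheet header."""
    normalized = {normalize(h): h for h in headers}
    mapping = {}
    for field, names in aliases.items():
        for alias in names:
            hits = difflib.get_close_matches(
                normalize(alias), list(normalized), n=1, cutoff=cutoff
            )
            if hits:
                mapping[field] = normalized[hits[0]]
                break  # first alias that matches wins
    return mapping


print(detect_columns(["Tech ID", "Title", "Application No.", "Status"]))
```

A header that no alias matches is simply left out of the mapping, which is the point at which extending `COL_ALIASES` would help.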
One row per invention (not per filing). Configure column positions in config.py.
Sheet 1: All Applications
One row per patent application, with your internal data alongside GP-fetched fields:
- GP Title, Status, Filing / Priority / Issue / Expiration dates
- GP Assignee and Inventors, WO Publication Number
- Hyperlinked "View Patent" link
- Search method used + not-found reason
- Data quality flags (mismatch warnings)
Color-coded by patent family, with green highlighting for populated date fields.
Sheet 2: Family Summary
One row per patent family:
- Invention title, all filings listed, overall status
- Earliest priority date, latest expiration date
- Has PCT? + PCT Publication Number
- AI Invention Summary: technical + plain-English description, one sentence each
- Data gap flag: YES / partial / clean
- Breakdown: Found: title A | Not found: title B
- Mismatch warning if a serial number resolves to the wrong patent
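The per-family fields listed above can be derived with a simple group-and-aggregate pass. The sketch below is illustrative only: the row keys (`family_id`, `serial`, `priority_date`, `expiration_date`) and the function name are hypothetical, and the dates in the sample rows are made up for the example.

```python
from collections import defaultdict
from datetime import date


def summarize_families(rows):
    """Group filing rows by family ID and derive per-family summary fields."""
    families = defaultdict(list)
    for row in rows:
        families[row["family_id"]].append(row)

    summary = {}
    for family_id, filings in families.items():
        priorities = [f["priority_date"] for f in filings if f.get("priority_date")]
        expirations = [f["expiration_date"] for f in filings if f.get("expiration_date")]
        summary[family_id] = {
            "filings": [f["serial"] for f in filings],
            "earliest_priority": min(priorities) if priorities else None,
            "latest_expiration": max(expirations) if expirations else None,
            "has_pct": any(f["serial"].upper().startswith("PCT/") for f in filings),
        }
    return summary


# Sample rows: serial formats follow the input table, dates are invented.
rows = [
    {"family_id": "SMITH_JON.001", "serial": "17/072,674",
     "priority_date": date(2019, 10, 18), "expiration_date": date(2040, 10, 16)},
    {"family_id": "SMITH_JON.001", "serial": "PCT/US23/75015",
     "priority_date": date(2022, 9, 23), "expiration_date": None},
]
print(summarize_families(rows)["SMITH_JON.001"])
```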
patent_summary.txt is a text report including:
- Lookup statistics (found / not found / provisionals / requests made)
- Portfolio status breakdown (internal data)
- Patent family breakdown (granted / pending / PCT / design counts)
- Per-family filing detail with links
- INVENTION SUMMARIES: standalone section with Tech ID, patent number, and full AI-generated technical + plain-English description for every family with available patent text
For each filing the tool tries strategies in order, stopping at first success:
63/xxx, 62/xxx → Flagged as provisional (US provisional applications are never published)
29/xxx → Playwright types serial into GP search → follows JS redirect to USD######S1
PCT/USxx/xxxxx → EPO OPS API (WO number + abstract) → GP search → WIPO Patentscope
US + patent# → Direct GP URL (USD######S1 or US########B2)
US pending → Direct GP URL → title + inventor keyword search fallback
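The routing above depends on recognizing the serial number's format. A minimal sketch of that classification follows; the function names and the route labels are hypothetical, the "us_utility" fallback is an assumption, and the series codes are limited to those listed above. The sample serials in the loop are invented apart from the two taken from the input table.

```python
import re


def classify_serial(serial):
    """Return the lookup route suggested by the serial number's format."""
    s = serial.strip().upper()
    if s.startswith("PCT/"):
        return "pct"
    match = re.match(r"^(\d{2})/", s)
    if match:
        series = match.group(1)
        if series in {"62", "63"}:
            return "provisional"
        if series == "29":
            return "design"
    return "us_utility"  # assumed default for every other format


def publication_id(patent_number):
    """Build a publication ID in the pattern shown above from a US patent number."""
    cleaned = re.sub(r"[^0-9A-Z]", "", patent_number.upper())
    if cleaned.startswith("D"):
        return f"US{cleaned}S1"
    return f"US{cleaned}B2"  # assumes a B2 kind code; some patents are B1


for serial in ["63/123,456", "29/700,001", "PCT/US23/75015", "17/072,674"]:
    print(serial, "->", classify_serial(serial))
print(publication_id("12,121,656"), publication_id("D874,011"))
```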
Mismatch detection: After all lookups complete, the tool compares internal titles and inventors against GP results using stemmed word overlap. If the title overlap is below 20% and the inventor check also fails, the row is flagged, GP data is cleared, and a warning is written explaining the likely data entry error.
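A stemmed-overlap check of this kind could look like the sketch below. It is a rough illustration under stated assumptions: the suffix-stripping stemmer is deliberately crude, the stopword list is arbitrary, the inventor comparison (any shared name token) is a guess, and every function name is hypothetical. Only the 20% threshold and the "both checks must fail" rule come from the description above.

```python
import re

STOPWORDS = {"a", "an", "and", "for", "of", "the", "to", "with", "in", "on"}


def crude_stem(word):
    """Very rough suffix stripping; a real stemmer would do better."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stems(text):
    """Set of stemmed, non-stopword tokens in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return {crude_stem(w) for w in words if w not in STOPWORDS}


def title_overlap(internal_title, gp_title):
    """Fraction of internal title stems that also appear in the GP title."""
    internal = stems(internal_title)
    if not internal:
        return 0.0
    return len(internal & stems(gp_title)) / len(internal)


def name_tokens(names):
    """Lowercased alphabetic tokens from an inventor string."""
    return set(re.findall(r"[a-z]+", names.lower()))


def is_mismatch(internal, gp, threshold=0.20):
    """Flag only when title overlap is low AND no inventor token is shared."""
    low_title = title_overlap(internal["title"], gp["title"]) < threshold
    no_inventor = not (name_tokens(internal["inventors"]) & name_tokens(gp["inventors"]))
    return low_title and no_inventor


internal = {"title": "Widget for reducing friction", "inventors": "Smith, John, Doe, Jane"}
wrong = {"title": "Method of brewing coffee", "inventors": "Brown, Alice"}
print(is_mismatch(internal, wrong))
```

Requiring both checks to fail keeps a retitled application with the same inventors from being flagged.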
All user settings live in config.py:
INPUT_FILE = "patents.xlsx" # your input file
ORG_NAME = "My Organization" # appears in the summary report
DELAY_MIN = 3.0 # seconds between requests (be polite to Google)
DELAY_MAX = 6.0
# Column name aliases: extend these if auto-detection misses your column names
COL_ALIASES = {
"family_id": ["Tech ID", "Family ID", "Case ID", "Docket"],
"serial": ["Serial Number", "Application Number", "App No"],
# ... see config.py for full list
}
patent-lookup-tool/
├── patent_lookup.py # Main script
├── config.py # All user configuration
├── load_keys.py # Helper: reads a central API keys file → writes .env
├── requirements.txt # Python dependencies
├── sample_patents.xlsx # Anonymized sample input (4 families, all patent types)
├── .env.example # API key template
├── .gitignore
└── README.md
Results are saved to patent_cache.json after each scrape. Re-runs skip cached records instantly: a 24-record portfolio runs in ~2 minutes on first run and under 5 seconds on re-run.
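The save-after-each-scrape pattern can be sketched as follows. This is an illustration rather than the tool's code: it assumes the cache is a JSON object keyed by serial number, and `fetch` stands in for whatever function performs the real lookup.

```python
import json
import os

CACHE_FILE = "patent_cache.json"


def load_cache(path=CACHE_FILE):
    """Return the cached results, or an empty dict on first run."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    return {}


def lookup_all(serials, fetch, path=CACHE_FILE):
    """Fetch each serial once, saving the cache after every new result."""
    cache = load_cache(path)
    for serial in serials:
        if serial in cache:
            continue  # already fetched on a previous run
        cache[serial] = fetch(serial)  # `fetch` is a placeholder for the scraper
        # Persist immediately so an interrupted run keeps its finished records.
        with open(path, "w", encoding="utf-8") as f:
            json.dump(cache, f, indent=2)
    return cache
```

Writing the file after every record costs a little I/O but means a crash or Ctrl-C loses at most the lookup in progress.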
# Remove a specific entry to force re-scrape
python3 -c "
import json
with open('patent_cache.json') as f:
    cache = json.load(f)
cache.pop('17/072,674', None)  # no error if the key is absent
with open('patent_cache.json', 'w') as f:
    json.dump(cache, f, indent=2)
"
# Clear all cache
rm patent_cache.json

| Package | Purpose |
|---|---|
| requests + beautifulsoup4 + lxml | HTTP requests and HTML parsing |
| playwright | Headless Chromium for JS-rendered pages |
| pandas + openpyxl | Excel I/O and formatting |
| python-dotenv | .env file loading |
| EPO OPS API (free, key required) | PCT → WO lookup + abstract fetching |
| Anthropic API (pay-as-you-go, optional) | AI invention summaries via Claude |
- Rate limiting: 3-6 second delays between requests by default. Don't reduce below 2s or Google will throttle your IP.
- US pending apps: applications filed less than 18 months ago are not yet published and won't appear on GP. This is expected and flagged in the output.
- Design patents: require Playwright. Without it, design patent rows fall back to a GP search link only.
- PCT/WO lookup: requires free EPO OPS credentials. Without them, PCT entries get a WIPO search link.
- AI summaries: require an Anthropic API key. Without one, the Invention Summary column is left blank. A $5 credit covers hundreds of runs.
- Terms of service: this tool scrapes public patent data. Use responsibly and in accordance with Google Patents' ToS.
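For the rate-limiting note above, a randomized pause between requests might be implemented along these lines. The constant names match the config.py excerpt; the function name `polite_delay`, the use of `random.uniform`, and the injectable `sleep` parameter are assumptions made for this sketch.

```python
import random
import time

DELAY_MIN = 3.0  # defaults shown in the config.py excerpt
DELAY_MAX = 6.0


def polite_delay(delay_min=DELAY_MIN, delay_max=DELAY_MAX, sleep=time.sleep):
    """Sleep for a random interval so requests are not evenly spaced."""
    if delay_min > delay_max:
        raise ValueError("delay_min must not exceed delay_max")
    pause = random.uniform(delay_min, delay_max)
    sleep(pause)
    return pause
```

Calling `polite_delay()` before each request gives the 3 to 6 second spacing; the `sleep` argument exists only so the function can be exercised without actually waiting.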
Contributions are welcome. Please fork the repo and submit a pull request.
This project uses git-cliff for changelog generation. When committing, use conventional commit prefixes for best results:
| Prefix | Changelog section |
|---|---|
| feat: or add: | Added |
| fix: or bug: | Fixed |
| update: or change: | Changed |
| remove: | Removed |
| doc: or readme: | Documentation |
| chore: or ci: | (skipped) |
See generate_changelog.sh for the full release workflow.
Ideas for improvement:
- Support for EP, JP, CN national phase scraping
- USPTO Patent Center API for richer US-specific metadata
- Streamlit or FastAPI web frontend
- Automated tests with a sample portfolio fixture
- GitHub Actions CI for lint + syntax checks
MIT: free to use, modify, and distribute. See LICENSE.
Run it once. Walk away with summaries you can actually use.