feat(openalex): add OpenAlex API integration by hongkongkiwi · Pull Request #30 · openags/paper-search-mcp

hongkongkiwi · 2026-01-22T08:14:58Z

OpenAlex integration with 200M+ works, citations, author search. Adds search_openalex, get_openalex_paper, get_openalex_citations, get_openalex_references, search_openalex_by_author, get_openalex_related, download_openalex, read_openalex_paper tools.

…ic paper search

OpenAlex is a free and open catalog of the global research system with over 200M works. This integration provides:

- Full paper search with advanced filters (year, type, full-text availability)
- Citation and reference traversal (forward and backward citations)
- Author-based search
- Related papers discovery based on concepts and references
- DOI and OpenAlex ID lookup
- Comprehensive metadata including concepts, keywords, and open access info

Features:

- search_openalex: Main search function with filtering options
- get_openalex_paper: Get paper by OpenAlex ID
- get_openalex_paper_by_doi: Get paper by DOI
- get_openalex_citations: Get papers that cite this work
- get_openalex_references: Get papers referenced by this work
- search_openalex_by_author: Search papers by author name
- get_openalex_related: Find related papers
- download_openalex: Download PDF from open access sources
- read_openalex_paper: Extract text from PDF

Copilot

Pull request overview

Adds OpenAlex as an additional academic source, exposing MCP tools for searching works and traversing citation/references/related-work graphs, plus optional PDF download/read support via open-access locations.

Changes:

Introduces OpenAlexSearcher implementation backed by the OpenAlex REST API.
Wires new OpenAlex MCP tools into paper_search_mcp/server.py (search, lookup, citations, references, author search, related, download, read).
Adds OpenAlex-specific parsing for works into the project’s standardized Paper format.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 15 comments.

File	Description
`paper_search_mcp/server.py`	Registers OpenAlex searcher instance and exposes OpenAlex MCP tools.
`paper_search_mcp/academic_platforms/openalex.py`	Implements OpenAlex API calls, result parsing, and PDF download/read helpers.


Copilot · 2026-01-23T10:04:29Z

paper_search_mcp/academic_platforms/openalex.py

+                # Year range
+                params["filter"] = f"from_publication_date:{year}"


Year-range handling sets filter = from_publication_date:{year} when year is in the YYYY-YYYY form (per docstring/examples). from_publication_date is a date field, so this produces an invalid/ambiguous filter. Convert ranges to proper date filters (e.g., from_publication_date:YYYY-01-01,to_publication_date:YYYY-12-31) or use publication_year constraints.

Suggested change

-                # Year range
-                params["filter"] = f"from_publication_date:{year}"
+                # Year range, expected format: 'YYYY-YYYY'
+                start_year, end_year = [part.strip() for part in year.split("-", 1)]
+                if (
+                    len(start_year) == 4
+                    and len(end_year) == 4
+                    and start_year.isdigit()
+                    and end_year.isdigit()
+                ):
+                    params["filter"] = (
+                        f"from_publication_date:{start_year}-01-01,"
+                        f"to_publication_date:{end_year}-12-31"
+                    )
+                else:
+                    # Fallback: treat as a single publication_year filter
+                    params["filter"] = f"publication_year:{year}"
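For anyone picking this up, the same idea can be pulled out into a small pure helper that is easy to unit-test. This is only a sketch: the name `build_year_filter` is hypothetical, and returning `None` for unrecognized input (instead of falling back to `publication_year`) is a design choice made here, not something the PR does.

```python
import re
from typing import Optional

# Hypothetical helper, not part of the PR: turns a user-supplied year string
# into an OpenAlex filter expression using the fields named in the review.
_YEAR_RANGE = re.compile(r"^\s*([0-9]{4})\s*-\s*([0-9]{4})\s*$")
_SINGLE_YEAR = re.compile(r"^\s*([0-9]{4})\s*$")


def build_year_filter(year: str) -> Optional[str]:
    """Return a filter string for 'YYYY' or 'YYYY-YYYY', else None."""
    match = _YEAR_RANGE.match(year)
    if match:
        start_year, end_year = match.groups()
        # Expand the year range to full calendar dates.
        return (
            f"from_publication_date:{start_year}-01-01,"
            f"to_publication_date:{end_year}-12-31"
        )
    match = _SINGLE_YEAR.match(year)
    if match:
        return f"publication_year:{match.group(1)}"
    # Unrecognized input: let the caller decide whether to skip the filter.
    return None
```

Returning `None` keeps malformed input from being forwarded to the API as a filter value.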

Copilot · 2026-01-23T10:04:29Z

paper_search_mcp/academic_platforms/openalex.py

+        params = {
+            "filter": f"referenced_by:{openalex_id}",
+            "per-page": max_results,
+            "mailto": self.EMAIL_PARAM
+        }


get_references uses filter=referenced_by:{openalex_id}, which reads as “works that reference this work” (forward citations), not “works referenced by this work”. To return actual references, fetch the work first, read its referenced_works IDs, then bulk-fetch those works’ metadata. Also consider clamping per-page to 200 to match OpenAlex’s max page size.
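A rough sketch of the pure parts of that two-step approach (extracting the IDs from `referenced_works`, then grouping them for bulk lookups). The helper names, the `openalex_id` filter key, and the batch size of 50 are all assumptions for illustration and should be checked against the OpenAlex documentation; the HTTP calls themselves are left out.

```python
from typing import Any, Dict, List

MAX_PER_PAGE = 200  # page-size cap mentioned in the review comment


def reference_ids(work: Dict[str, Any]) -> List[str]:
    """Return bare work IDs (e.g. 'W123') from a work's referenced_works list."""
    return [url.rsplit("/", 1)[-1] for url in work.get("referenced_works") or []]


def batch_filters(
    ids: List[str],
    batch_size: int = 50,             # assumed limit on OR-joined values
    filter_key: str = "openalex_id",  # assumed filter name
) -> List[str]:
    """Build one OR-joined filter string per batch of IDs."""
    return [
        f"{filter_key}:" + "|".join(ids[i:i + batch_size])
        for i in range(0, len(ids), batch_size)
    ]


def clamp_per_page(max_results: int) -> int:
    """Keep per-page within 1..MAX_PER_PAGE."""
    return max(1, min(max_results, MAX_PER_PAGE))
```

Each string from `batch_filters` would be sent as the `filter` parameter of one `/works` request, with `per-page` set via `clamp_per_page`.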

Copilot · 2026-01-23T10:04:30Z

paper_search_mcp/academic_platforms/openalex.py

+            openalex_id: OpenAlex ID
+            max_results: Maximum number of related papers to return
+
+        Returns:
+            List of related Paper objects
+        """
+        if openalex_id.startswith("http"):
+            openalex_id = openalex_id.split("/")[-1]
+
+        url = f"{self.BASE_URL}/works"
+        params = {
+            "filter": f"has_concepts:{openalex_id}",
+            "per-page": max_results,
+            "sort": "cited_by_count:desc",
+            "mailto": self.EMAIL_PARAM
+        }
+
+        try:


get_related_papers uses filter=has_concepts:{openalex_id}, but openalex_id here is a work ID (W...), not a concept ID. This is likely to return incorrect/empty results. Fetch the work, extract its concept IDs, and filter by those concept IDs (or use an OpenAlex-provided related-works mechanism if available).

Suggested change

-            openalex_id: OpenAlex ID
-            max_results: Maximum number of related papers to return
-
-        Returns:
-            List of related Paper objects
-        """
-        if openalex_id.startswith("http"):
-            openalex_id = openalex_id.split("/")[-1]
-
-        url = f"{self.BASE_URL}/works"
-        params = {
-            "filter": f"has_concepts:{openalex_id}",
-            "per-page": max_results,
-            "sort": "cited_by_count:desc",
-            "mailto": self.EMAIL_PARAM
-        }
-
-        try:
+            openalex_id: OpenAlex work ID or URL
+            max_results: Maximum number of related papers to return
+
+        Returns:
+            List of related Paper objects
+        """
+        # Normalize to bare OpenAlex work ID (e.g., "W123456789")
+        if openalex_id.startswith("http"):
+            openalex_id = openalex_id.split("/")[-1]
+
+        try:
+            # First, fetch the work to get its associated concepts
+            work_url = f"{self.BASE_URL}/works/{openalex_id}"
+            work_params = {
+                "mailto": self.EMAIL_PARAM
+            }
+            work_response = requests.get(work_url, params=work_params, timeout=30)
+            work_response.raise_for_status()
+            work_data = work_response.json()
+
+            concepts = work_data.get("concepts") or []
+            concept_ids = [
+                concept.get("id", "").split("/")[-1]
+                for concept in concepts
+                if concept.get("id")
+            ]
+
+            # If no concepts are associated with this work, we cannot find concept-based related papers
+            if not concept_ids:
+                return []
+
+            # Build a filter that matches works sharing any of these concepts
+            concept_filter_value = "|".join(concept_ids)
+
+            url = f"{self.BASE_URL}/works"
+            params = {
+                "filter": f"concepts.id:{concept_filter_value}",
+                "per-page": max_results,
+                "sort": "cited_by_count:desc",
+                "mailto": self.EMAIL_PARAM
+            }
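The filter-building step of this suggestion can be isolated and tested without touching the network. A minimal sketch, assuming the work payload has the `concepts` list shown above; the function name and the `max_concepts` cap are hypothetical additions, the latter to keep the OR-list short.

```python
from typing import Any, Dict, Optional


def related_works_params(
    work: Dict[str, Any],
    max_results: int,
    max_concepts: int = 5,  # hypothetical cap, not in the PR
) -> Optional[Dict[str, Any]]:
    """Build /works query params from a work's concepts, or None if it has none."""
    concept_ids = [
        concept["id"].rsplit("/", 1)[-1]
        for concept in work.get("concepts") or []
        if concept.get("id")
    ]
    if not concept_ids:
        return None
    return {
        # "|" means OR: match works sharing any of these concepts.
        "filter": "concepts.id:" + "|".join(concept_ids[:max_concepts]),
        "per-page": min(max_results, 200),
        "sort": "cited_by_count:desc",
    }
```

Note that a concept-based query like this will usually return the seed work itself, so the caller may want to drop it from the results.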

Copilot · 2026-01-23T10:04:30Z

paper_search_mcp/academic_platforms/openalex.py

+                "filter": f"author.id:{author_id}",
+                "per-page": max_results,
+                "sort": "publication_date:desc",
+                "mailto": self.EMAIL_PARAM
+            }


search_by_author sets per-page directly from max_results without clamping to OpenAlex’s max page size (200). Consider per-page = min(max_results, 200) (and pagination if you want more than 200).
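One possible shape for the clamping plus pagination, as a sketch only. `page_plan` is a hypothetical helper, and it assumes page-based paging where `page` starts at 1 and `per-page` stays constant across requests; the caller would truncate the combined results to `max_results`.

```python
import math
from typing import Dict, List


def page_plan(max_results: int, max_per_page: int = 200) -> List[Dict[str, int]]:
    """Return the per-request paging params needed to cover max_results."""
    if max_results <= 0:
        return []
    # Keep per-page constant so page offsets line up across requests.
    per_page = min(max_results, max_per_page)
    pages = math.ceil(max_results / per_page)
    return [{"per-page": per_page, "page": page} for page in range(1, pages + 1)]
```

For example, `page_plan(450)` yields three requests of 200 each, and the last page is trimmed client-side.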

Copilot · 2026-01-23T10:04:30Z

paper_search_mcp/academic_platforms/openalex.py

+"""
+from typing import List, Optional
+from datetime import datetime
+import time


Unused imports: time is imported but never used in this module. Removing it avoids linter failures and reduces noise.

Suggested change

-import time

Copilot · 2026-01-23T10:04:32Z

paper_search_mcp/server.py

+        **kwargs: Additional search parameters:
+            - filter: OpenAlex filter (e.g., 'has_fulltext:true,type:journal-article')
+            - sort: Sort field (e.g., 'cited_by_count:desc', 'publication_date:desc')
+            - fields: Comma-separated list of fields to return
+


The docstring advertises a fields kwarg for search_openalex, but the implementation never reads/forwards it. Either add fields passthrough to search_kwargs or remove it from the docstring so callers don’t assume it works.
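If the passthrough route is chosen, one way to keep the docstring and the behavior in sync is to forward only an explicit allow-list of keys. This is a sketch under assumptions: `build_search_kwargs` and `SUPPORTED_KWARGS` are hypothetical names, and how the searcher translates `fields` into the actual API request is outside its scope.

```python
from typing import Any, Dict

# Keys documented in the search_openalex docstring.
SUPPORTED_KWARGS = ("filter", "sort", "fields")


def build_search_kwargs(**kwargs: Any) -> Dict[str, Any]:
    """Forward only the documented keys that the caller actually set."""
    search_kwargs = {}
    for key in SUPPORTED_KWARGS:
        value = kwargs.get(key)
        if value is not None:
            search_kwargs[key] = value
    return search_kwargs
```

With a single tuple as the source of truth, adding or removing a documented option is a one-line change.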

Copilot · 2026-01-23T10:04:32Z

paper_search_mcp/academic_platforms/openalex.py

+    BASE_URL = "https://api.openalex.org"
+    EMAIL_PARAM = "mailto:paper-search-mcp@example.com"  # Polite identification
+
+    def __init__(self, email: Optional[str] = None):
+        """Initialize OpenAlex searcher.


EMAIL_PARAM includes a mailto: prefix but is then passed as the value of the mailto query parameter (resulting in mailto=mailto:...). Store just the email address (no prefix) and pass that as params['mailto'] (see how CrossRef does it in paper_search_mcp/academic_platforms/crossref.py:67).
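A small sketch of the normalization, in case the constructor also accepts user-supplied values that may carry the prefix. `normalize_email` is a hypothetical helper and not part of the PR or of crossref.py.

```python
from urllib.parse import urlencode


def normalize_email(value: str) -> str:
    """Strip an optional 'mailto:' prefix so only the address remains."""
    prefix = "mailto:"
    value = value.strip()
    if value.lower().startswith(prefix):
        return value[len(prefix):]
    return value


# The bare address is then used as the value of the mailto query parameter.
query = urlencode({"mailto": normalize_email("mailto:paper-search-mcp@example.com")})
```

Here `query` encodes a single `mailto` parameter rather than the doubled `mailto=mailto:...` form.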

Copilot · 2026-01-23T10:04:32Z

paper_search_mcp/academic_platforms/openalex.py

+            if pub_date:
+                try:
+                    published_date = datetime.fromisoformat(pub_date.replace("Z", "+00:00"))
+                except:


Except block directly handles BaseException.

Suggested change

-                except:
+                except (ValueError, TypeError):
+                    # If the publication date is malformed or of an unexpected type,
+                    # leave published_date as None and continue.
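The date handling could also live in a tiny helper so the narrow `except` is written once. A minimal sketch, assuming `pub_date` is either a string or missing; `parse_publication_date` is a hypothetical name.

```python
from datetime import datetime
from typing import Any, Optional


def parse_publication_date(pub_date: Any) -> Optional[datetime]:
    """Parse an ISO 8601 date string, returning None when it is missing or malformed."""
    if not isinstance(pub_date, str) or not pub_date:
        return None
    try:
        # Map a trailing "Z" to an explicit UTC offset before parsing.
        return datetime.fromisoformat(pub_date.replace("Z", "+00:00"))
    except ValueError:
        # Malformed date: treat it as missing rather than aborting the parse.
        return None
```

Because non-string input is rejected up front, only `ValueError` needs to be caught here.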

Copilot · 2026-01-23T10:04:33Z

paper_search_mcp/academic_platforms/openalex.py

+                    # Sort by position and join
+                    sorted_indices = sorted(index_to_word.keys())
+                    abstract_text = " ".join([index_to_word[i] for i in sorted_indices])
+                except:


Except block directly handles BaseException.

Suggested change

-                except:
+                except Exception:
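For context, the code around this `except` appears to rebuild abstract text from a word-to-positions mapping. A sketch of that step as a standalone function, assuming the input is a dict of word to list of integer positions; `rebuild_abstract` is a hypothetical name, and validating the input type up front reduces what the `except` has to cover.

```python
from typing import Any, Dict


def rebuild_abstract(inverted_index: Any) -> str:
    """Reconstruct text from a {word: [positions]} mapping; '' if unusable."""
    if not isinstance(inverted_index, dict):
        return ""
    index_to_word: Dict[int, str] = {}
    for word, positions in inverted_index.items():
        for position in positions:
            index_to_word[position] = word
    # Sort by position and join, as in the quoted code.
    return " ".join(index_to_word[i] for i in sorted(index_to_word))
```

For example, `{"Hello": [0], "world": [1, 3], "again": [2]}` comes back as `"Hello world again world"`.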

Copilot · 2026-01-23T10:04:33Z

paper_search_mcp/academic_platforms/openalex.py

+                except:
+                    pass


'except' clause does nothing but pass and there is no explanatory comment.

Suggested change

-                except:
-                    pass
+                except (ValueError, TypeError):
+                    # If the publication date is not a valid ISO format, treat it as missing.
+                    published_date = None

universea requested a review from Copilot January 23, 2026 09:58

Copilot started reviewing on behalf of universea January 23, 2026 09:58

hongkongkiwi wants to merge 1 commit into openags:main from hongkongkiwi:feature/openalex-integration
