A Python-based scraping tool designed to extract research paper data from Google Scholar, specifically focused on Hyperspectral Image (HSI) Classification using Graph Neural Networks (GNNs).
This repository contains a Jupyter Notebook (`googlescholar_scraper.ipynb`) that automates the collection of academic titles and authors. It is preconfigured to target the latest advancements (2020–present) in graph-based remote sensing.
- Architecture-Specific Queries: Specialized search loops for GCN, GAT, and GraphSAGE models.
- Temporal Filtering: Automatically restricts results to publications from the year 2020 onwards.
- Clean Data Extraction: Parses HTML to isolate clean Paper Titles and Author/Source strings.
- Pagination Support: Scrapes multiple result pages (approx. 60 papers per query).
- Rate Limit Protection: Implements a `time.sleep()` delay between requests to minimize the risk of IP blocking.
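The architecture-specific queries, the 2020 year filter, and pagination all come down to how each search URL is built. The following is a minimal sketch, not the notebook's code. It assumes Google Scholar's `q` (search terms), `as_ylo` (earliest publication year), and `start` (result offset) query parameters and 10 hits per page, so six pages give roughly the 60 papers per query mentioned above. The query strings and the names `ARCHITECTURE_QUERIES` and `build_page_urls` are hypothetical.

```python
from urllib.parse import urlencode

SCHOLAR_URL = "https://scholar.google.com/scholar"

# Hypothetical query strings, one per architecture; the notebook's wording may differ.
ARCHITECTURE_QUERIES = {
    "GCN": "graph convolutional network hyperspectral image classification",
    "GAT": "graph attention network hyperspectral image classification",
    "GraphSAGE": "GraphSAGE hyperspectral image classification",
}


def build_page_urls(query, year_from=2020, pages=6, per_page=10):
    """Return one Scholar search URL per result page for `query`.

    `as_ylo` restricts results to `year_from` onward; `start` is the result
    offset, which advances by `per_page` (assumed to be Scholar's default of 10).
    """
    urls = []
    for page in range(pages):
        params = {"q": query, "as_ylo": year_from, "start": page * per_page}
        urls.append(f"{SCHOLAR_URL}?{urlencode(params)}")
    return urls


# Six pages per architecture, i.e. offsets 0, 10, ..., 50.
all_urls = {name: build_page_urls(q) for name, q in ARCHITECTURE_QUERIES.items()}
```

Keeping URL construction separate from fetching makes the year filter and page offsets easy to inspect before any request is sent.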
The script uses the following Python libraries:
- `requests`: For handling HTTP requests to Google Scholar.
- `beautifulsoup4`: For parsing the search result HTML.
- `time`: For managing request intervals.
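Extracting clean titles and author/source strings with `beautifulsoup4` might look like the sketch below. It assumes Scholar wraps each hit in `div.gs_ri`, with the title in `h3.gs_rt` and the author/venue/year line in `div.gs_a`; these class names are commonly targeted by Scholar scrapers but can change without notice, and the notebook may use different selectors. `parse_results`, `_clean`, and the sample HTML are hypothetical.

```python
from bs4 import BeautifulSoup


def _clean(tag):
    """Return the tag's text with whitespace runs (incl. non-breaking spaces) collapsed."""
    return " ".join(tag.get_text().split()) if tag is not None else ""


def parse_results(html):
    """Return [{'title': ..., 'authors': ...}, ...] for one Scholar result page."""
    soup = BeautifulSoup(html, "html.parser")
    papers = []
    # Assumed markup: div.gs_ri > h3.gs_rt (title) and div.gs_a (authors - venue, year - host).
    for result in soup.select("div.gs_ri"):
        title = _clean(result.select_one("h3.gs_rt"))
        if not title:
            continue  # skip blocks without a title
        papers.append({"title": title, "authors": _clean(result.select_one("div.gs_a"))})
    return papers


# Made-up sample mimicking one result block, for illustration only.
SAMPLE_HTML = """
<div class="gs_r gs_or gs_scl"><div class="gs_ri">
  <h3 class="gs_rt"><a href="#">Graph <b>attention</b> networks for HSI</a></h3>
  <div class="gs_a"><a href="#">A Author</a>, B Author - IEEE TGRS, 2021 - ieeexplore.ieee.org</div>
</div></div>
"""

papers = parse_results(SAMPLE_HTML)
```

Collapsing whitespace after `get_text()` keeps highlighted query terms (the `<b>` tags Scholar adds) from splitting or gluing words in the extracted title.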
- Open the notebook in Jupyter or Google Colab.
- Define your search string in the `query` variable: `query = "graph attention hyperspectral image classification"`
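Once `query` is set, a polite request loop could be sketched as follows. It is hypothetical, not the notebook's implementation: `fetch_pages`, the User-Agent string, and the 5-second default delay are assumptions, and it assumes Scholar's `q`, `as_ylo`, and `start` query parameters with 10 hits per page. The optional `session` argument lets you reuse a `requests.Session` or substitute a stand-in for offline testing.

```python
import time

import requests

SCHOLAR_URL = "https://scholar.google.com/scholar"
# Hypothetical User-Agent; Scholar may reject requests that look automated.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; hsi-gnn-scraper)"}


def fetch_pages(query, pages=6, delay=5.0, session=None):
    """Download `pages` result pages for `query` (2020 onward), sleeping between requests."""
    session = session or requests.Session()
    html_pages = []
    for page in range(pages):
        resp = session.get(
            SCHOLAR_URL,
            params={"q": query, "as_ylo": 2020, "start": page * 10},
            headers=HEADERS,
            timeout=30,
        )
        # Raise on HTTP errors such as 429 instead of parsing an error page.
        resp.raise_for_status()
        html_pages.append(resp.text)
        if page < pages - 1:
            time.sleep(delay)  # pause only between consecutive requests
    return html_pages


query = "graph attention hyperspectral image classification"
```

Calling `fetch_pages(query)` returns the raw HTML of each page, ready for BeautifulSoup. Stopping at the first HTTP error rather than retrying immediately keeps the tool from hammering Scholar once it starts throttling.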