ZeinabRahbar/googlescholar_scraper

Google Scholar Paper Scraper

A Python-based scraping tool designed to extract research paper data from Google Scholar, specifically focused on Hyperspectral Image (HSI) Classification using Graph Neural Networks (GNNs).

📌 Overview

This repository contains a Jupyter Notebook (googlescholar_scraper.ipynb) that automates the collection of academic titles and authors. It is pre-configured to target the latest advancements (2020–present) in graph-based remote sensing.

🚀 Features

  • Architecture-Specific Queries: Specialized search loops for GCN, GAT, and GraphSAGE models.
  • Temporal Filtering: Automatically restricts results to publications from the year 2020 onwards.
  • Clean Data Extraction: Parses HTML to isolate clean Paper Titles and Author/Source strings.
  • Pagination Support: Scrapes multiple result pages (approx. 60 papers per query).
  • Rate Limit Protection: Implements a time.sleep() delay to minimize the risk of IP blocking.
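The temporal filter and pagination described above can be illustrated with a small URL-building sketch. It is not taken from the notebook: the endpoint, the `as_ylo` (earliest year) and `start` (result offset) parameters, the page size of 10, and the helper name `build_page_urls` are all assumptions chosen so that six pages give roughly 60 papers per query.

```python
from urllib.parse import urlencode

# Assumed values; the notebook may use different ones.
BASE_URL = "https://scholar.google.com/scholar"
RESULTS_PER_PAGE = 10


def build_page_urls(query, pages=6, year_from=2020):
    """Return one search URL per result page for the given query (hypothetical helper)."""
    urls = []
    for page in range(pages):
        params = {
            "q": query,
            "as_ylo": year_from,               # assumed lower bound on publication year
            "start": page * RESULTS_PER_PAGE,  # assumed offset of the first result on the page
        }
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls


urls = build_page_urls("graph attention hyperspectral image classification")
print(len(urls))  # number of pages requested
print(urls[1])    # URL of the second result page
```

Keeping URL construction separate from the request itself makes it easy to check the year filter and page offsets without sending any traffic to Google Scholar.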

πŸ› οΈ Requirements

The script utilizes the following Python libraries:

  • requests: For handling HTTP requests to Google Scholar.
  • beautifulsoup4: For parsing the search result HTML.
  • time: For managing request intervals (part of the Python standard library, so no installation is needed).

📖 Usage

  1. Open the notebook in Jupyter or Google Colab.
  2. Define your search string in the query variable:
    query = "graph attention hyperspectral image classification"
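To run several architecture-specific searches in one go, the query can be looped over with a pause between requests. The sketch below is only one way this might look, not the notebook's actual code: `collect`, `fetch_page`, and `fake_fetch` are hypothetical names, the delay of 5 seconds is an arbitrary default, and only the second query string comes from this README (the other two are illustrative).

```python
import time


def collect(queries, fetch_page, pages=6, delay_seconds=5.0, sleep=time.sleep):
    """Call fetch_page(query, page) for every query and page, pausing after each call.

    fetch_page is a hypothetical callable that returns a list of
    (title, authors) tuples for one result page.
    """
    results = {}
    for query in queries:
        papers = []
        for page in range(pages):
            papers.extend(fetch_page(query, page))
            sleep(delay_seconds)  # wait before the next request to lower the risk of blocking
        results[query] = papers
    return results


# Only the second query appears in this README; the others are illustrative.
QUERIES = [
    "graph convolutional network hyperspectral image classification",
    "graph attention hyperspectral image classification",
    "GraphSAGE hyperspectral image classification",
]


def fake_fetch(query, page):
    # Stand-in for the real request-and-parse step; returns placeholder rows.
    return [(f"{query} / page {page} / paper {i}", "placeholder authors") for i in range(10)]


data = collect(QUERIES, fake_fetch, delay_seconds=0)
for query, papers in data.items():
    print(query, len(papers))
```

Passing the fetch function and the sleep function as arguments keeps the loop testable offline; in real use, `fetch_page` would wrap the `requests` call and the `beautifulsoup4` parsing.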

About

The "googlescholar_scraper" project is a Python script that uses web scraping to extract paper data from Google Scholar. It searches for papers matching a specific query, in this case "graph network multimodal", and scrapes the first 20 pages of search results.
