PHP Scraper

An opinionated & limited way to scrape the web using PHP. The main goal is to get stuff done instead of getting distracted with xPath selectors, preparing data structures, etc. Instead, you can just "go to a website" and get an array with all details relevant to your scraping project.

Under the hood, it uses Goutte and a few other packages. See composer.json.

Examples

Here are a few impressions on the way the library works. More examples are on the project website.

Get the Title of a Website

All scraping functionality can be accessed either as a function call or a property call. On the example of title scraping this would like like this:

$web = new \spekulatius\phpscraper();

$web->go('https://google.com');

// Returns "Google"
echo $web->title;

// Also returns "Google"
echo $web->title();

Scrape the Images from a Website

Scraping the images including the attributes of the img-tags:

$web = new \spekulatius\phpscraper();

/**
 * Navigate to the test page.
 *
 * This page contains twice the image "cat.jpg".
 * Once with a relative path and once with an absolute path.
 */
$web->go('https://test-pages.phpscraper.de/meta/lorem-ipsum.html');

var_dump($web->imagesWithDetails);
/**
 * Contains:
 *
 * [
 *     'url' => 'https://test-pages.phpscraper.de/assets/cat.jpg',
 *     'alt' => 'absolute path',
 *     'width' => null,
 *     'height' => null,
 * ],
 * [
 *     'url' => 'https://test-pages.phpscraper.de/assets/cat.jpg',
 *     'alt' => 'relative path',
 *     'width' => null,
 *     'height' => null,
 * ]
 */

See the full documentation for more information and examples.

Name		Name	Last commit message	Last commit date
Latest commit History 75 Commits
.github		.github
src		src
tests		tests
websites		websites
.editorconfig		.editorconfig
.gitignore		.gitignore
.scrutinizer.yml		.scrutinizer.yml
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE.md		LICENSE.md
README.md		README.md
composer.json		composer.json
header.jpg		header.jpg
phpunit.xml.dist		phpunit.xml.dist

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

PHP Scraper

Sponsors

Examples

Get the Title of a Website

Scrape the Images from a Website

About

Uh oh!

Releases

Packages

Languages

License

phuclh/PHPScraper

Folders and files

Latest commit

History

Repository files navigation

PHP Scraper

Sponsors

Examples

Get the Title of a Website

Scrape the Images from a Website

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages