403 Error when scraping images #211

Open
alfkoblischke opened this issue Nov 18, 2024 · 3 comments

@alfkoblischke

I have a problem with some image URLs and get a 403 error. Calling the URL in the browser is no problem; that works perfectly.
Can anybody help? URL: https://www.allianz.de/content/dam/onemarketing/azde/azd/newsletter/images/august-2024/lp3/newsletter-august-teaser-600x600-wohnwagenversicherung-lp3.jpg

@spekulatius
Owner

Hey @alfkoblischke

Have you tried setting a browser agent?

Peter

@alfkoblischke
Author

Hi Peter,

Yes, I have. My code:

```php
require_once './vendor/autoload.php';

use Spekulatius\PHPScraper\PHPScraper;

$url = 'https://www.allianz.de/content/dam/onemarketing/azde/azd/newsletter/images/august-2024/lp3/newsletter-august-teaser-600x600-wohnwagenversicherung-lp3.jpg';

$web = new PHPScraper;
$web->setConfig([
    'agent' => 'Mozilla/5.0 (X11; Linux x86_64; rv:107.0) Gecko/20100101 Firefox/107.0',
]);

$sharingImage = $web->fetchAsset($url);
```

And I get the following error: "Fatal error: Uncaught Symfony\Component\HttpClient\Exception\ClientException: HTTP/2 403 returned for "https://www.allianz.de/content/dam/onemarketing/azde/azd/newsletter/images/august-2024/lp3/newsletter-august-teaser-600x600-wohnwagenversicherung-lp3.jpg....""

The strange thing is that I can call the URL in the browser and it works there.

Alf
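
For reference, a minimal sketch of catching that exception so the 403 response can be inspected instead of aborting the script. This assumes the Symfony `ClientException` bubbles up unchanged from `fetchAsset()`, as the fatal error above suggests:

```php
require_once './vendor/autoload.php';

use Spekulatius\PHPScraper\PHPScraper;
use Symfony\Component\HttpClient\Exception\ClientException;

$url = 'https://www.allianz.de/content/dam/onemarketing/azde/azd/newsletter/images/august-2024/lp3/newsletter-august-teaser-600x600-wohnwagenversicherung-lp3.jpg';

$web = new PHPScraper;
$web->setConfig([
    'agent' => 'Mozilla/5.0 (X11; Linux x86_64; rv:107.0) Gecko/20100101 Firefox/107.0',
]);

try {
    $sharingImage = $web->fetchAsset($url);
} catch (ClientException $e) {
    // getStatusCode() and getHeaders(false) don't re-throw on 4xx responses,
    // so the blocked response can be examined here.
    $response = $e->getResponse();
    echo 'Status: ' . $response->getStatusCode() . PHP_EOL;
    print_r($response->getHeaders(false));
}
```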

@spekulatius
Owner

Hmm, I guess the site validates the request against further criteria. You probably want to look at the other headers sent over with the request. Copying the request from the browser console as cURL and then stripping things away until it breaks might help.
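
A minimal sketch of that approach, bypassing PHPScraper and replaying the request with Symfony HttpClient (the client PHPScraper is built on) using browser-like headers. The header names and values below are assumptions copied from a typical Firefox image request, not values confirmed to satisfy this particular site; remove them one at a time to find the one that matters:

```php
require_once './vendor/autoload.php';

use Symfony\Component\HttpClient\HttpClient;

$url = 'https://www.allianz.de/content/dam/onemarketing/azde/azd/newsletter/images/august-2024/lp3/newsletter-august-teaser-600x600-wohnwagenversicherung-lp3.jpg';

$client = HttpClient::create();

// Example browser-like headers (assumed, not verified against this site).
$response = $client->request('GET', $url, [
    'headers' => [
        'User-Agent'      => 'Mozilla/5.0 (X11; Linux x86_64; rv:107.0) Gecko/20100101 Firefox/107.0',
        'Accept'          => 'image/avif,image/webp,*/*',
        'Accept-Language' => 'de-DE,de;q=0.8,en-US;q=0.5,en;q=0.3',
        'Referer'         => 'https://www.allianz.de/',
    ],
]);

// getStatusCode() doesn't throw, so this prints 403 or 200 either way.
echo $response->getStatusCode() . PHP_EOL;
```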
