Passing a URL captured with Scrapy-Selenium to parse() in a Scrapy spider #81
Comments
Is anybody here? I added a link to the full code I wrote.
@Mathoholic you should not have to use Plus you should use the Please review this project's spider example: https://github.com/tristanlatr/charityvillage_jobs/blob/master/charityvillage_jobs/spiders/charityvillage_com.py Then you should find the I hope the issue is clearer now
Oh!! Yes, now I get it. Thanks for taking the time to share this much-needed insight. I will make changes to my spider. Thanks a lot.
@tristanlatr one last thing I want to know: when using Scrapy alone we randomise the user agent, but when using scrapy_selenium the randomisation is not working. The website returns a 403 with a captcha-like page. The code was this:
@tristanlatr please share some insight on this issue; I would be grateful.
Sorry, if you hit a captcha, you are out of luck, I think.
@Mathoholic You've hit something that's unfortunately pretty common nowadays, and isn't limited to user agents. There's no easy answer to this, because bot detection and countermeasures are an evolving arms race. There are many factors that go into evading bot detection (including machine learning), but it's not the fault of Selenium, Scrapy, or this project in particular.
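On the user-agent part of the question: as far as I can tell, scrapy-selenium loads pages through the browser driver, so a header set by a Scrapy user-agent middleware does not reach the site; the browser sends its own. A minimal sketch, assuming Chrome and scrapy-selenium's `SELENIUM_DRIVER_ARGUMENTS` setting, is to pick a user agent when the settings are loaded and pass it with Chrome's `--user-agent` flag. The `USER_AGENTS` list and the `build_driver_arguments` helper are hypothetical names used only for illustration, and this chooses one user agent per crawl, not one per request.

```python
# settings.py (sketch, not taken from the original issue)
import random

# Example user-agent strings; replace with whatever pool you already use.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
]


def build_driver_arguments(user_agents, rng=random):
    """Return Chrome arguments with one randomly chosen user agent.

    Hypothetical helper: `rng` only needs a `choice()` method, so a seeded
    `random.Random` can be passed in for reproducible runs.
    """
    chosen = rng.choice(user_agents)
    return ["--headless", f"--user-agent={chosen}"]


SELENIUM_DRIVER_NAME = "chrome"
# The browser itself is started with these arguments, so the user agent
# is fixed for the lifetime of the driver.
SELENIUM_DRIVER_ARGUMENTS = build_driver_arguments(USER_AGENTS)
```

This only changes the header the browser sends. It will not get past a captcha or other bot detection, for the reasons given in the replies above.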
I am trying to scrape a website that has some dropdowns, so I planned to use the Scrapy framework with Scrapy-Selenium (more here) to click through the dropdowns (nested for loop), then capture the URL using the code below and pass it to the parse() function, which looks for the needed data and writes it to a MySQL database.
But the logic does not seem to work as expected. Any insight on how to deal with this is appreciated. The full code is here.
EDIT: I tried using SeleniumRequest() as well, but that does not seem to work either.