
scrapy-selenium is yielding normal scrapy.Request instead of SeleniumRequest #78

Open
iamumairayub opened this issue Oct 5, 2020 · 4 comments


iamumairayub commented Oct 5, 2020

@clemfromspace I just decided to use your package in my Scrapy project, but it is yielding a normal scrapy.Request instead of a SeleniumRequest:

from shutil import which
from scrapy_selenium import SeleniumRequest
from scrapy.contracts import Contract
class WithSelenium(Contract):
    """Contract that sets the request class to SeleniumRequest
    for the callback method under test.
    @with_selenium
    """
    name = 'with_selenium'
    request_cls = SeleniumRequest
    
class WebsiteSpider(BaseSpider):
    name = 'Website'

    custom_settings = {
        'DOWNLOADER_MIDDLEWARES': {
             'scrapy_selenium.SeleniumMiddleware': 800
        },
        'SELENIUM_DRIVER_NAME': 'firefox',
        'SELENIUM_DRIVER_EXECUTABLE_PATH': which('geckodriver'),
        'SELENIUM_DRIVER_ARGUMENTS': ['-headless']  
    }
    
    def start_requests(self):
        yield SeleniumRequest(url=url,
                              callback=self.parse_result)

    def parse_result(self, response):
        """
        @with_selenium
        """
        print(response.request.meta['driver'].title)  # --> gives KeyError

I have seen this issue, but it is not helpful at all.
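For what it's worth, the KeyError above is consistent with the middleware only attaching a driver to requests it actually processed. A minimal, Scrapy-free sketch of that access pattern (plain dicts stand in for `request.meta`; nothing here is scrapy-selenium's actual API):

```python
# Plain dicts model request.meta for a SeleniumRequest vs. a normal Request.
# (Hypothetical stand-ins; in scrapy-selenium it is the middleware that
# populates meta['driver'], and only on requests it downloaded itself.)
selenium_meta = {"driver": "<webdriver instance>"}  # middleware handled it
plain_meta = {}                                     # middleware skipped it

# meta['driver'] raises KeyError when the middleware never ran:
try:
    plain_meta["driver"]
except KeyError:
    print("KeyError: 'driver'")

# A defensive lookup makes the failure mode explicit instead:
driver = plain_meta.get("driver")
print("driver attached" if driver else "no driver: not a SeleniumRequest")
```

The `.get()` lookup does not fix the underlying problem, but it turns the crash into a visible symptom of which requests the middleware actually saw.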

@heysamtexas

I plus-oned this and then solved it for myself a little later.

For me this is not in the context of testing, so I have no need for contracts (at least as far as I understand it).

My solve was the following:

  1. Override start_requests() (as you have done)
  2. Yield SeleniumRequest() again from the parsing callback. I notice that you use parse_result() instead of parse().

Once I did this it started working. My solution snippet:

    # (needs: from scrapy.linkextractors import LinkExtractor
    #         from scrapy_selenium import SeleniumRequest)
    def start_requests(self):
        cls = self.__class__
        if not self.start_urls and hasattr(self, 'start_url'):
            raise AttributeError(
                "Crawling could not start: 'start_urls' not found "
                "or empty (but found 'start_url' attribute instead, "
                "did you miss an 's'?)")
        for url in self.start_urls:
            yield SeleniumRequest(url=url, dont_filter=True)

    def parse(self, response):
        le = LinkExtractor()
        for link in le.extract_links(response):
            yield SeleniumRequest(
                url=link.url,
                callback=self.parse_result
            )

    def parse_result(self, response):
        page = PageItem()
        page['url'] = response.url
        yield page

@educatron

Hey @undernewmanagement

I tried your snippet, but the links from the LinkExtractor are not processed correctly (the response body is not text).

    rules = (Rule(LinkExtractor(restrict_xpaths=['//*[@id="breadcrumbs"]']), follow=True),)

    def start_requests(self):
        for url in self.start_urls:
            yield SeleniumRequest(url=url, dont_filter=True,)

    def parse_start_url(self, response):
        return self.parse_result(response)

    def parse(self, response):
        le = LinkExtractor()
        for link in le.extract_links(response):
            yield SeleniumRequest(url=link.url, callback=self.parse_result,)

    def parse_result(self, response):
        page = PageItem()
        page['url'] = response.url
        yield page

I had to use parse_start_url to assign the parse_result callback to start urls.

Do you know what the problem could be? I'm new to Scrapy and Python.

Thanks!
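For later readers: one plausible explanation is that CrawlSpider rules build plain Request objects, so the Selenium middleware never renders the pages they fetch. Scrapy's Rule accepts a process_request hook that can swap each extracted request for a SeleniumRequest. A rough sketch, with stub classes standing in for the real scrapy.Request and scrapy_selenium.SeleniumRequest (names and fields here are placeholders, not Scrapy's internals):

```python
# Stubs standing in for scrapy.Request / scrapy_selenium.SeleniumRequest.
class Request:
    def __init__(self, url, callback=None):
        self.url = url
        self.callback = callback

class SeleniumRequest(Request):
    pass

def use_selenium(request, response=None):
    """Rule-style process_request hook: rebuild each extracted request
    as a SeleniumRequest, preserving its url and callback."""
    return SeleniumRequest(url=request.url, callback=request.callback)

# A rule would then be declared roughly as:
#   Rule(LinkExtractor(...), callback='parse_result',
#        process_request=use_selenium)
upgraded = use_selenium(Request("https://example.com"))
print(type(upgraded).__name__)  # SeleniumRequest
```

This keeps the rule-driven crawl logic intact while routing every followed link through the Selenium middleware, instead of overriding parse() by hand.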

@heysamtexas

Hey @educatron thanks for the question - let's not hijack the thread here. I think you should take that question directly to the scrapy community. https://scrapy.org/community/

@educatron

Ok. Thanks!
