Empty Results When Using Spider Function with Category URL #696
Comments
Hi @felipehertzer, I cannot reproduce the issue: I get results for your example with the latest version of the code (from the GitHub repository). Did you make other changes?
Hey @adbar, I have reinstalled it, but the issue persists. When I run the following code, new_base_url comes back empty, so the check fails:
htmlstring, homepage, new_base_url = probe_alternative_homepage(url)
print(homepage, new_base_url)  # result = /news/news ''
if htmlstring and homepage and new_base_url:
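To illustrate why that check fails, here is a minimal sketch. It does not call trafilatura: `probe_stub` is a hypothetical stand-in that simply returns values shaped like the ones printed above, and the URL is a placeholder. Since an empty string is falsy in Python, the combined condition cannot pass.

```python
def probe_stub(url):
    """Hypothetical stand-in (not trafilatura's function): returns values
    shaped like the ones reported in the comment above."""
    return "<html></html>", "/news/news", ""


def passes_check(url):
    """Apply the same truthiness check as in the snippet above."""
    htmlstring, homepage, new_base_url = probe_stub(url)
    # An empty string is falsy, so an empty new_base_url fails the whole check
    return bool(htmlstring and homepage and new_base_url)


print(passes_check("https://example.org/category/news"))  # placeholder URL
```

With these values the function returns `False`, which would be consistent with the empty results described in this issue.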
I still cannot reproduce it. Besides, the lines [...] I guess the check [...]
Hello @adbar, I apologise for the delayed response. I had some additional time to conduct further testing and identified the issue in the line below. I was able to fix it on my side by installing [...]. Specifically, it seems that the [...]. Here is the line of code in question: trafilatura/trafilatura/downloads.py, line 205 in f57ef0b
Thanks for the details. This is tricky; it may be a bug in urllib3. How do you think we can solve this?
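One possible direction, sketched here only as an assumption and not as what the project does: if the final URL reported after a redirect is relative (like the `/news/news` value printed earlier), it could be resolved against the requested URL before the base URL is derived. The helpers `resolve_final_url` and `base_of` are hypothetical names, the URL is a placeholder, and only the standard library is used.

```python
from urllib.parse import urljoin, urlsplit


def resolve_final_url(requested_url, reported_url):
    """Hypothetical helper: make a possibly relative final URL absolute."""
    # urljoin returns reported_url unchanged when it is already absolute
    return urljoin(requested_url, reported_url)


def base_of(url):
    """Hypothetical helper: scheme and host, or '' if either is missing."""
    parts = urlsplit(url)
    if parts.scheme and parts.netloc:
        return f"{parts.scheme}://{parts.netloc}"
    return ""


requested = "https://example.org/category/news"  # placeholder URL
final_url = resolve_final_url(requested, "/news/news")
print(final_url, base_of(final_url))
```

A relative path on its own yields an empty base here, while the resolved URL yields a usable one. Whether this belongs in the download code or in the spider is an open question.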
Hey @adbar,
I am currently testing the spider function, and I have encountered an issue when attempting to use a category URL to fetch posts specifically from that category.
Here is the code snippet that I am working with:
The function returns empty results. After investigating, I believe the problem may lie in this line of code. I modified the line to:
This change resolved the issue, but it breaks the redirect function.
Thank you.