I'm working on a project involving sentiment analysis of financial news, and I'm using yfinance to gather stock data. However, I'm facing a couple of challenges when it comes to scraping news articles. I have two specific questions:
1. Filtering sources and getting more links:
I'm currently able to get news from Yahoo Finance, but I also receive articles from sources like Motley Fool.
BeautifulSoup cannot retrieve the text from Motley Fool articles, so I'd like to filter and get only links from Yahoo Finance.
Is there a way to get more than the default 8 links that are returned? Also, I'd prefer not to use Selenium, as I want the pipeline to remain efficient.
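For reference, the kind of filtering I have in mind could look like the minimal sketch below. It assumes each news item is a dict in one of two layouts: a flat one with `publisher` and `link` keys, or a nested one with `content.provider.displayName` and `content.canonicalUrl.url`. Both layouts are assumptions about what `yf.Ticker(...).news` returns and may differ between yfinance versions. The helper names and the sample items are hypothetical.

```python
from urllib.parse import urlparse


def link_of(item):
    """Return the article URL from a news item (two assumed layouts)."""
    content = item.get("content")
    if isinstance(content, dict):
        # Assumed nested layout: content.canonicalUrl.url
        return (content.get("canonicalUrl") or {}).get("url", "")
    # Assumed flat layout: top-level "link"
    return item.get("link", "")


def publisher_of(item):
    """Return the publisher name from a news item (two assumed layouts)."""
    content = item.get("content")
    if isinstance(content, dict):
        return (content.get("provider") or {}).get("displayName", "")
    return item.get("publisher", "")


def filter_by_host(items, allowed_hosts=("finance.yahoo.com",)):
    """Keep only items whose link points at one of the allowed hosts."""
    kept = []
    for item in items:
        host = urlparse(link_of(item)).netloc.lower()
        if host in allowed_hosts:
            kept.append(item)
    return kept


# Hypothetical sample data standing in for yf.Ticker("AAPL").news
sample_items = [
    {"title": "A", "publisher": "Yahoo Finance",
     "link": "https://finance.yahoo.com/news/a.html"},
    {"title": "B", "publisher": "Motley Fool",
     "link": "https://www.fool.com/investing/b/"},
    {"content": {"title": "C",
                 "provider": {"displayName": "Yahoo Finance"},
                 "canonicalUrl": {"url": "https://finance.yahoo.com/news/c.html"}}},
]

yahoo_only = filter_by_host(sample_items)
print([link_of(item) for item in yahoo_only])
```

Filtering on the link host rather than the publisher name is a design choice: what matters for my scraper is where the page is served from. As for the number of links, recent yfinance releases appear to expose a `Ticker.get_news(count=...)` method, but I have not confirmed that it is available in every version.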
2. Scraping specific parts of Motley Fool articles:
Since BeautifulSoup struggles with extracting text from Motley Fool articles, is there a way to scrape only metadata like the title or a portion of the article body (e.g., the first few paragraphs)?
I'm open to suggestions for better tools or methods to achieve this efficiently without scraping the entire article.
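To make the second question concrete, here is a rough sketch of the partial extraction I mean, using only the standard library's `html.parser`. It assumes the page exposes `og:title` and `og:description` (or `description`) meta tags and that the first few `<p>` elements are present in the raw HTML. I have not verified either assumption for Motley Fool pages. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class ArticlePreviewParser(HTMLParser):
    """Collect selected meta tags and the first few <p> texts from HTML."""

    META_KEYS = ("og:title", "og:description", "description")

    def __init__(self, max_paragraphs=3):
        super().__init__()
        self.max_paragraphs = max_paragraphs
        self.meta = {}
        self.paragraphs = []
        self._in_p = False
        self._buffer = []

    def _flush(self):
        # Store the current paragraph with whitespace collapsed.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.paragraphs.append(text)
        self._in_p = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            key = attrs.get("property") or attrs.get("name")
            if key in self.META_KEYS and attrs.get("content"):
                self.meta.setdefault(key, attrs["content"])
        elif tag == "p":
            if self._in_p:  # previous <p> was never closed
                self._flush()
            if len(self.paragraphs) < self.max_paragraphs:
                self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self._flush()

    def handle_data(self, data):
        if self._in_p:
            self._buffer.append(data)


def extract_preview(html, max_paragraphs=3):
    """Return title, description, and leading paragraphs from an HTML string."""
    parser = ArticlePreviewParser(max_paragraphs)
    parser.feed(html)
    parser.close()
    description = parser.meta.get("og:description") or parser.meta.get("description", "")
    return {
        "title": parser.meta.get("og:title", ""),
        "description": description,
        "paragraphs": parser.paragraphs,
    }


# Hypothetical HTML standing in for a downloaded article page
sample_html = """
<html><head>
<meta property="og:title" content="Sample headline">
<meta name="description" content="Short summary.">
</head><body>
<p>First <b>bold</b> paragraph.</p><p>Second paragraph.</p>
<p>Third.</p><p>Fourth.</p>
</body></html>
"""

preview = extract_preview(sample_html, max_paragraphs=2)
print(preview)
```

The HTML string would come from an ordinary HTTP request in practice. If the article body is rendered by JavaScript, the paragraphs will not be in the raw HTML and only the meta tags, if any, would be recoverable this way.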
Thanks in advance for any guidance or recommendations!