Releases: taspinar/twitterscraper
improve logging and add option to disable proxy
- PR234: Adds command line argument -dp or --disableproxy to disable the use of a proxy when querying.
- PR261: Improved logging; there is no longer a ts_logger file, the logger is initiated in main.py and query.py, and the log level is set via the CLI.
Fixed query.py and some small updates
Fixed
- PR304: Fixed query.py by adding 'X-Requested-With': 'XMLHttpRequest' to the request headers.
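The idea behind PR304 can be sketched with the standard library; the URL below is illustrative, not the exact endpoint built by query.py:

```python
import urllib.request

# Hypothetical search URL; the real query URL is assembled inside query.py.
url = "https://twitter.com/i/search/timeline?q=python"

# Twitter's timeline endpoint expects the request to identify itself as an
# AJAX call; without this header the response format changes and parsing fails.
headers = {
    "User-Agent": "Mozilla/5.0",
    "X-Requested-With": "XMLHttpRequest",
}

request = urllib.request.Request(url, headers=headers)
```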
- PR253: Fixed Docker build
Added
- PR313: Added example to README (section 2.3.1).
- PR277: Support emojis by adding the alt text of images to the tweet text.
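The approach of PR277 can be sketched with the standard library's html.parser; the class name and HTML fragment below are illustrative, not taken from the actual code. Twitter embeds emojis as images whose alt attribute holds the emoji glyph, so collecting alt text alongside regular text data recovers the emoji:

```python
from html.parser import HTMLParser

class AltTextExtractor(HTMLParser):
    """Collect tweet text plus the alt text of <img> tags, the way
    emojis are embedded in Twitter's HTML."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Ordinary text nodes of the tweet.
        self.parts.append(data)

    def handle_starttag(self, tag, attrs):
        # Emoji images carry the glyph in their alt attribute.
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.parts.append(alt)

    def text(self):
        return "".join(self.parts)

# Hypothetical fragment of a tweet's HTML:
extractor = AltTextExtractor()
extractor.feed('Great news <img class="Emoji" alt="🎉" src="...">!')
print(extractor.text())  # Great news 🎉!
```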
additional tweet attributes
Add new Tweet attributes:
- links, hashtags
- image urls, video_url,
- whether or not the tweet is a reply
- tweet ID of the parent tweet in case of a reply
- list of usernames that are replied to
Deleted some Tweet attributes:
- Tweet.retweet_id
- Tweet.retweeter_username
- Tweet.retweet_userid
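The resulting attribute set can be sketched as a small dataclass; the field names below follow the release notes and are illustrative, not necessarily identical to those in the actual Tweet class:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Tweet:
    """Illustrative sketch of the extended tweet record (field names
    are assumptions based on the release notes)."""
    text: str
    links: List[str] = field(default_factory=list)
    hashtags: List[str] = field(default_factory=list)
    img_urls: List[str] = field(default_factory=list)
    video_url: Optional[str] = None
    is_reply: bool = False
    parent_tweet_id: Optional[str] = None   # set when is_reply is True
    reply_to_users: List[str] = field(default_factory=list)

# A reply tweet carries its parent's ID and the users it replies to:
reply = Tweet(text="@taspinar thanks!", is_reply=True,
              parent_tweet_id="1234567890", reply_to_users=["taspinar"])
```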
additional user info and tweet fields
This version includes some additional fields in the output:
- is_verified to indicate whether a user has verified status
- is_retweet to indicate whether a tweet is a retweet
- retweeter_username if it is a retweet
- retweeter_userid if it is a retweet
- retweet_id if it is a retweet
- epoch timestamp
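A hypothetical scraped record illustrating these fields (the values are made up, and the key names follow the release notes, not necessarily the exact output schema):

```python
from datetime import datetime, timezone

# Tweet timestamp as a timezone-aware datetime.
timestamp = datetime(2019, 6, 1, 12, 0, tzinfo=timezone.utc)

tweet = {
    "username": "some_user",
    "text": "original tweet text",
    "is_verified": False,
    "is_retweet": True,
    # The retweeter_* and retweet_id fields are only meaningful
    # when is_retweet is True.
    "retweeter_username": "another_user",
    "retweeter_userid": "987654321",
    "retweet_id": "111222333",
    # The epoch timestamp is the same instant expressed in seconds
    # since 1970-01-01 UTC.
    "timestamp_epochs": int(timestamp.timestamp()),
}
```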
In addition, it now uses billiard for multiprocessing, which makes it possible to use twitterscraper from within Celery.
fake-useragent is used to generate random User-Agent headers.
Add possibility to save to CSV format (and some fixes)
- Users can now save the tweets to CSV format by using the command line argument "-c" or "--csv".
- The default value of begindate is set to 2006-03-21. The previous value (2017-01-01) was chosen arbitrarily and led to questions about why not all tweets were retrieved.
- By using linspace() instead of range() to divide the number of days over the number of parallel processes, edge cases (p = 1) now also work fine.
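The linspace-based splitting can be sketched as follows; the function and argument names are illustrative, and a minimal integer linspace stands in for numpy.linspace:

```python
from datetime import date, timedelta

def linspace(start, stop, num):
    """Evenly spaced values from start to stop inclusive
    (minimal stand-in for numpy.linspace)."""
    if num == 1:
        return [stop]
    step = (stop - start) / (num - 1)
    return [start + step * i for i in range(num)]

def split_date_range(begindate, enddate, poolsize):
    """Split [begindate, enddate] into `poolsize` consecutive chunks,
    one per parallel process (names are assumptions)."""
    no_days = (enddate - begindate).days
    # poolsize + 1 evenly spaced boundaries yield exactly poolsize
    # chunks; a fixed-step range() could misbehave for poolsize = 1.
    boundaries = [begindate + timedelta(days=round(d))
                  for d in linspace(0, no_days, poolsize + 1)]
    return list(zip(boundaries[:-1], boundaries[1:]))

# The p = 1 edge case now yields a single chunk covering the whole range:
chunks = split_date_range(date(2006, 3, 21), date(2019, 1, 1), 1)
```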
First release on GitHub.
This is the first release of twitterscraper on GitHub.