[BUG] Diminishing data on get_user_following() & get_user_followers() #323
Comments
I've done some additional testing / troubleshooting. The issue may be connected to the rate limits being handled incorrectly by get_user_edges(). I added an extra 30-minute rest timer every 15 users. Note how the function starts failing (only one datapoint returned for a number of accounts) after the sleep timers and bounces back after the additional 30-minute rest:
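For reference, the rest-timer wrapper I used looks roughly like the sketch below. The ID vector and the get_bearer() call are placeholders for however you load your IDs and token; only the 15-user chunks and the 30-minute rests reflect what I actually ran.

```r
library(academictwitteR)

# placeholder: a longer vector of user IDs to look up
user_ids <- readRDS("user_ids.rds")

# split the IDs into chunks of 15 and rest 30 minutes between chunks,
# on top of the package's own rate-limit handling
chunks <- split(user_ids, ceiling(seq_along(user_ids) / 15))

following <- list()
for (i in seq_along(chunks)) {
  following[[i]] <- get_user_following(chunks[[i]], bearer_token = get_bearer())
  if (i < length(chunks)) Sys.sleep(30 * 60)  # additional 30-minute rest
}
following <- do.call(rbind, following)
```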
I've run some more checks. For one, the rate limit seems to be hit after 15 pages, not 15 users, which is rather unclear in both the Twitter API docs and the package docs. The actual problem, however, starts only once the rate limit has been hit and collection resumes after the sleep timer. Strangely, though, it does not start immediately: in my tests, after hitting the rate limit and sleeping, the current ID lookup concludes, the next lookup works fine, and only then does the function start silently returning empty data. See this log, where I added an extra 15-minute grace period after every 5 accounts:
I am, however, at quite a loss as to what may cause this behaviour. Maybe the function does not track the right rate limit? That is, does .check_reset() need to distinguish between different rate limits, or does it always fetch the correct limit to pass on to .trigger_sleep()?
After additional tests, I can confirm that the returns remain unreliable even when data is returned: in many cases, you do not get all followers of a given user ID. I'm using a workaround now, comparing the number of followers pulled with the number stated in get_user_profile() and re-running if necessary. Doing this, I noticed that the returned data seems relatively reliable if calls are wrapped in a for-loop, i.e. when checking every ID with a single call rather than in batches. Using a for-loop to iterate through the vector of user IDs might therefore be a valid workaround for this issue. Until it is resolved, however, I would recommend putting these functions on hiatus, or at least putting a warning label on them, as results can be grossly misleading.
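For anyone who needs a stopgap in the meantime, the workaround looks roughly like the sketch below. It assumes get_user_profile() exposes the expected count via public_metrics$followers_count (check your version's output), and it caps the retries because pulled and stated counts can legitimately differ by a few suspended or deleted accounts.

```r
library(academictwitteR)

user_ids <- readRDS("user_ids.rds")  # placeholder for your vector of IDs

followers <- list()
for (id in user_ids) {
  # follower count the API itself reports for this account
  expected <- get_user_profile(id, bearer_token = get_bearer())$public_metrics$followers_count

  res <- get_user_followers(id, bearer_token = get_bearer())

  # re-run the single-ID call while the pulled count falls short of the stated count
  attempts <- 1
  while ((is.null(res) || nrow(res) < expected) && attempts < 5) {
    res <- get_user_followers(id, bearer_token = get_bearer())
    attempts <- attempts + 1
  }
  followers[[id]] <- res
}
followers <- do.call(rbind, followers)
```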
Thank you for this, @TimBMK, and apologies for the slow comms. I've been away on holiday. I'll add this to the TODOs for the next release.
Please confirm the following
something went wrong. Status code: 400.
Describe the bug
Both get_user_following() and get_user_followers() at some point start returning fewer datapoints (i.e. followers/followees) for users where there should be considerably more. Eventually, this comes down to 1 datapoint. This, however, is not an actual datapoint: it is a row with all NULL/NA values except the from_id. That is, eventually no actual data is returned from the endpoint, but the function reports it as a returned datapoint, which makes the bug hard to notice (a quick check for such rows is sketched below).
This seems to start occurring only after the rate limit has been hit more than once, which may suggest a problem on the API side. However, I did not find any statements on rate limits other than the 15 lookups per 15 minutes stated here.
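A quick way to make these phantom rows visible is to flag every row that is entirely NA apart from the from_id, e.g. with dplyr (a sketch; it first drops any nested list/data-frame columns so the check only runs over the flat fields):

```r
library(dplyr)

# `following` is the data frame returned by get_user_following() or get_user_followers()
phantom <- following %>%
  select(where(is.atomic)) %>%     # drop nested list / data-frame columns
  filter(if_all(-from_id, is.na))  # rows where everything but from_id is NA

table(phantom$from_id)  # which lookups silently returned nothing
```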
Expected Behavior
When looking up larger numbers of users' followers/followees through the respective functions, I would expect them to consistently return the correct data. If additional rate limits need to be adhered to, I would expect the functions to do this in line with the "sleep" behaviour already implemented. If this is not possible, I would expect the function to throw an error rather than silently returning no data.
Steps To Reproduce
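A call along these lines reproduces it for me. This is a sketch rather than my exact script, and the ID vector is a placeholder for any list of accounts long enough to force the function through the rate limit more than once.

```r
library(academictwitteR)

# placeholder: enough accounts to run into the rate limit several times
user_ids <- readRDS("user_ids.rds")

following <- get_user_following(user_ids, bearer_token = get_bearer())

# rows actually returned per looked-up account
table(following$from_id)
```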
For me, the problem started occurring about halfway through the second chunk (after the first rest). Notice how user 18933321 returns only 111 instead of its actual 1,076 accounts. Afterwards, we get only one (empty) datapoint per user. This behaviour might vary between runs, but I can trace the exact same pattern in another log. Here's the log for the above example:
The exact same issue occurs with get_user_followers():
Environment
Anything else?
This seems to be the same issue as in #187. However, pinning the problem down to the second half of the second batch might be helpful in tracking it down. Let me know if there's anything I can do to help solve the issue / stress test more. I'm slightly lost as to what may cause the problem at the moment, especially since the function fails completely after reaching the first rate limit (rather than only failing for a number of requests until the rate limit recovers). Furthermore, the fact that it starts returning less data before failing completely suggests there may be an additional rate limit at play, one limiting the returns rather than only the requests.