get_all_comments does not return max results #52
Thanks for diagnosing this. Need to investigate why I am not getting more than 5 replies. Super weird. But confirmed that it is true. Aargh! When YouTube counts total comments, it also counts replies. So the discrepancy is very likely driven by not fetching more than 5 replies per thread.
You are welcome! Yes, absolutely, I was wondering why the maximum number of replies is always between 0 and 5. The following example on https://stackoverflow.com/questions/29692972/youtube-comment-scraper-returns-limited-results/29871427#29871427 describes a similar problem. In that example, the nextPageToken returned by the previous response is passed as the pageToken of the next request to advance through the result pages. I am not sure if you already implemented it in your package, but maybe it will help you.
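For reference, a minimal sketch of the token hand-off that answer describes, with fetch_page as a hypothetical stand-in for a single API request:

page_token <- NULL
repeat {
  res <- fetch_page(page_token)     # one request against the API
  # ... collect res$items here ...
  page_token <- res$nextPageToken   # hand the returned token forward
  if (is.null(page_token)) break    # no token means this was the last page
}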
The function iterates over pages of results, so that isn't the problem. There may be an issue with getting replies of replies, basically. Will investigate this.
I stumbled upon the same issue. Interestingly, other tools like Webometric Analyst and YouTube Data Tools also do not return the maximum number of comments, but the discrepancy is much smaller (e.g., 2,276 of 2,287 comments). So I am curious whether there will be a fix soon?
Hi, I would like to follow up on the issue above and see whether there's a solution. There are three different scenarios that I encountered.
1. As discussed above, replies are not extracted completely by the get_all_comments function.
2. I found a YouTube video (id: 49Ilvc8WiG8) that has no visible comments but shows 1 comment in total. This is not a bug in the package, but if someone can give me a hint, that would be greatly appreciated.
3. Some hidden comments are being extracted. For example, for video bK6DVXty0gQ I extracted two more comments than were visible on the video page. Does this mean the author of the channel deleted those comments, or that someone reported those comments? Thank you!
Hi soodoku, thanks for your amazing work and service to the community! I found the same issue as Roechiiii, where a maximum of 5 replies per comment gets extracted. I remember some wrapper function for extracting Reddit comments where one had to explicitly code that the function presses "show more". Could this be the issue here?
Hi soodoku - first of all, thank you to you and your co-contributors for making this excellent package. I think I have a possible workaround for the replies issue (up to 100 replies). If you do not unlist totalReplyCount (lines 63-66), then we can more easily identify those comments with replies (e.g. df[!(df$totalReplyCount=="0"),]) and use your get_comments(filter = c(parent_id = x)) to get the (in most cases) complete threads. Would this be something you might be able to do? Or is there a more important reason why the total reply count is deleted?
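A hedged sketch of that workaround, assuming get_all_comments() kept totalReplyCount as a column (it currently drops it) and that the id column holds each top-level comment's id (a guess about the column name):

library(tuber)
# assumes yt_oauth() has already been run to authenticate

df <- get_all_comments(video_id = "zz-RpiUFY-I")

# keep only top-level comments that report at least one reply
parents <- df[!(df$totalReplyCount == "0"), ]

# fetch each thread's replies (up to 100) via the parent_id filter
replies <- lapply(parents$id, function(x) {
  get_comments(filter = c(parent_id = x))
})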
Thanks for the hint @leftyveggie! Will try it over the weekend if that works for you. |
@soodoku I think maybe I have slightly misunderstood - perhaps it is not at this part of the code - I meant if you could change the output of the data frame to include totalReplyCount as a column! Thank you for the quick reply! |
Hi! Thanks for the great package! |
I tried it for different videos; my results are here.
On the 22nd of February 2019, we did a test run with the Dayum video (DcJFdCmN98s). The video page informed us to expect 47,163 comments. YouTube Data Tools by Bernhard Rieder extracted 47,153 comments (10 missing). However, tuber extracted 44,810 comments, and Webometric Analyst by Mike Thelwall extracted 44,828. Webometric Analyst only retrieves five follow-up comments per thread because it does not take the comment pagination into account. The tuber results are pretty close to Webometric Analyst's, so I think the iteration through the reply pages is not working correctly in tuber either. Maybe Bernhard Rieder can be asked how he solved the problem in his tool: https://twitter.com/riederb
Is there any update on this? The bug report is now nearly 3 years old. |
Correct me if I'm wrong, but to me it appears that this is a limitation of the API endpoint being used. The commentThreads resource documentation states that a comment thread comprises a top-level comment and, if any exist, replies to that comment.
So far so good, but it then goes on to say that a commentThread resource does not necessarily contain all replies to a comment, and that you need to use the comments.list method with the parentId parameter if you want to retrieve all replies for a particular comment.
So I believe additional queries to this comments resource need to be implemented to retrieve the replies to all comments. I don't see any other GET queries in the source code apart from those to the commentThreads endpoint. So this is just a wild guess (to be completely honest, I don't fully understand the code of tuber).
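To illustrate, a minimal sketch (not tuber's actual internals) of what the missing queries could look like: paging through the comments.list endpoint with parentId to collect every reply to one top-level comment. api_key and parent_id are assumed to be supplied by the caller.

library(httr)

get_all_replies <- function(parent_id, api_key) {
  base_url <- "https://www.googleapis.com/youtube/v3/comments"
  replies <- list()
  page_token <- NULL
  repeat {
    query <- list(part = "snippet", parentId = parent_id,
                  maxResults = 100, key = api_key)
    if (!is.null(page_token)) query$pageToken <- page_token
    res <- content(GET(base_url, query = query))
    replies <- c(replies, res$items)   # accumulate this page's replies
    page_token <- res$nextPageToken    # token for the next page, if any
    if (is.null(page_token)) break
  }
  replies
}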
…intermediate fix for `get_all_comments()` that doesn't seem to be working properly (see gojiplus#52)
I ran into the same problem today. I use the YouTube API to get all comments, but get only 0-5 replies max.
Hi, first of all, thank you for your awesome R package for scraping YouTube comments. I am using your package to analyse some comments, but I ran into the problem that not all comments can be collected. I think the issue has been mentioned in other issues (get_comment_threads) as well, but this report focuses on the get_all_comments method. The original video has 3,040 comments; the function returns only 2,335 records, so roughly 23% get lost. The bigger problem, in my opinion, is the handling of replies. Looking at a user in the "top comments" section, it can be seen that the original video shows 34 different replies to their comment, while the function returns only 5, so the conversation between different users is lost.
comments <- get_all_comments(video_id = "zz-RpiUFY-I")
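A quick way to see the 5-reply cap described above, assuming the returned data frame has a parentId column linking each reply to its top-level comment (a guess about the column name):

reply_counts <- table(comments$parentId)
max(reply_counts)  # tops out at 5, even for threads that show 34 replies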