get_all_comments does not return max results #52
Thanks for diagnosing this. Need to investigate why I am not getting more than 5 replies. Super weird. But confirmed that it is true. Aargh! When YouTube counts total comments, it also counts replies. So the discrepancy is very likely driven by not fetching more than 5 replies per thread.
You are welcome! Yes, absolutely, I was wondering why the maximum number of replies is always between 0 and 5. The following example on https://stackoverflow.com/questions/29692972/youtube-comment-scraper-returns-limited-results/29871427#29871427 describes a similar problem. In that example, the nextPageToken returned by the previous response is passed as the pageToken of the next request to advance through the result pages. I am not sure if you already implemented it in your package, but maybe it will help you.
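For reference, a minimal sketch of the token hand-off that answer describes, with fetch_page as a hypothetical stand-in for a single API request:

page_token <- NULL
repeat {
  res <- fetch_page(page_token)     # one request against the API
  # ... collect res$items here ...
  page_token <- res$nextPageToken   # hand the returned token forward
  if (is.null(page_token)) break    # no token means this was the last page
}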
The function iterates over pages of results, so that isn't the problem. There may be an issue with getting replies of replies, basically. Will investigate this.
I stumbled upon the same issue. Interestingly, other tools like Webometric Analyst and YouTube Data Tools also do not return the maximum number of comments, but the discrepancy is much smaller (e.g., 2,276 of 2,287 comments). So I am curious whether there will be a fix soon?
Hi, I would like to follow up on the issue above and see whether there's a solution. There are three different scenarios that I encountered.
1. As discussed above, replies are not extracted completely by the get_all_comments function.
2. I found a YouTube video (id: 49Ilvc8WiG8) that has no visible comments but shows 1 comment in total. This is not a bug in the package, but if someone can give me a hint, that would be greatly appreciated.
3. Some hidden comments are being extracted. For example, for video bK6DVXty0gQ I extracted two more comments than were visible on the video page. Does this mean the author of the channel deleted those comments, or that someone reported those comments? Thank you!
Hi soodoku, thanks for your amazing work and service to the community! I found the same issue as Roechiiii, where a maximum of 5 replies per comment gets extracted. I remember some wrapper function for extracting Reddit comments where one had to explicitly code that the function presses "show more". Could this be the issue here?
Hi soodoku - first of all, thank you to you and your co-contributors for making this excellent package. I think I have a possible workaround for the replies issue (up to 100 replies). If you do not unlist totalReplyCount (lines 63-66), then we can more easily identify those comments with replies (e.g. df[!(df$totalReplyCount=="0"),]) and use your get_comments(filter = c(parent_id = x)) to get the (in most cases) complete threads. Would this be something you might be able to do? Or is there a more important reason why the total reply count is deleted?
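A hedged sketch of that workaround, assuming get_all_comments() kept totalReplyCount as a column (it currently drops it) and that the id column holds each top-level comment's id (a guess about the column name):

library(tuber)
# assumes yt_oauth() has already been run to authenticate

df <- get_all_comments(video_id = "zz-RpiUFY-I")

# keep only top-level comments that report at least one reply
parents <- df[!(df$totalReplyCount == "0"), ]

# fetch each thread's replies (up to 100) via the parent_id filter
replies <- lapply(parents$id, function(x) {
  get_comments(filter = c(parent_id = x))
})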
Thanks for the hint @leftyveggie! Will try it over the weekend if that works for you. |
@soodoku I think maybe I have slightly misunderstood - perhaps it is not at this part of the code - I meant if you could change the output of the data frame to include totalReplyCount as a column! Thank you for the quick reply! |
Hi! Thanks for the great package! |
I tried it for different videos; my results are here.
On the 22nd of February 2019, we did a test run with the Dayum video (DcJFdCmN98s). The video page informed us to expect 47,163 comments. YouTube Data Tools by Bernhard Rieder extracted 47,153 comments (10 missing). However, tuber extracted 44,810 comments, and Webometric Analyst by Mike Thelwall extracted 44,828. Webometric Analyst only retrieves five follow-up comments per thread because it does not take the comment pagination into account. The tuber results are pretty close to Webometric Analyst's, so I think the iteration through the reply pages is not working correctly in tuber either. Maybe Bernhard Rieder can be asked how he solved the problem in his tool: https://twitter.com/riederb
Is there any update on this? The bug report is now nearly 3 years old. |
Correct me if I'm wrong, but to me it appears that this is a limitation of the API endpoint being used. The commentThreads resource documentation states that a comment thread comprises a top-level comment and, if any exist, replies to that comment.
So far so good, but it then goes on to say that a commentThread resource does not necessarily contain all replies to a comment, and that you need to use the comments.list method with the parentId parameter if you want to retrieve all replies for a particular comment.
So I believe additional queries to this comments resource need to be implemented to retrieve the replies to all comments. I don't see any other GET queries in the source code apart from those to the commentThreads endpoint. So this is just a wild guess (to be completely honest, I don't fully understand the code of tuber).
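To illustrate, a minimal sketch (not tuber's actual internals) of what the missing queries could look like: paging through the comments.list endpoint with parentId to collect every reply to one top-level comment. api_key and parent_id are assumed to be supplied by the caller.

library(httr)

get_all_replies <- function(parent_id, api_key) {
  base_url <- "https://www.googleapis.com/youtube/v3/comments"
  replies <- list()
  page_token <- NULL
  repeat {
    query <- list(part = "snippet", parentId = parent_id,
                  maxResults = 100, key = api_key)
    if (!is.null(page_token)) query$pageToken <- page_token
    res <- content(GET(base_url, query = query))
    replies <- c(replies, res$items)   # accumulate this page's replies
    page_token <- res$nextPageToken    # token for the next page, if any
    if (is.null(page_token)) break
  }
  replies
}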
…intermediate fix for `get_all_comments()` that doesn't seem to be working properly (see gojiplus#52)
I ran into the same problem today. I use the YouTube API to get all comments, but get only 0-5 replies max.
Hi, first of all, thank you for your awesome R package for scraping YouTube comments. I am using your package to analyse some comments, but I ran into the problem that not all comments can be collected. I think the issue has been mentioned in other issues (get_comment_threads) as well, but this report focuses on the get_all_comments method. The original video has 3,040 comments; the function returns only 2,335 records, so roughly 23% get lost. The bigger problem, in my opinion, is the handling of replies. Looking at a user in the "top comments" section, it can be seen that the original video shows 34 different replies to their comment, while the function returns only 5, so the conversation between different users is lost.
comments <- get_all_comments(video_id = "zz-RpiUFY-I")
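A quick way to see the 5-reply cap described above, assuming the returned data frame has a parentId column linking each reply to its top-level comment (a guess about the column name):

reply_counts <- table(comments$parentId)
max(reply_counts)  # tops out at 5, even for threads that show 34 replies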