Add batch_size(batch_size) to __find_in_batches (Mongoid) #1036
Add `.batch_size(batch_size)` to `#__find_in_batches` (Mongoid). Fixes #1037.
Although `.each_slice(batch_size)` is useful for limiting how many documents are sent to Elasticsearch at a time, it does not limit the batch size of MongoDB's `getMore` commands.

By default, iterating over a MongoDB collection first returns 101 documents, and then subsequent batches of up to 16 MiB:
https://www.mongodb.com/docs/manual/tutorial/iterate-a-cursor/#cursor-batches
For example, a MongoDB collection containing documents averaging 1 KiB might return more than 16,000 documents at a time.
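The figure above follows from simple arithmetic. A minimal sketch, assuming the 16 MiB batch limit and the 1 KiB average document size mentioned here, and ignoring any per-message overhead (the constant and variable names are illustrative only):

```ruby
# Rough estimate of how many documents fit in one default getMore batch.
# Assumes a 16 MiB batch limit and 1 KiB average documents; protocol overhead is ignored.
MAX_BATCH_BYTES = 16 * 1024 * 1024 # 16 MiB
avg_doc_bytes   = 1024             # 1 KiB

docs_per_batch = MAX_BATCH_BYTES / avg_doc_bytes
puts docs_per_batch # prints 16384
```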
Although Mongoid's documentation claims a default batch size of 1,000 documents, this does not seem to be the case in practice.
Also, Mongoid's `.no_timeout` is broken right now and does nothing: mongodb/mongo-ruby-driver#2557
As a result, it is now likely that more than 10 minutes go by between two `getMore` commands and that the MongoDB cursor expires.

Adding `.batch_size(batch_size)` to the query makes sure that MongoDB documents are retrieved at the same rate as they are processed and indexed in Elasticsearch, and allows applications affected by the `.no_timeout` issue to reduce the batch size to avoid cursor timeouts.