Shards play a critical role when reading information from {es}. Since {es} acts as the source, {eh} creates one Hadoop `InputSplit` (or, in the case of {sp}, one `Partition`) per {es} shard. That is, given a query against index `I`, {eh} dynamically discovers the number of shards backing `I` and then, for each shard, creates an input split in the case of Hadoop (which determines the maximum number of Hadoop tasks to be executed) or a partition in the case of {sp} (which determines the `RDD`'s maximum parallelism).
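One can observe this mapping directly from {sp}. The following minimal Scala sketch (assuming the `elasticsearch-spark` connector is on the classpath, an {es} node at `localhost:9200`, and a hypothetical index named `my-index`) reads the index and prints the resulting partition count, which matches the number of shards backing it:

[source,scala]
----
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._ // adds esRDD to SparkContext

object ShardParallelism {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("es-shard-parallelism")
      .set("es.nodes", "localhost:9200") // assumed {es} endpoint

    val sc = new SparkContext(conf)

    // {eh} discovers the shards backing the index and
    // creates one partition per shard
    val rdd = sc.esRDD("my-index")

    // for an index backed by, say, 5 shards this prints 5 --
    // the RDD's maximum read parallelism
    println(s"partitions = ${rdd.partitions.length}")
  }
}
----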