Use case: in MapReduce, multiple Map tasks generate data destined for a single Reduce task.
Currently, all output files of a Map task are stored together in one DU. The MapReduce framework later separates out the files that belong to each Reduce task and passes the resulting DUs as inputs to that Reduce task.
This could be optimized if a Map task were allowed to write its output files directly to the Reduce tasks' DUs.
I expect this would be a useful feature for other use cases too.
There could be concurrency problems with the metadata update when several tasks write to the same DU. I just want to log this here, as we might come up with a solution that makes it possible.
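
To make the idea concrete, here is a rough sketch of what direct writes might look like. It is only a toy in-memory model: `DataUnit`, `add_file`, `map_task`, and `run_maps` are hypothetical names invented for illustration, not the framework's actual API, and Map tasks are simulated with threads. The lock stands in for whatever mechanism would serialize the metadata update; in a real distributed setup a process-local lock would not be enough, and some shared coordination would be needed.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class DataUnit:
    """Hypothetical in-memory stand-in for a DU: files plus their listing."""

    def __init__(self, name):
        self.name = name
        self.files = {}                # file name -> contents (the "metadata")
        self._lock = threading.Lock()  # assumption: serializes metadata updates

    def add_file(self, file_name, contents):
        # Several Map tasks may call this at once; the lock keeps the
        # file listing consistent.
        with self._lock:
            self.files[file_name] = contents


def map_task(map_id, records, reduce_dus):
    """Partition records by key and write each partition to its Reduce DU."""
    partitions = {}
    for key, value in records:
        # Simple deterministic partitioner (illustrative only).
        reduce_id = sum(key.encode()) % len(reduce_dus)
        partitions.setdefault(reduce_id, []).append((key, value))
    for reduce_id, part in partitions.items():
        # Direct write: no later segregation step is needed.
        reduce_dus[reduce_id].add_file(f"map-{map_id}-part-{reduce_id}", part)


def run_maps(inputs, num_reducers):
    """Run one simulated Map task per input and return the Reduce DUs."""
    reduce_dus = [DataUnit(f"reduce-{i}") for i in range(num_reducers)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [
            pool.submit(map_task, i, records, reduce_dus)
            for i, records in enumerate(inputs)
        ]
        for future in futures:
            future.result()  # re-raise any exception from a Map task
    return reduce_dus


if __name__ == "__main__":
    inputs = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)]]
    for du in run_maps(inputs, num_reducers=2):
        print(du.name, sorted(du.files))
```

With this layout each Reduce task would read exactly one DU, and the open question is reduced to how `add_file` can be made safe when the writers run on different machines.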