v1.3.2
added generation of multiple kernel instances keyed by compute-id + kernel name (reduces the number of clSetKernelArg() calls and allows the same kernel name to run with different parameters on asynchronous queues; used for tiled computing with task pool + device pool)
added task (stores a compute() call so it can be run later instead of immediately)
added task pool and device pool features (non-separable kernels are distributed across devices with a greedy algorithm)
uses CekirdeklerCPP 1.3.1 binary (kutuphanecl.dll, 64-bit)
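
The greedy distribution mentioned in the task pool + device pool entry can be pictured with a small simulation. This is only a sketch, not the library's code or API: it assumes each task has a known cost, all devices run at the same speed, and whichever device becomes free first takes the next task from the pool. The names `greedy_distribute`, `task_costs`, `gpu0` and `gpu1` are hypothetical.

```python
import heapq


def greedy_distribute(task_costs, device_names):
    """Hand out tasks in pool order; each goes to the device that frees up first.

    Assumption: costs are known up front and devices are equally fast.
    A real device pool would learn this by observing when devices finish.
    """
    # Min-heap of (time_when_device_is_free, device_index).
    free_times = [(0.0, i) for i in range(len(device_names))]
    heapq.heapify(free_times)

    assignment = {name: [] for name in device_names}
    for task_id, cost in enumerate(task_costs):
        # The earliest-free device takes the next whole (non-separable) task.
        free_at, idx = heapq.heappop(free_times)
        assignment[device_names[idx]].append(task_id)
        heapq.heappush(free_times, (free_at + cost, idx))

    # Total time is the moment the last device finishes.
    makespan = max(t for t, _ in free_times)
    return assignment, makespan


assignment, makespan = greedy_distribute([4, 3, 2, 2, 1], ["gpu0", "gpu1"])
print(assignment, makespan)
```

Each task stays whole on one device, which matches the non-separable case; greedy assignment like this is simple and usually balances load well, but it is not guaranteed to give the shortest possible total time.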