Hello! I've found a performance issue in `/datasets/dataloader.py`: `dataset.batch(self.config['batch_size'])` should be called before `dataset.map(transform_fn, num_parallel_calls=self.config['prefetch_threads'])`. Batching first lets `transform_fn` run once per batch instead of once per element, which could make your program more efficient.
The TensorFlow documentation supports this ordering.
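To make the proposed reordering concrete, here is a minimal sketch. It is not the code from `dataloader.py`: the `config` dict, the toy `Dataset.range` input, and the body of `transform_fn` are hypothetical stand-ins, and `transform_fn` is assumed to be element-wise so that it works unchanged on a whole batch.

```python
import tensorflow as tf

# Hypothetical stand-ins for self.config and the repository's transform_fn.
config = {"batch_size": 4, "prefetch_threads": 2}


def transform_fn(x):
    # Element-wise op, so it accepts a single example or a whole batch.
    return tf.cast(x, tf.float32) / 255.0


dataset = tf.data.Dataset.range(16)

# Current order: transform_fn runs once per element, then results are batched.
map_then_batch = (
    dataset
    .map(transform_fn, num_parallel_calls=config["prefetch_threads"])
    .batch(config["batch_size"])
)

# Proposed order: elements are batched first, so transform_fn runs once per batch.
batch_then_map = (
    dataset
    .batch(config["batch_size"])
    .map(transform_fn, num_parallel_calls=config["prefetch_threads"])
)
```

Both pipelines should yield the same batches here. The difference is only in how many times the mapped function is invoked, which is where the per-call overhead is saved.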
Also, please check whether `transform_fn`, the function passed to `dataset.map(transform_fn, num_parallel_calls=self.config['prefetch_threads'])`, is affected by this change, so that the reordered code still works properly. For example, if `transform_fn` expects input of shape (x, y, z) before the fix, it will receive input of shape (batch_size, x, y, z) after the fix.
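As an illustration of the shape change, here is a hedged sketch of how a per-example transform might need to be adapted. The functions `transform_single` and `transform_batched` and the toy data are hypothetical and are not taken from the repository. They show a case where reusing the per-example version on a batch would run without error but compute something different.

```python
import tensorflow as tf


def transform_single(image):
    # Expects one example of shape (x, y, z); centers it on its own mean.
    return image - tf.reduce_mean(image)


def transform_batched(images):
    # Expects a batch of shape (batch_size, x, y, z). The mean is taken per
    # example (axes 1..3), not across the whole batch.
    return images - tf.reduce_mean(images, axis=[1, 2, 3], keepdims=True)


# Hypothetical data: 8 examples, each of shape (2, 2, 3).
data = tf.reshape(tf.range(8 * 2 * 2 * 3, dtype=tf.float32), (8, 2, 2, 3))
dataset = tf.data.Dataset.from_tensor_slices(data)

before = dataset.map(transform_single).batch(4)   # map sees (2, 2, 3)
after = dataset.batch(4).map(transform_batched)   # map sees (4, 2, 2, 3)
```

If `transform_single` were passed to the batched pipeline as is, `tf.reduce_mean(image)` would average over the entire batch, so a check like this is worth doing before swapping the order.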
Looking forward to your reply. Btw, I would be glad to create a PR to fix it if you are too busy.