Commit 0cc9c95

jeffkbkim authored and facebook-github-bot committed
3/N: CPUOffloadedRecMetricModule
Summary:
This diff adds the basic building blocks for a zero-overhead RecMetrics implementation. Follow-up patches will integrate it with torchrec users.

One of the main pain points of RecMetricModule is that metric updates and computes run synchronously. In training jobs, there have been cases where metric updates take upward of 20% of a training iteration. Metric computations, although less frequent, can take more than a couple of seconds. CPUOffloadedRecMetricModule aims to perform all metric updates and computes asynchronously, removing them from the critical path entirely.

This patch adds:
- CPUOffloadedRecMetricModule: a RecMetricModule that offloads metric update() and compute() to the CPU using background threads and dual queues.

Differential Revision: D83773529
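The summary describes offloading update() and compute() to background threads fed by two queues. Below is a minimal, framework-free sketch of that pattern. It is not the torchrec implementation, and every name in it (`OffloadedMeanMetric`, `update`, `compute`, `shutdown`) is hypothetical. It assumes the metric state lives entirely on the CPU and that a compute request should reflect every update enqueued before it.

```python
import queue
import threading
from concurrent.futures import Future
from typing import Any, List, Optional


class OffloadedMeanMetric:
    """Hypothetical sketch: a mean metric whose update() and compute() never
    block the caller. Work is handed to two background threads via two queues."""

    def __init__(self) -> None:
        # Running state; only ever touched by the update thread, so no lock.
        self._total = 0.0
        self._count = 0
        # Queue 1: ("update", values) items and ("compute", future) barriers.
        self._update_queue: "queue.Queue[Any]" = queue.Queue()
        # Queue 2: (state snapshot, future) pairs for the compute thread.
        self._compute_queue: "queue.Queue[Any]" = queue.Queue()
        self._update_thread = threading.Thread(target=self._update_loop, daemon=True)
        self._compute_thread = threading.Thread(target=self._compute_loop, daemon=True)
        self._update_thread.start()
        self._compute_thread.start()

    def update(self, values: List[float]) -> None:
        # Copy the batch and return immediately; the fold happens off-thread.
        self._update_queue.put(("update", list(values)))

    def compute(self) -> "Future[Optional[float]]":
        # Routed through the update queue so it sees all earlier updates.
        fut: "Future[Optional[float]]" = Future()
        self._update_queue.put(("compute", fut))
        return fut

    def shutdown(self) -> None:
        # Sentinel drains both threads in order, then stops them.
        self._update_queue.put(None)
        self._update_thread.join()
        self._compute_thread.join()

    def _update_loop(self) -> None:
        while True:
            item = self._update_queue.get()
            if item is None:
                self._compute_queue.put(None)  # propagate shutdown
                return
            kind, payload = item
            if kind == "update":
                self._total += sum(payload)
                self._count += len(payload)
            else:
                # Snapshot immutable state and hand the (possibly slow)
                # reduction to the compute thread.
                self._compute_queue.put(((self._total, self._count), payload))

    def _compute_loop(self) -> None:
        while True:
            item = self._compute_queue.get()
            if item is None:
                return
            (total, count), fut = item
            fut.set_result(total / count if count else None)


if __name__ == "__main__":
    metric = OffloadedMeanMetric()
    metric.update([1.0, 2.0, 3.0])
    metric.update([4.0])
    print(metric.compute().result(timeout=5))  # prints 2.5
    metric.shutdown()
```

Sending compute requests through the update queue acts as an ordering barrier: the update thread snapshots the state exactly where the request falls in the stream. The second thread then performs the reduction, so neither the caller nor pending updates wait on a slow compute.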
1 parent 54f7f91 commit 0cc9c95

2 files changed (+1174 -0 lines changed)