TorchBench CI has detected a performance signal.

Affected Tests:
- eval-cuda-fp32:
  - hf_Bert[disc (latency)] 8.16 -> 13.834, -69.5343%
  - hf_Bert[dynamo-disc (latency)] 6.865 -> 6.175, +10.051%
  - hf_Bert[disc (compiled)] 1151 -> 0
  - hf_Bert[disc (clusters)] 1 -> 0
- eval-cuda-fp16:
  - hf_Bert[disc (compiled)] 1151 -> 0
  - hf_Bert[disc (clusters)] 1 -> 0

Detailed data can be seen in oss://bladedisc-ci/TorchBench/gpu/tiny/20230803-15

created by TorchBench CI automatically
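For readers interpreting the percentages: the two latency figures are consistent with a relative change computed as `(old - new) / old`, so a negative value means latency went up (a regression) and a positive value means it went down. The sketch below reproduces that arithmetic. The function name `latency_change_pct` is hypothetical, and the formula is inferred from the reported numbers, not taken from the TorchBench CI source.

```python
def latency_change_pct(old: float, new: float) -> float:
    """Relative latency change in percent.

    Assumed convention (inferred from the report, not confirmed):
    positive means the new latency is lower (improvement),
    negative means the new latency is higher (regression).
    """
    if old == 0:
        raise ValueError("old latency must be non-zero")
    return (old - new) / old * 100.0


# Latency pairs taken from the eval-cuda-fp32 entries of the report.
disc_change = latency_change_pct(8.16, 13.834)
dynamo_disc_change = latency_change_pct(6.865, 6.175)

print(f"hf_Bert[disc (latency)]:        {disc_change:+.4f}%")
print(f"hf_Bert[dynamo-disc (latency)]: {dynamo_disc_change:+.3f}%")
```

Under this reading, the `disc` path regressed by roughly 70% while the `dynamo-disc` path improved by about 10%.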