Distilbert run on CPU #181

Open
swamysaranam opened this issue Jun 18, 2024 · 0 comments
swamysaranam commented Jun 18, 2024

I have followed the instructions provided in https://github.com/intel/models/blob/master/quickstart/language_modeling/pytorch/distilbert_base/inference/cpu/README.md

I am using the quickstart/language_modeling/pytorch/distilbert_base/inference/cpu/run_multi_instance_realtime.sh script to estimate latency and to profile the run.

I enabled the following environment variables to profile the run:

```
export DNNL_VERBOSE=1
export DNNL_VERBOSE_TIMESTAMP=1
```

I notice that the log contains layer_normalization and eltwise primitives in forward_training, but there are no entries for matmul/inner product or softmax in the latency_*.log file.

Pasting the partial dump for reference:

```
onednn_verbose,1718391711600.335938,primitive,exec,cpu,layer_normalization,jit:uni,forward_training,src_f32::blocked:abc::f0 dst_f32::blocked:abc::f0 stats_f32::blocked:ab::f0 scale_f32::blocked:a::f0 shift_f32::blocked:a::f0,attr-scratchpad:user ,flags:CH,1x384x768,0.0361328
onednn_verbose,1718391711605.524902,primitive,exec,cpu,layer_normalization,jit:uni,forward_training,src_f32::blocked:abc::f0 dst_f32::blocked:abc::f0 stats_f32::blocked:ab::f0 scale_f32::blocked:a::f0 shift_f32::blocked:a::f0,attr-scratchpad:user ,flags:CH,1x384x768,0.0251465
onednn_verbose,1718391711609.137939,primitive,exec,cpu,eltwise,jit:avx2,forward_training,data_f32::blocked:abc::f0 diff_undef::undef:::,attr-scratchpad:user ,alg:eltwise_gelu_erf alpha:0 beta:0,1x384x3072,0.191162
onednn_verbose,1718391711613.104980,primitive,exec,cpu,layer_normalization,jit:uni,forward_training,src_f32::blocked:abc::f0 dst_f32::blocked:abc::f0 stats_f32::blocked:ab::f0 scale_f32::blocked:a::f0 shift_f32::blocked:a::f0,attr-scratchpad:user ,flags:CH,1x384x768,0.0349121
onednn_verbose,1718391711618.121094,primitive,exec,cpu,layer_normalization,jit:uni,forward_training,src_f32::blocked:abc::f0 dst_f32::blocked:abc::f0 stats_f32::blocked:ab::f0 scale_f32::blocked:a::f0 shift_f32::blocked:a::f0,attr-scratchpad:user ,flags:CH,1x384x768,0.0168457
onednn_verbose,1718391711621.819092,primitive,exec,cpu,eltwise,jit:avx2,forward_training,data_f32::blocked:abc::f0 diff_undef::undef:::,attr-scratchpad:user ,alg:eltwise_gelu_erf alpha:0 beta:0,1x384x3072,0.210938
onednn_verbose,1718391711625.866943,primitive,exec,cpu,layer_normalization,jit:uni,forward_training,src_f32::blocked:abc::f0 dst_f32::blocked:abc::f0 stats_f32::blocked:ab::f0 scale_f32::blocked:a::f0 shift_f32::blocked:a::f0,attr-scratchpad:user ,flags:CH,1x384x768,0.0371094
onednn_verbose,1718391711630.948975,primitive,exec,cpu,layer_normalization,jit:uni,forward_training,src_f32::blocked:abc::f0 dst_f32::blocked:abc::f0 stats_f32::blocked:ab::f0 scale_f32::blocked:a::f0 shift_f32::blocked:a::f0,attr-scratchpad:user ,flags:CH,1x384x768,0.0161133
onednn_verbose,1718391711634.527100,primitive,exec,cpu,eltwise,jit:avx2,forward_training,data_f32::blocked:abc::f0 diff_undef::undef:::,attr-scratchpad:user ,alg:eltwise_gelu_erf alpha:0 beta:0,1x384x3072,0.24292
onednn_verbose,1718391711638.688965,primitive,exec,cpu,layer_normalization,jit:uni,forward_training,src_f32::blocked:abc::f0 dst_f32::blocked:abc::f0 stats_f32::blocked:ab::f0 scale_f32::blocked:a::f0 shift_f32::blocked:a::f0,attr-scratchpad:user ,flags:CH,1x384x768,0.0380859

100%|██████████| 100/100 [00:15<00:00, 6.60it/s]
** eval metrics **
  eval_exact_match = 85.0
  eval_f1          = 90.5
  eval_samples     = 100
```
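For reference, one quick way to see which primitive kinds a DNNL_VERBOSE log actually contains is to count the 6th comma-separated field of each record (the primitive name). This is only a sketch; here a two-line inline sample stands in for the real latency_*.log, and the field position assumes DNNL_VERBOSE_TIMESTAMP=1 output like the dump above:

```shell
# Sketch: summarize the oneDNN primitive kinds present in a verbose log.
# sample.log stands in for the actual latency_*.log from the run.
cat > sample.log <<'EOF'
onednn_verbose,1718391711600.335938,primitive,exec,cpu,layer_normalization,jit:uni,forward_training,src_f32::blocked:abc::f0,attr-scratchpad:user,flags:CH,1x384x768,0.0361328
onednn_verbose,1718391711609.137939,primitive,exec,cpu,eltwise,jit:avx2,forward_training,data_f32::blocked:abc::f0,attr-scratchpad:user,alg:eltwise_gelu_erf,1x384x3072,0.191162
EOF
# With timestamps enabled, the primitive kind is comma field 6;
# count how many records of each kind appear.
cut -d, -f6 sample.log | sort | uniq -c
```

Running this against the captured log would show at a glance whether any matmul, inner_product, or softmax records were emitted at all.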

Kindly let me know whether I am using the correct scripts. If so, could someone please help me generate the full log?

P.S.: Similar behavior is observed for bert_base as well.

Thanks,
Swamy.
