Mismatch performance on CelebAMask-HQ test set #37

@ChaoLi977

Hello, thank you for your great work.

However, when I ran the face parsing model on the CelebAMask-HQ test set (2,824 images), I got the performance below.

The model I ran is farl/celebm/448 (face_parsing.farl.celebm.main_ema_181500_jit.pt).
F1 scores: {'background': 0.9343307778499743, 'skin': 0.9641438432481969, 'nose': 0.9377685027511485, 'eye_g': 0.8991579940116652, 'l_eye': 0.8797685119013225, 'r_eye': 0.8815088490017493, 'l_brow': 0.8546936399701022, 'r_brow': 0.8517906024905171, 'l_ear': 0.8826971414311515, 'r_ear': 0.8796045818209585, 'mouth': 0.9227481788076385, 'u_lip': 0.8879356316268103, 'l_lip': 0.9040920760745508, 'hair': 0.935249390735524, 'hat': 0.8693470068443545, 'ear_r': 0.697250254530866, 'neck_l': 0.3732396631852335, 'neck': 0.8658552106253891, 'cloth': 0.8273804800814614, 'fg_mean': 0.8507906421743688}

It seems the mean F1 is 85.07, which does not match the 89.56 reported in the paper.

Moreover, the necklace (neck_l) performance is very low, 37.32, which is much lower than the 69.72 in the paper.

Could you help me figure out the reason?
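In case the discrepancy comes from the metric itself, here is a minimal sketch of how I compute per-class F1 over integer label maps (the function name and details are illustrative, not necessarily the paper's evaluation code):

```python
import numpy as np

def per_class_f1(pred, gt, num_classes):
    """Per-class F1 between integer label maps of matching shape.

    pred, gt: arrays of class indices (e.g. (N, H, W)).
    Classes absent from both pred and gt get F1 = 0 here.
    """
    scores = []
    for c in range(num_classes):
        p = (pred == c)
        g = (gt == c)
        tp = np.logical_and(p, g).sum()  # true positives for class c
        denom = p.sum() + g.sum()        # = 2*TP + FP + FN
        scores.append(2.0 * tp / denom if denom > 0 else 0.0)
    return scores
```

If your evaluation differs (e.g. image-wise averaging instead of pooling all pixels, or a different handling of classes missing from an image), that alone could shift the mean F1 by a few points.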
