How about the performance difference between token-gate and sentence gate? And how about the value of alpha for load balance loss?