Can you intuitively explain what `ratio_0_to_1` is doing in `RWKV_Tmix_x060`? https://github.com/BlinkDL/RWKV-LM/blob/main/RWKV-v5/src/model.py#L290
I find that `ratio_0_to_1` is defined by `ratio_0_to_1 = layer_id / (args.n_layer - 1)`. It is then used to initialize several `time_mix` and `time_decay` parameters.
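One way to read the definition: `ratio_0_to_1` is the layer's normalized depth, 0.0 for the first layer (`layer_id == 0`) and 1.0 for the last (`layer_id == args.n_layer - 1`), so each layer gets a slightly different initialization. The sketch below assumes that reading. `decay_init` is a hypothetical helper modeled on the "fancy time_decay" loop as I remember it from the linked `model.py`; the constants (`-6`, `5`, `0.7`, `1.3`) should be checked against the file before relying on them.

```python
def layer_ratio(layer_id: int, n_layer: int) -> float:
    """Normalized depth: 0.0 for the first layer, 1.0 for the last.

    Same expression as in the issue:
    ratio_0_to_1 = layer_id / (args.n_layer - 1)
    Raises ZeroDivisionError when n_layer == 1.
    """
    return layer_id / (n_layer - 1)


def decay_init(dim_att: int, ratio_0_to_1: float) -> list[float]:
    """Hypothetical per-channel decay init that depends on layer depth.

    Assumption: modeled on the x060 time_decay init; verify constants
    against the linked model.py. Requires dim_att > 1.
    """
    return [
        # exponent grows from 0.7 (first layer) to 2.0 (last layer)
        -6 + 5 * (n / (dim_att - 1)) ** (0.7 + 1.3 * ratio_0_to_1)
        for n in range(dim_att)
    ]


# Depth ratios for a 4-layer model: 0, 1/3, 2/3, 1
ratios = [layer_ratio(i, 4) for i in range(4)]
```

Under this reading, the same channel index receives a different initial decay depending on how deep its layer sits, because the exponent moves from 0.7 at the first layer to 2.0 at the last.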
However, I want to set `args.n_layer = 1`, which leads to a zero-division error. Does it make sense to hardcode `ratio_0_to_1 = 0` when `args.n_layer = 1`?
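A minimal sketch of the proposed workaround, assuming `ratio_0_to_1` is only used as an initialization knob: with a single layer, `layer_id == 0` is both the first and the last layer, so any value in [0, 1] is a valid choice, and 0.0 simply reuses the first-layer initialization. `safe_layer_ratio` is a hypothetical helper, not part of RWKV-LM.

```python
def safe_layer_ratio(layer_id: int, n_layer: int) -> float:
    """Normalized depth in [0, 1], defined as 0.0 when n_layer == 1.

    Hypothetical helper: the single layer is treated as the first layer,
    which is the hardcoded value proposed in the issue.
    """
    if n_layer < 1 or not 0 <= layer_id < n_layer:
        raise ValueError(f"layer_id={layer_id} out of range for n_layer={n_layer}")
    if n_layer == 1:
        return 0.0  # avoid dividing by n_layer - 1 == 0
    return layer_id / (n_layer - 1)
```

An equivalent one-liner is `layer_id / max(args.n_layer - 1, 1)`, which also yields 0.0 for the single-layer case; the explicit branch just makes the intent easier to spot.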