You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Suppose the transformer hidden state at event i is h_i, should the likelihood of this event be computed using h_i or h_{i-1}?
Using h_{i-1} makes more sense to me because this will encourage model to assign high intensity to the true next event, therefore learn to forecast.
But the implementation and the paper seem to be using h_i. The problem is that, since the transformer is given the true event i as part of the input, it can simply learn to output infinitely high intensity for the correct event type in order to maximize the likelihood. Still, the learned model will have no predictive power.
I feel I must have missed something. Any clarification is appreciated. Thanks.