Should event likelihood be computed using current or last hidden state?

Suppose the transformer hidden state at event i is h_i. Should the likelihood of this event be computed using h_i or h_{i-1}?

Using h_{i-1} makes more sense to me, because it encourages the model to assign high intensity to the true next event and therefore to learn to forecast.

But the implementation and the paper seem to use h_i. The problem is that, since the transformer is given the true event i as part of its input, it can simply learn to output an arbitrarily high intensity for the correct event type in order to maximize the likelihood. The learned model would then have no predictive power.
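
To make the concern concrete, here is a minimal toy sketch in NumPy. It is not the repository's code: the function name `event_log_intensity`, the softplus intensity, and the one-hot hidden states are all assumptions chosen for illustration, and only the event term of the log-likelihood (the sum of log-intensities at the observed events) is computed, not the integral term. The hidden state h_i is deliberately set to a one-hot encoding of event i's own type, which is the worst-case leak described above.

```python
import numpy as np

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return np.logaddexp(0.0, x)

def event_log_intensity(hidden, types, W, b, use_previous):
    """Sum of log lambda_{k_i} over events (event term only, hypothetical helper)."""
    if use_previous:
        h = hidden[:-1]   # h_{i-1} is used to score event i
        k = types[1:]     # the first event has no history, so it is skipped
    else:
        h = hidden        # h_i is used to score event i
        k = types
    intensities = softplus(h @ W.T + b)  # shape (n_events, n_types)
    return float(np.log(intensities[np.arange(len(k)), k]).sum())

# Toy data: each hidden state is a one-hot encoding of its own event type.
types = np.array([0, 2, 1, 2, 0])
n_types = 3
hidden = np.eye(n_types)[types]
b = np.zeros(n_types)

for scale in (1.0, 10.0, 100.0):
    W = scale * np.eye(n_types)
    cur = event_log_intensity(hidden, types, W, b, use_previous=False)
    prev = event_log_intensity(hidden, types, W, b, use_previous=True)
    print(f"scale={scale:6.1f}  using h_i: {cur:8.3f}  using h_(i-1): {prev:8.3f}")
```

In this toy setup the event term computed from h_i keeps growing as the weights are scaled up, while the term computed from h_{i-1} stays constant, because the previous hidden state carries no information about the current event's type here. Whether the real model can exploit this depends on details the sketch leaves out, such as the integral term and how the intensity is actually parameterized.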

I feel I must have missed something. Any clarification is appreciated. Thanks.

Should event likelihood be computed using current or last hidden state? #10
