Skip to content

Should event likelihood be computed using current or last hidden state? #10

@mistycheney

Description

@mistycheney

Suppose the transformer hidden state at event i is h_i, should the likelihood of this event be computed using h_i or h_{i-1}?

Using h_{i-1} makes more sense to me because this will encourage model to assign high intensity to the true next event, therefore learn to forecast.

But the implementation and the paper seem to be using h_i. The problem is that, since the transformer is given the true event i as part of the input, it can simply learn to output infinitely high intensity for the correct event type in order to maximize the likelihood. Still, the learned model will have no predictive power.

I feel I must have missed something. Any clarification is appreciated. Thanks.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions