I'd like the output from `history$get_data_table()` to include a column with the predicted values of the chosen arms at each step.
For example, for `EpsilonGreedyPolicy` it would just be `self$theta$mean[[chosen_arm]]`, which I realise is available by setting `save_theta = TRUE` in `Simulator$new`. If I also set `save_context = TRUE`, the predicted value of the chosen action can be recovered, although I have to account for the fact that the saved `theta` values are one time step ahead of the current context-arm pair: by the time they are saved they have already been updated with the reward from that pair, so they hold the estimates computed *after* the reward is known, not the predicted values that were used when the arm was chosen.
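In case it makes the one-step lag concrete, here is roughly what I do now to recover the prediction. This is only a sketch: I'm assuming the data table exposes a `t` column, a `choice` column, and a `theta` list column with a `mean` entry per arm — the actual column names and theta layout in your `History` output may differ.

```r
library(data.table)

dt <- history$get_data_table()
setorder(dt, t)

# theta saved at step t has already absorbed the reward from step t,
# so the prediction used when choosing the arm at step t lives in the
# theta saved at step t - 1.
dt[, predicted := vapply(seq_len(.N), function(i) {
  if (i == 1) return(NA_real_)         # no prior estimate at the first step
  prev_theta <- dt$theta[[i - 1]]      # estimates as they stood before this choice
  prev_theta$mean[[dt$choice[i]]]      # mean estimate of the arm chosen at t
}, numeric(1))]
```

A built-in column would make this shift-by-one bookkeeping unnecessary.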
With other policies, such as `ContextualEpsilonGreedyPolicy`, using the output from `history$get_data_table()` to compute the expected reward of the current action before it is taken is not so straightforward. I see in `policy_cmab_lin_epsilon_greedy.R` that you compute `expected_rewards[arm]`, but the values don't seem to be saved for output later on. It is exactly `expected_rewards[arm]` that I would like `history$get_data_table()` to include in its output. Having it for just the chosen arm would be enough for my current needs, but having it for all arms might be useful in future.
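One possible (untested) way to expose this occurred to me: if `save_theta = TRUE` snapshots whatever is in `self$theta` at each step — which is my assumption here — then the policy could stash the predictions there before choosing an arm. Sketched against `get_action()` in `policy_cmab_lin_epsilon_greedy.R`, with the existing code elided:

```r
get_action = function(t, context) {
  # ... existing code computing expected_rewards[arm] for each arm ...

  # proposed addition: keep the per-arm predictions in self$theta so
  # that Simulator$new(..., save_theta = TRUE) snapshots them into the
  # history at every step
  self$theta$expected_rewards <- as.list(expected_rewards)

  # ... existing epsilon-greedy choice between exploit and explore ...
}
```

They would then come out one step ahead like the other `theta` values, but at least they would be available at all.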
I had a look at `history.R` to see if I could work out how to save the values of `expected_rewards`, but it looks rather complicated to me and my R is nowhere near as good as yours :-).
Thanks,
Paul
Yes, this issue is still relevant to me. The reason I'd like the predicted value of each arm is so that I can rank the arms by their predicted values before a particular arm is chosen.
Hello Robin,
This is a feature request, not a bug report.
Thanks,
Paul