Hi, I'm trying to create a custom Uno environment with Stable Baselines3. I have run into two or three problems and would appreciate suggestions, workarounds, or a straightforward way to solve them:
My version of Uno is simplified for now (only 0-9 cards of 4 colours). The player either plays a valid card or picks a new one. I use a MultiDiscrete action space for this.
self.action_space = spaces.MultiDiscrete([2, 40])
The 2 stands for the two possible action types (play a card or draw one), and the 40 for the 40 different cards that can be played in this version of Uno. The problem is that in the `step(action)` function the algorithm has a hard time learning, because it keeps generating invalid actions: either the card is not in the agent's hand, or the card can't be played according to Uno's rules.
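To make the 40-card component concrete, here is a minimal sketch of how an integer in `0..39` could map to a colour and a number. The actual `UnoCard.decode_from_int` is not shown in this issue, so the layout below (`colour_index * 10 + number`), the colour order, and the names `Card`, `encode_card` and `decode_card` are all assumptions made for illustration.

```python
from typing import NamedTuple

# Assumed colour order; the real environment may use a different one.
COLOURS = ("red", "yellow", "green", "blue")
CARDS_PER_COLOUR = 10  # numbers 0-9 only, as in the simplified game
N_CARDS = len(COLOURS) * CARDS_PER_COLOUR  # 40


class Card(NamedTuple):
    """Hypothetical stand-in for the UnoCard class used in the issue."""
    colour: str
    number: int


def encode_card(card: Card) -> int:
    """Map a card to an integer in [0, 39] (assumed layout)."""
    return COLOURS.index(card.colour) * CARDS_PER_COLOUR + card.number


def decode_card(code: int) -> Card:
    """Inverse of encode_card; rejects codes outside [0, 39]."""
    if not 0 <= code < N_CARDS:
        raise ValueError(f"card code must be in [0, {N_CARDS - 1}], got {code}")
    colour_index, number = divmod(code, CARDS_PER_COLOUR)
    return Card(COLOURS[colour_index], number)
```

With such a layout, every one of the 40 values is a well-formed card, but at any given step only the few that are both in the agent's hand and playable are valid, which is why random exploration mostly hits invalid actions.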
I also tried having the AI choose an index into the cards in its hand instead. The problem is that the hand size varies during the game, and I don't think the action space can change from `MultiDiscrete([2, 7])` to `MultiDiscrete([2, 9])` mid-game.
What I would like to do instead is filter the possible actions before the action for the step function is generated, and feed that list of valid actions to the algorithm, so that the step function only ever receives valid actions. Is that possible?
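This kind of filtering is usually called action masking. A minimal sketch of building such a mask follows, under several assumptions that are not part of the code in this issue: the action space is flattened from `MultiDiscrete([2, 40])` to a single `Discrete(41)` (indices 0-39 play that card, index 40 draws), cards are encoded as `colour_index * 10 + number`, a card is playable when it matches the top card's colour or number, and drawing is always allowed. The names `valid_action_mask` and `is_playable` are hypothetical.

```python
from typing import List, Sequence

N_CARDS = 40
DRAW_ACTION = N_CARDS      # assumed: index 40 means "draw a card"
N_ACTIONS = N_CARDS + 1    # flattened Discrete(41) instead of MultiDiscrete([2, 40])


def is_playable(card_code: int, top_code: int) -> bool:
    """Assumed rule: same colour (code // 10) or same number (code % 10)."""
    same_colour = card_code // 10 == top_code // 10
    same_number = card_code % 10 == top_code % 10
    return same_colour or same_number


def valid_action_mask(hand: Sequence[int], top_code: int) -> List[bool]:
    """Return one boolean per action; True marks an action that is valid now."""
    mask = [False] * N_ACTIONS
    for card_code in hand:
        if is_playable(card_code, top_code):
            mask[card_code] = True
    mask[DRAW_ACTION] = True  # assumed: drawing is always a legal move
    return mask


# Example: hand holds codes 3, 15 and 27; the top card is code 13.
mask = valid_action_mask([3, 15, 27], top_code=13)
valid_actions = [i for i, ok in enumerate(mask) if ok]
```

One library that can consume a mask like this is `sb3-contrib`, whose `MaskablePPO` queries the environment for an `action_masks()` method; whether it fits this environment is something to verify against its documentation. Because the mask always has length 41, the varying hand size no longer requires changing the action space.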
Here is my step function. Let me know if you need more code:
`def step(self, action: ActType):
    self.render()
    # action[0] is the action type, action[1] the encoded card
    card_to_play = UnoCard.decode_from_int(action[1])
    # End the episode with a penalty if the chosen card is not in the agent's hand
    if card_to_play.code not in [c.code for c in self.agent.cards]:
        self.cards_not_in_hand += 1
        return self.get_observations(), CARD_NOT_IN_HAND_REWARD, True, False, {}
    # End the episode with a penalty if the card cannot be played under Uno's rules
    elif not self.uno_game.is_play_valid(card_to_play):
        self.illegal_moves += 1
        return self.get_observations(), ILLEGAL_MOVE_REWARD, True, False, {}
    self.uno_game.play(action[0], card_to_play)
    if self.uno_game.is_game_over():
        self.wins += 1
        return self.get_observations(), WIN_REWARD, True, False, {}
    # second player plays
    act_type, card = self.uno_game.get_current_player().choose_action(self.uno_game)
    self.uno_game.play(act_type, card)
    if self.uno_game.is_game_over():
        self.losses += 1
        return self.get_observations(), LOSS_REWARD, True, False, {}
    # Neither player has won yet: reward the valid move and continue the episode
    self.valid_moves += 1
    return self.get_observations(), VALID_MOVE_REWARD, False, False, {}`
Let me know how you would approach this. Maybe there is another way entirely?