[question] Trying to make a Uno Env #1199

Open
saledemon opened this issue Jan 16, 2025 · 0 comments

Hi, I'm trying to create a custom Uno environment with Stable-Baselines3. I've run into two or three problems and would appreciate suggestions, workarounds, or a straightforward way to handle them:

My version of Uno is simplified for now (only the 0-9 cards in 4 colours). On each turn, the player either plays a valid card or draws a new one. I use a MultiDiscrete action space for this.

self.action_space = spaces.MultiDiscrete([2, 40])

The 2 stands for the two possible action types (play a card or draw one), and the 40 for the 40 distinct cards that can be played in this version of Uno. The problem shows up in the step(action) function: the algorithm has a hard time learning because it generates invalid actions (either the card is not in the agent's hand, or the card can't be played according to Uno's rules).

I also tried having the AI choose an index into the cards in its hand instead. The problem is that the hand size varies during the game, and I don't think the action space can change from MultiDiscrete([2, 7]) to MultiDiscrete([2, 9]) mid-game.
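For illustration, one fixed-size layout that does not depend on the hand size would be a single flat index: 0-39 for "play this card" and 40 for "draw". This is only a sketch under assumptions. The card numbering (colour * 10 + rank), the `PLAY`/`DRAW` codes, and the helper `decode_action` are all hypothetical and not taken from my actual `UnoCard` class.

```python
from typing import Optional, Tuple

NUM_COLOURS = 4                       # four colours in the simplified game
NUM_RANKS = 10                        # cards 0-9 per colour
NUM_CARDS = NUM_COLOURS * NUM_RANKS   # 40 distinct cards
DRAW_ACTION = NUM_CARDS               # hypothetical extra index 40 meaning "draw"
NUM_ACTIONS = NUM_CARDS + 1           # would correspond to spaces.Discrete(41)

PLAY, DRAW = 0, 1                     # hypothetical action-type codes


def decode_action(action: int) -> Tuple[int, Optional[Tuple[int, int]]]:
    """Map a flat action index to (action_type, card).

    card is a (colour, rank) pair, or None when the action is a draw.
    Assumes cards are numbered colour * NUM_RANKS + rank.
    """
    if not 0 <= action < NUM_ACTIONS:
        raise ValueError(f"action {action} outside [0, {NUM_ACTIONS})")
    if action == DRAW_ACTION:
        return DRAW, None
    colour, rank = divmod(action, NUM_RANKS)
    return PLAY, (colour, rank)
```

With this layout the action space stays the same size for the whole game, although it does not by itself stop the algorithm from picking a card that is not in the hand.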

What I would like to do instead is filter the possible actions before the action for the step function is generated, and feed that list of valid actions to the algorithm. Is that possible? That way, the step function would only receive valid actions.
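To make the idea concrete, here is a minimal sketch of the kind of filter I have in mind, written as a boolean mask over MultiDiscrete([2, 40]). Everything in it is an assumption for illustration: the card numbering (colour * 10 + rank), the rule that a card is playable when it matches the top card's colour or number, the codes 0 = play and 1 = draw, and the helper names `is_playable` and `action_mask`. As far as I understand, MaskablePPO from sb3-contrib can consume such a mask through an `action_masks()` method on the environment, with the masks of each MultiDiscrete dimension concatenated into one flat array, but I have not verified that this fits my setup.

```python
from typing import Iterable, List

NUM_RANKS = 10      # cards 0-9 per colour
NUM_CARDS = 40      # 4 colours * 10 ranks
PLAY, DRAW = 0, 1   # assumed meaning of the first MultiDiscrete component


def is_playable(card: int, top_card: int) -> bool:
    """Assumed rule: a card is playable if it shares colour or number with the top card."""
    colour, rank = divmod(card, NUM_RANKS)
    top_colour, top_rank = divmod(top_card, NUM_RANKS)
    return colour == top_colour or rank == top_rank


def action_mask(hand: Iterable[int], top_card: int) -> List[bool]:
    """Return a flat mask of length 2 + 40 for MultiDiscrete([2, 40]).

    The first two entries mask the action type, the remaining 40 mask the card.
    """
    playable = {card for card in hand if is_playable(card, top_card)}

    type_mask = [False, False]
    type_mask[PLAY] = bool(playable)  # playing needs at least one playable card
    type_mask[DRAW] = True            # assumption: drawing is always allowed

    if playable:
        card_mask = [card in playable for card in range(NUM_CARDS)]
    else:
        # Only "draw" is available, so the card component is ignored;
        # keep every entry True so that dimension still has a valid choice.
        card_mask = [True] * NUM_CARDS

    return type_mask + card_mask
```

One caveat with this layout: each dimension is masked independently, so when "draw" is chosen a card index is still sampled and would simply have to be ignored in step.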

Here is my step function. Let me know if you need more code:
```python
def step(self, action: ActType):
    self.render()

    card_to_play = UnoCard.decode_from_int(action[1])

    if not (card_to_play.code in [c.code for c in self.agent.cards]):
        self.cards_not_in_hand += 1
        return self.get_observations(), CARD_NOT_IN_HAND_REWARD, True, False, {}
    elif not self.uno_game.is_play_valid(card_to_play):
        self.illegal_moves += 1
        return self.get_observations(), ILLEGAL_MOVE_REWARD, True, False, {}

    self.uno_game.play(action[0], card_to_play)

    if self.uno_game.is_game_over():
        self.wins += 1
        return self.get_observations(), WIN_REWARD, True, False, {}

    # second player plays
    act_type, card = self.uno_game.get_current_player().choose_action(self.uno_game)
    self.uno_game.play(act_type, card)

    if self.uno_game.is_game_over():
        self.losses += 1
        return self.get_observations(), LOSS_REWARD, True, False, {}

    self.valid_moves += 1
    return self.get_observations(), VALID_MOVE_REWARD, False, False, {}
```

Let me know how you would approach this. Maybe there is another way entirely?
