Build environment such that we have a limited run length and diverse initial states #8

@WardLT

Description

Going from a dimer up to a 20-molecule cluster is a really large stretch. As @kmherman is finding, our agent tends to learn policies that add water molecules quickly so that it reaches the end of the episode (where it receives the reward) as fast as possible. One route to avoiding this "rush to the finish" behavior is to introduce rewards during the episode (#5). Another we could try is going in smaller steps.

A few different approaches could work for this:

  1. Pick a random cluster at the beginning of an episode, and add a fixed number of bonds or waters during the episode.
    • Challenge 1: How do we reward the end of the episode?
      • Just use the same reward function?
    • Challenge 2: Where do we get the structures to start with?
      • Option 1: Keep a priority queue of the best structures for each cluster size. Pick one at random at the beginning of each episode
        • Best structures could be those with low energies, or those with high rewards based on the policy's value function
      • Option 2: Generate random graphs and alter from there.
      • Option 3: Use the known ground states from existing studies, do not alter during the RL learning
        • Problem is that there is no track to go from ground states at small sizes to larger ones without bond removal (our RL
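Option 1 above could be sketched roughly as follows. This is a minimal illustration, not code from this repository: the `StructureLibrary` class, its scoring convention (higher is better, e.g. negative energy or a value-function estimate), and the placeholder string structures are all hypothetical.

```python
import heapq
import random


class StructureLibrary:
    """Keep the top-k structures seen for each cluster size.

    Assumes higher score is better (e.g., negative energy, or the
    value function's estimate of a structure's worth).
    """

    def __init__(self, k=8):
        self.k = k
        self._heaps = {}   # cluster size -> min-heap of (score, tiebreak, structure)
        self._count = 0    # tie-breaker so structures themselves are never compared

    def add(self, size, score, structure):
        heap = self._heaps.setdefault(size, [])
        entry = (score, self._count, structure)
        self._count += 1
        if len(heap) < self.k:
            heapq.heappush(heap, entry)
        else:
            # Push the new entry and drop whichever is now the worst
            heapq.heappushpop(heap, entry)

    def sample(self, size, rng=random):
        """Pick a random stored structure of the requested size
        to start an episode from."""
        _, _, structure = rng.choice(self._heaps[size])
        return structure


# Episode setup: start from a random stored cluster of some size and
# run a short, fixed-length episode (e.g., add 2 waters) instead of
# building from dimer all the way to 20.
library = StructureLibrary(k=4)
for i in range(6):
    # Placeholder structures; scores decrease so later ones are worse
    library.add(3, score=-10.0 - i, structure=f"trimer-{i}")

start = library.sample(3)
steps_per_episode = 2  # fixed number of additions, then reward at the end
```

The priority queue bounds memory per size while letting the library improve as training discovers better structures; sampling at random (rather than always taking the single best) keeps the initial states diverse.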
