Going from a dimer up to a 20-molecule cluster is a really large stretch. As @kmherman is finding, our agent tends to learn policies that quickly add water molecules so that it reaches the end of the episode (where it receives the reward) as fast as possible. One route to avoiding this "rush to the finish" behavior is to introduce rewards during the episode (#5). Another we could try is going in smaller steps.
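To make the contrast concrete, here is a minimal sketch of the terminal-only reward versus a per-step shaped reward (#5). The `energy` function and the reward shapes are hypothetical stand-ins, not our actual implementation:

```python
import random

def energy(cluster_size):
    # Hypothetical stand-in for a real energy evaluation of a cluster.
    return -1.0 * cluster_size + random.random()

def terminal_only_reward(step, max_steps, cluster_size):
    # Reward only at episode end: the agent is incentivized to reach
    # max_steps as fast as possible, regardless of structure quality.
    return -energy(cluster_size) if step == max_steps else 0.0

def shaped_reward(prev_energy, new_energy):
    # Per-step reward (#5): credit the energy improvement from each
    # added water/bond, removing the incentive to rush to the end.
    return prev_energy - new_energy
```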
A few different approaches could work for this:
- Pick a random cluster at the beginning of an episode, and add a fixed number of bonds or a fixed number of waters for the episode.
  - Challenge 1: How do we reward the end of the episode?
    - Just use the same reward function?
  - Challenge 2: Where do we get the starting structures?
    - Option 1: Keep a priority queue of the best structures for each water cluster size, and pick one at random at the beginning of each episode.
      - "Best" structures could be those with low energies or those ranked highly by the policy's value function.
    - Option 2: Generate random graphs and alter them from there.
    - Option 3: Use the known ground states from existing studies, without altering them during RL training.
      - Problem: there is no path from the ground states at small sizes to the larger ones without bond removal (which our RL action space may not allow).
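Option 1's per-size pool could be implemented as a bounded heap keyed by energy; all names here (`StructurePool`, `add`, `sample_start`) are hypothetical, and ranking by the value function instead of energy would just swap the key:

```python
import heapq
import random
from collections import defaultdict

class StructurePool:
    """Keep the K best (lowest-energy) structures for each cluster size."""

    def __init__(self, capacity=50):
        self.capacity = capacity
        # size -> heap of (-energy, structure); the worst kept structure
        # (highest energy) sits at the top and is evicted first.
        self.pools = defaultdict(list)

    def add(self, size, energy, structure):
        heap = self.pools[size]
        entry = (-energy, structure)
        if len(heap) < self.capacity:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:  # lower energy than the current worst
            heapq.heapreplace(heap, entry)

    def sample_start(self, size):
        # Pick a stored structure at random to start an episode from.
        heap = self.pools[size]
        return random.choice(heap)[1] if heap else None
```

A pool like this also gives the curriculum a natural feedback loop: good structures found at size N become starting points for episodes that grow toward size N+1.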