Our current water cluster environment builds a cluster up from a dimer, with adding a water molecule as one of the possible actions. The episode proceeds until the cluster contains more than the total number of waters (N).
Instead, we could make a version that starts with the maximum number of waters (N) and proceeds until no new bonds can be placed.
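
A minimal sketch of what the fixed-size version might look like, to make the termination rule concrete. Everything named here is hypothetical and not taken from the existing environment: the class `FixedSizeWaterClusterEnv`, the pair-of-indices action format, the `max_bonds_per_water` cap (assumed to be 4 as a stand-in for whatever rule decides that a bond can no longer be placed), and the placeholder reward of 0.0.

```python
from itertools import combinations


class FixedSizeWaterClusterEnv:
    """Toy sketch: all N waters exist from the start; each action adds one bond."""

    def __init__(self, n_waters, max_bonds_per_water=4):
        self.n_waters = n_waters
        # Assumption: a water accepts no more bonds once it reaches this cap.
        self.max_bonds_per_water = max_bonds_per_water
        self.reset()

    def reset(self):
        """Start an episode with N unbonded waters."""
        self.bonds = set()                # bonded pairs stored as (i, j) with i < j
        self.degree = [0] * self.n_waters  # number of bonds on each water
        return frozenset(self.bonds)

    def valid_actions(self):
        """Return every unbonded pair whose two waters are both below the cap."""
        return [
            (i, j)
            for i, j in combinations(range(self.n_waters), 2)
            if (i, j) not in self.bonds
            and self.degree[i] < self.max_bonds_per_water
            and self.degree[j] < self.max_bonds_per_water
        ]

    def step(self, action):
        """Place one bond; the episode is done when no further bond can be placed."""
        i, j = sorted(action)
        if (i, j) not in self.valid_actions():
            raise ValueError(f"cannot place bond {action}")
        self.bonds.add((i, j))
        self.degree[i] += 1
        self.degree[j] += 1
        done = not self.valid_actions()
        reward = 0.0  # placeholder: the reward is not specified in this note
        return frozenset(self.bonds), reward, done


def run_episode(env):
    """Roll out one episode, always taking the first valid action."""
    env.reset()
    done = not env.valid_actions()
    while not done:
        _, _, done = env.step(env.valid_actions()[0])
    return set(env.bonds)
```

In this sketch the episode length is bounded by the bond cap instead of by the water count, so a real version would mainly need to replace `valid_actions` with the actual bonding criterion.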