Really cool! But right as it was nearing 4,000, it seems to have corrupted itself and no longer got any scores above 0. Not sure if that's a code bug or a neural net issue.<p>avg500 -4.6 last 500 episodes<p>peak 3959.3 best window<p>roll/s 20.68 20-step avg<p>progress 4388 562749 episodes
Cool project!<p>I noticed that if you go from training to watch and then back, the training score temporarily drops significantly.
My average eventually made it to about 3900, and then stagnated between 3600-3900. I'm curious if this is universal behavior or not. I'm up to about 5k steps.
More details and implementation notes please?
cool project