Hacker News

Are you aware of the generate, rank, and judge workflow from reinforcement learning, where you generate several trajectories (like 8 different plans), rank them, and then judge the result?

I noticed it gave me better results and greater variety, even though I don't use the remaining plans.
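For anyone who wants the shape of it, here is a rough Python sketch of that loop. It is only an illustration under my own assumptions, not the rule from the gist: the names `generate_rank_judge`, `generate`, `score`, and `judge` are hypothetical, and I am assuming "judge" means a final accept/reject check applied to the ranked plans from best to worst.

```python
from typing import Callable, List, Tuple


def generate_rank_judge(
    generate: Callable[[int], str],   # hypothetical: produces one candidate plan per index
    score: Callable[[str], float],    # hypothetical: higher score means a better plan
    judge: Callable[[str], bool],     # hypothetical: final accept/reject check
    n: int = 8,
) -> Tuple[str, List[Tuple[float, str]]]:
    """Generate n plans, rank them by score, return the best plan the judge accepts."""
    if n < 1:
        raise ValueError("n must be at least 1")

    # 1. Generate: sample n independent candidate plans.
    candidates = [generate(i) for i in range(n)]

    # 2. Rank: sort candidates from highest to lowest score.
    ranked = sorted(
        ((score(plan), plan) for plan in candidates),
        key=lambda pair: pair[0],
        reverse=True,
    )

    # 3. Judge: walk the ranking and keep the first plan that passes.
    for _, plan in ranked:
        if judge(plan):
            return plan, ranked

    # Assumption: if the judge rejects everything, fall back to the top-ranked plan.
    return ranked[0][1], ranked


if __name__ == "__main__":
    # Toy stand-ins: plans are strings of increasing length, longer scores higher,
    # and the judge only accepts plans of at most 3 characters.
    best, ranked = generate_rank_judge(
        generate=lambda i: "x" * (i + 1),
        score=len,
        judge=lambda plan: len(plan) <= 3,
    )
    print(best, [s for s, _ in ranked])
```

In practice `generate` would be a model call with some sampling randomness, and the plans that lose are simply thrown away; the variety comes from having had them to compare against.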

https://gist.github.com/fire/17c4962827139822b3d2a96a0c479e4...

Note that the rule doesn't make much sense out of context and the math is wrong... oops :D




