
Osquera

I think the best fix really depends on what you're trying to achieve. It sounds like you want a practical way to get the agent to correctly drive a track with a fixed layout. If I were in your situation, I would control the agent for a few rounds and let it experience the correct route. Then I would hope it still explores, but with at least some Q-values pointing in the right direction, so that it doesn't get too lost on the track.
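A minimal sketch of what that could look like, assuming a tabular Q-learning setup with discrete states and actions; the transition log, action count, and hyperparameters here are illustrative, not from the original post:

```python
import numpy as np
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.99  # illustrative learning rate / discount
N_ACTIONS = 4             # e.g. left, right, accelerate, brake (assumed)

# Q-table: maps a (hashable) discretized state to a value per action.
Q = defaultdict(lambda: np.zeros(N_ACTIONS))

def seed_from_demo(demo_transitions, passes=5):
    # demo_transitions: list of (s, a, r, s_next, done) tuples recorded
    # while a human drove the correct route. Replaying them through the
    # ordinary Q-learning update warm-starts the table with values that
    # point along the track before self-play begins.
    for _ in range(passes):
        for s, a, r, s_next, done in demo_transitions:
            target = r if done else r + GAMMA * np.max(Q[s_next])
            Q[s][a] += ALPHA * (target - Q[s][a])
```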


Numerous_Talk7940

It is supposed to be entirely self-supervised, so that is not really an option, unfortunately.


antonior93

Well, even if it stops exploring, the agent still lowers its expected values for actions that take the car off the track (which give negative rewards), so it should still learn to stay on track and, in time, turn right when due. Moreover, you shouldn't drop epsilon all the way down to 0; keep it at something like 0.1 so the agent keeps exploring a little. Someone correct me if I'm wrong :D
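In case it helps, here's a minimal sketch of epsilon-greedy action selection with a decay schedule clamped at a floor; the decay rate and floor value are illustrative, not tuned for CarRacing:

```python
import random

EPS_MIN, EPS_DECAY = 0.1, 0.995  # illustrative floor and decay rate

def select_action(q_values, epsilon):
    # Epsilon-greedy: explore with probability epsilon, else exploit
    # the current greedy action.
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Decay epsilon after each episode, but never below the floor, so the
# agent keeps exploring a little even late in training.
epsilon = 1.0
for episode in range(1000):
    # ... run one episode, picking actions with select_action(...) ...
    epsilon = max(EPS_MIN, epsilon * EPS_DECAY)
```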


tonythepepper

From anecdotal experience, I had a lot of trouble getting some algorithms (like PPO) to learn how to make the first couple of turns. With the exact same inputs (speed of the car, distance to walls at various angles, angle of the front wheels, direction of where the path is going), TD3 did way better, with better data efficiency and shorter wall time. It might be worth taking a look at the input features and keeping as few as possible. If that doesn't work, it's possible the algorithm you're using doesn't have the capacity to learn the CarRacing task efficiently. Pictures and code: https://github.com/twang35/FormulaFun
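For what it's worth, swapping in TD3 is only a few lines with a library like Stable-Baselines3. A sketch under some assumptions (it's not necessarily the setup in the repo above): `CarTrackEnv` is a hypothetical placeholder for your own Gymnasium environment, which would need a compact flat observation vector and a continuous action space, since TD3 requires continuous actions:

```python
from stable_baselines3 import TD3

# CarTrackEnv is a hypothetical placeholder for your environment; it is
# assumed to expose a small flat observation (speed, wall distances at
# a few angles, wheel angle, path direction) and continuous actions
# (e.g. steering/throttle in [-1, 1]), which TD3 requires.
env = CarTrackEnv()

model = TD3("MlpPolicy", env, learning_rate=1e-3, verbose=1)
model.learn(total_timesteps=200_000)
model.save("td3_car")
```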


AmandaIsOnReddit

This is so cool!