Hi,
I am attempting to develop an Asteroids-type base defence simulation agent. The agent has two actions: a) rotate the gun clockwise/anticlockwise, and b) fire the gun/defence missile. The goal is to hit the incoming rocks/missiles before they hit my base. However, my agents expend a lot of shots, and I would like to shape the reward so that they minimise the number of shots fired, ideally learning that only one shot is required per incoming missile. I am struggling to find a reward scheme that produces this behaviour.
My reward profile is:
Destroy Enemy Missile: Reward: +1.0
Enemy Missile Reaches my Base: Reward: -1.0
Missile/ Shot Fired: Reward: -0.1
An episode consists of my base being subjected to 20 enemy missiles, so the optimum would be only 20 shots fired per episode. But typically the agent fires ~5 shots for every incoming enemy missile.
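To put numbers on this, here is a minimal sketch of the episode return under the reward profile above. The function name `episode_return` and its parameters are just for illustration, and the "observed" case assumes all 20 enemy missiles are still destroyed while ~5 shots are spent on each. By hand that is 20 - 2 = 18 for the ideal episode versus 20 - 10 = 10 for what I see.

```python
def episode_return(kills, leaks, shots,
                   kill_reward=1.0, leak_penalty=-1.0, shot_penalty=-0.1):
    """Total reward for one episode under the profile listed above.

    kills: enemy missiles destroyed
    leaks: enemy missiles that reached the base
    shots: defence shots fired
    """
    return kills * kill_reward + leaks * leak_penalty + shots * shot_penalty


# Ideal episode: 20 enemy missiles, one shot each, none reach the base.
ideal = episode_return(kills=20, leaks=0, shots=20)

# Roughly what I observe, assuming every missile is still destroyed
# but ~5 shots are spent per missile (100 shots in total).
observed = episode_return(kills=20, leaks=0, shots=100)

print(round(ideal, 2), round(observed, 2))
```

So the wasted shots do cost reward, but the agent still ends up clearly positive while firing five times more than it needs to.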
I note that my Decision Request rate defaults to 5, which typically results in shots being fired in batches of 5. When I reduce the Decision Request rate to 2 or 1, fewer friendly shots are fired, but training performance drops to little or nothing and the agent fails to engage most of the enemy missiles.
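My guess at why the batches of 5 appear can be sketched as below. This is a toy model, not the actual engine code: it assumes the last chosen action is re-applied on every step until the next decision, and that the gun has no cooldown. The function `shots_fired` and its arguments are hypothetical names for illustration only.

```python
def shots_fired(fire_decisions, decision_period, repeat_between_decisions=True):
    """Count shots produced by a sequence of fire/no-fire decisions.

    fire_decisions: one boolean per decision, True meaning "fire".
    decision_period: number of simulation steps between decisions.
    repeat_between_decisions: ASSUMPTION that the chosen action is applied
        again on every step until the next decision is made.
    """
    steps_per_decision = decision_period if repeat_between_decisions else 1
    shots = 0
    for fire in fire_decisions:
        if fire:
            # One "fire" decision turns into one shot per step it is held for.
            shots += steps_per_decision
    return shots


decisions = [True, False, True]           # agent chooses to fire twice
print(shots_fired(decisions, 5))          # period 5, action repeated
print(shots_fired(decisions, 1))          # period 1
print(shots_fired(decisions, 5, False))   # period 5, action applied once
```

If that assumption holds, a single "fire" decision at rate 5 costs five shot penalties no matter what the policy intended, which would match the ~5 shots per enemy missile that I see.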
When I increase the Missile/Shot Fired penalty to -0.5, training does not converge and the agent shows no useful performance.
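One possible explanation, sketched below as simple arithmetic: destroying a missile instead of letting it through is worth the kill reward minus the leak penalty, so firing only pays off while the shots spent per kill stay below that gain divided by the per-shot penalty. The function name `break_even_shots_per_kill` is hypothetical, and the sketch assumes the agent keeps needing about 5 shots per missile early in training.

```python
def break_even_shots_per_kill(kill_reward, leak_penalty, shot_penalty):
    """Shots per kill at which firing is worth exactly as much as not firing.

    A kill changes the outcome for that missile from leak_penalty to
    kill_reward, so the gain is (kill_reward - leak_penalty). Each shot
    costs abs(shot_penalty).
    """
    return (kill_reward - leak_penalty) / abs(shot_penalty)


# Shot penalty -0.1: up to about 20 shots per kill is still better than
# letting the missile through.
print(round(break_even_shots_per_kill(1.0, -1.0, -0.1), 2))

# Shot penalty -0.5: break-even drops to about 4 shots per kill.
print(round(break_even_shots_per_kill(1.0, -1.0, -0.5), 2))
```

With the -0.5 penalty and ~5 shots per missile, an episode of 20 kills gives 20 - 50 = -30, which is worse than the -20 from never firing at all. So the agent may simply be learning that not shooting is the better policy.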
Does anyone have a similar training scenario, or any reward-shaping advice for base defence scenarios?
Any advice appreciated.