I am trying to train GridWorld.
I set the grid size to 15, and the area has 9 pits, 1 agent, and 1 goal.
If the agent falls into a pit, it gets a reward of -2; if it reaches the goal, it gets a reward of 3. The reward for each step is -0.001, and the agent's max step is 1000.
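To make the reward scale concrete, here is a rough sketch of the episode returns these settings imply. The function name `episode_return` is hypothetical, and it assumes that every step (including the last one) pays the step reward and that reaching a pit or the goal ends the episode; neither assumption is confirmed by the environment itself.

```python
# Rough sanity check of the reward scale described above.
# Assumptions (not confirmed by the environment): every step pays the
# step reward, and reaching a pit or the goal ends the episode.

STEP_REWARD = -0.001
PIT_REWARD = -2.0
GOAL_REWARD = 3.0
MAX_STEPS = 1000


def episode_return(steps, terminal_reward=0.0, step_reward=STEP_REWARD):
    """Undiscounted return of an episode lasting `steps` steps.

    `terminal_reward` is the pit or goal reward, or 0.0 if the episode
    simply ran out of steps.
    """
    return steps * step_reward + terminal_reward


# Episode that times out without reaching a pit or the goal.
timeout_return = episode_return(MAX_STEPS)

# Episode that reaches the goal after 30 steps (30 is an arbitrary example).
goal_return = episode_return(30, GOAL_REWARD)

# Episode that falls into a pit after 30 steps.
pit_return = episode_return(30, PIT_REWARD)

print(f"timeout: {timeout_return:.3f}")
print(f"goal after 30 steps: {goal_return:.3f}")
print(f"pit after 30 steps: {pit_return:.3f}")
```

Under these assumptions, wandering until the step limit costs about as much as half a pit, so the step reward alone gives only a weak signal per step compared with the pit and goal rewards.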
Other parameters are set as follows:
However, the reward in every training run is poor, and it shows no trend, as follows:
How should I set these parameters?