Parameters of GridWorld

I am trying to train the GridWorld example.
I set the grid size to 15, and the area contains 9 pits, 1 agent, and 1 goal.
If the agent reaches a pit, it gets a reward of -2; if it reaches the goal, it gets a reward of 3. The reward for each step is -0.001, and the agent's max step is 1000.
[Screenshot: upload_2021-4-6_9-5-37.png]
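To make these reward settings concrete, here is a rough Python sketch of the episode returns they imply. It assumes an undiscounted return and that the step penalty is applied on every step, including the final one; the 20-step episodes are hypothetical numbers chosen only for illustration.

```python
# Reward values taken from the settings described in this post.
PIT_REWARD = -2.0
GOAL_REWARD = 3.0
STEP_REWARD = -0.001
MAX_STEP = 1000


def episode_return(steps, terminal_reward=0.0):
    """Undiscounted return of an episode lasting `steps` steps.

    Assumption: the step penalty is paid on every step, and the
    terminal reward (pit or goal) is added once at the end.
    """
    return steps * STEP_REWARD + terminal_reward


# Agent wanders until the step limit without touching a pit or the goal.
timeout_return = episode_return(MAX_STEP)
# Hypothetical episode: goal reached after 20 steps.
goal_return = episode_return(20, GOAL_REWARD)
# Hypothetical episode: pit reached after 20 steps.
pit_return = episode_return(20, PIT_REWARD)

print(f"timeout: {timeout_return:.3f}")
print(f"goal after 20 steps: {goal_return:.3f}")
print(f"pit after 20 steps: {pit_return:.3f}")
```

Under these assumptions, running out of steps costs about -1 in total, which is still better than falling into a pit (about -2), so an agent that only learns to avoid pits and wander until the step limit would sit at roughly -1.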
Other parameters are set as follows:
[Screenshot: upload_2021-4-6_9-2-11.png]
However, I found that the reward in each training run is poor. There is no trend in the reward, as follows:



I want to know how to set these parameters.

Please check out our documentation on the training configuration file, which explains what each parameter means and gives general guidelines on how to adjust them.

I want to use an LSTM in this model, so what should I do? I have no experience applying it.
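
For reference, ML-Agents enables recurrent (LSTM) memory through a `memory` block under `network_settings` in the trainer configuration YAML file. Below is a minimal Python sketch that builds such a block and prints it in YAML form. The behavior name `GridWorld` and all numeric values are placeholder assumptions, not recommended settings, and the small `to_yaml` helper is hypothetical, written only for this illustration.

```python
# Sketch of a trainer configuration with recurrent memory enabled.
# Assumptions: the behavior is named "GridWorld" and PPO is used;
# every number below is a placeholder, not a tuned value.
config = {
    "behaviors": {
        "GridWorld": {
            "trainer_type": "ppo",
            "network_settings": {
                "hidden_units": 128,
                "num_layers": 2,
                # Adding this block is what turns on the LSTM.
                "memory": {
                    "memory_size": 128,      # size of the recurrent state
                    "sequence_length": 64,   # steps per training sequence
                },
            },
        }
    }
}


def to_yaml(node, indent=0):
    """Render a nested dict of scalars as indented YAML lines."""
    lines = []
    pad = " " * indent
    for key, value in node.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(to_yaml(value, indent + 2))
        else:
            lines.append(f"{pad}{key}: {value}")
    return lines


yaml_text = "\n".join(to_yaml(config))
print(yaml_text)
```

The printed text can be merged into an existing configuration file; only the `memory` block is specific to the LSTM, and the other keys are there to show where it sits.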