I think you’re trying to micro-manage your agent by rewarding behaviours rather than achievements. The idea is to set a goal and let the agent figure out how to get there. Given a large enough time_horizon, the agent should be able to cope with delayed rewards and infer which past actions were required to achieve them. Rewarding behaviour is, imo, like introducing a hidden heuristic, because you’re telling the agent how to reach a goal rather than what that goal actually is. What’s your runner’s goal? Is it running speed? Reaching waypoints along the track? Or earning points by jumping through hoops? Are the hoops obstacles, or can the agent still achieve its goal by ignoring them?
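To make that concrete, here’s a minimal sketch of what a goal-centred (rather than behaviour-shaping) reward could look like, assuming the waypoint interpretation. All the names (`waypoints`, `WAYPOINT_RADIUS`, the reward magnitudes) are hypothetical, not from your project:

```python
import math

WAYPOINT_RADIUS = 1.0  # hypothetical threshold for counting a waypoint as reached

def compute_reward(agent_pos, waypoints, reached, finished):
    """Sparse, goal-centred reward: points only for actual progress."""
    reward = 0.0
    # Reward reaching each waypoint once; say nothing about running style.
    for i, wp in enumerate(waypoints):
        if not reached[i] and math.dist(agent_pos, wp) < WAYPOINT_RADIUS:
            reached[i] = True
            reward += 1.0
    # Larger terminal reward for completing the track.
    if finished:
        reward += 10.0
    # No per-step rewards for gait, posture, etc. -- with a large enough
    # time_horizon, credit assignment for those is the agent's job.
    return reward
```

The point is that nothing in here tells the agent *how* to run, only what counts as success.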
The agent should probably request decisions (observations and actions) at some regular interval, maybe even at every step. If you limit the decision window to points that you deem critical, then again you’re telling the agent how to do its job. Make sure the observations are in the agent’s local space and normalize them. The runner probably doesn’t need to know its own world position, only the relative positions of objects in its vicinity.
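Something along these lines, as a sketch: a 2D example of converting world-space object positions into normalized, agent-local observations. `SENSOR_RANGE`, the heading convention, and the (x, z) layout are all assumptions for illustration:

```python
import math

SENSOR_RANGE = 20.0  # hypothetical max distance at which objects are observed

def local_observations(agent_pos, agent_heading, objects):
    """Relative, normalized positions of nearby objects (2D sketch).

    agent_heading is the facing angle in radians; objects are world-space
    (x, z) positions. Everything is rotated into the agent's local frame
    and scaled to roughly [-1, 1].
    """
    cos_h, sin_h = math.cos(-agent_heading), math.sin(-agent_heading)
    obs = []
    for ox, oz in objects:
        dx, dz = ox - agent_pos[0], oz - agent_pos[1]
        # Rotate the world-space offset into the agent's local frame.
        local_x = dx * cos_h - dz * sin_h
        local_z = dx * sin_h + dz * cos_h
        # Normalize by sensor range; the agent never sees world coordinates.
        obs.extend([local_x / SENSOR_RANGE, local_z / SENSOR_RANGE])
    return obs
```

This way the policy learns “hoop ahead and slightly left” rather than memorizing absolute track coordinates, which should generalize much better.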