Associating rewards with past actions

I’m trying to train an agent to shoot slow-moving bullets at a moving target. The action is the angle at which the bullet is fired. The problem is that the agent might fire several bullets before the first bullet even reaches the target. So, if the first bullet hits the target and produces a reward, won’t that reward be associated with the later actions (which might point in the opposite direction from the target)?

Essentially, my question is this: is there any way to associate rewards with specific actions that occurred in the past? That way, I could associate the bullet-hit reward with the exact action that fired that specific bullet (thus creating an accurate action/reward tuple). If not, what would be the best way to train for this specific situation (slow-moving bullets fired at a moving target)?
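To make the idea concrete, here is a minimal sketch of what "associating a reward with a past action" could look like. It assumes you own the experience buffer and only train on it after the episode ends, which may not be true of your framework. The names `Transition`, `EpisodeBuffer`, `record_shot` and `credit_hit` are hypothetical and not part of any library.

```python
from dataclasses import dataclass, field


@dataclass
class Transition:
    step: int            # index of the step at which the bullet was fired
    action_angle: float  # the action: firing angle in degrees
    reward: float = 0.0  # filled in later, once the bullet's outcome is known


@dataclass
class EpisodeBuffer:
    transitions: list = field(default_factory=list)

    def record_shot(self, action_angle: float) -> int:
        """Store the firing action and return its step index as a bullet ID."""
        step = len(self.transitions)
        self.transitions.append(Transition(step, action_angle))
        return step

    def credit_hit(self, fired_at_step: int, reward: float) -> None:
        """Add the hit reward to the transition that fired the bullet."""
        self.transitions[fired_at_step].reward += reward


buffer = EpisodeBuffer()
first = buffer.record_shot(30.0)     # aimed at the target
second = buffer.record_shot(-150.0)  # fired later, aimed away from it
buffer.credit_hit(first, 1.0)        # the first bullet lands after the second was fired
rewards = [t.reward for t in buffer.transitions]
```

Each bullet carries the step index returned by `record_shot`, so the hit can be written back to the shot that caused it rather than to whatever action happens to be current.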

I used to think that lambda affected this, but I don’t think it’s that simple, and I would suggest not touching it just yet. I have also seen some explanations of time_horizon that make me think it should be at least as long as the time between firing and the resulting reward. But I have also seen config files (like the one for KartRacing) where time_horizon is very short, like 64, even though a lap around the track takes much longer than that, so it’s probably more complicated than that. I’d start with a time horizon long enough to cover the span from firing to reward. I’d then tune batch size and buffer size before touching other hyperparameters, and after that maybe network size (hidden units and layers). Warning: this may be incorrect info.
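For intuition on why a delayed reward still reaches the firing action, here is a rough sketch of plain discounted returns. This is not the trainer's actual code, and the function name `discounted_returns` and the value `gamma=0.5` are just for illustration. A reward that arrives k steps after the shot is credited to the shot scaled by gamma^k, provided both steps fall within the same trajectory segment.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one trajectory."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step sees the discounted sum of everything after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# The bullet is fired at step 0 and hits the target at step 3.
rewards = [0.0, 0.0, 0.0, 1.0]
returns = discounted_returns(rewards, gamma=0.5)
# returns == [0.125, 0.25, 0.5, 1.0]
```

Note that the steps in between also receive credit, which is the original concern. As far as I understand it, the trainer relies on averaging over many episodes to sort out which actions actually helped. When a segment is cut off at time_horizon, a value estimate stands in for the rewards that come later, which may be why short horizons like 64 can still work.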