In rl_trainer.py, at line 133, I see the following code and comment:
    with hierarchical_timer("process_trajectory"):
        for traj_queue in self.trajectory_queues:
            # We grab at most the maximum length of the queue.
            # This ensures that even if the queue is being filled faster than it is
            # being emptied, the trajectories in the queue are on-policy.
            _queried = False
            for _ in range(traj_queue.qsize()):
                _queried = True
                try:
                    t = traj_queue.get_nowait()
                    self._process_trajectory(t)
                except AgentManagerQueue.Empty:
                    break
I can't understand why the trajectories in the queue are said to be on-policy. A trajectory that arrives later, while the policy still hasn't been updated, seems to have been collected with the old policy.
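
To check that I am reading the loop correctly, here is a minimal standalone sketch of the same draining pattern. It uses the standard library `queue.Queue` as a stand-in for `AgentManagerQueue`, and `drain_snapshot` and `process` are names I made up for illustration, so this only shows the "take at most `qsize()` items" behaviour, not how ML-Agents actually updates the policy.

    ```python
    import queue


    def drain_snapshot(q, process):
        """Process at most as many items as were queued when the call started."""
        processed = 0
        # qsize() is read once, so items added during the loop are left for later.
        for _ in range(q.qsize()):
            try:
                item = q.get_nowait()
            except queue.Empty:
                break
            process(item)
            processed += 1
        return processed


    q = queue.Queue()
    for i in range(3):
        q.put(i)

    seen = []


    def process(item):
        seen.append(item)
        # Hypothetical producer: keeps filling the queue while it is being drained.
        q.put(item + 100)


    count = drain_snapshot(q, process)
    ```

In this sketch the three items that were queued at the start are processed, and the three items added during the drain stay in the queue for the next call. So the loop bounds how much is consumed per call, which is what the comment describes, but I still don't see why that makes the leftover trajectories on-policy.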