Problems with Python LLAPI

After reading the docs, I think I understand why decision_steps sometimes contains no agents when one agent terminates: the terminating agent is done, and the other agents simply do not need an action at that moment.


So I have another question: when one agent terminates and no agent requires an action, I still have to call env.step() to advance the simulation. Does this mean all agents take zero actions after env.step()?

The LLAPI is agent-centric, not episode-centric. Whenever an agent needs an action, it asks you, and you have to give that agent a decision at that time.
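For reference, the agent-centric loop looks roughly like this. This is a minimal sketch using the standard mlagents_envs calls (get_steps / set_actions / step); the zero-valued actions and the step count are placeholders for illustration, not a real policy:

```python
# Hedged sketch of the LLAPI stepping loop with mlagents_envs.
import numpy as np

def batch_actions(n_agents: int, action_dim: int) -> np.ndarray:
    """One action row per agent that requested a decision this step.
    With zero requesting agents this is an empty (0, action_dim) batch:
    env.step() then advances the simulation without delivering actions."""
    return np.zeros((n_agents, action_dim), dtype=np.float32)

def run(num_steps: int = 100) -> None:
    # Requires a running Unity instance (press Play in the Editor after connecting).
    from mlagents_envs.environment import UnityEnvironment
    from mlagents_envs.base_env import ActionTuple

    env = UnityEnvironment(file_name=None)  # None = connect to the Editor
    env.reset()
    behavior_name = list(env.behavior_specs)[0]
    action_dim = env.behavior_specs[behavior_name].action_spec.continuous_size

    for _ in range(num_steps):
        decision_steps, terminal_steps = env.get_steps(behavior_name)
        # terminal_steps holds agents that just ended; decision_steps may be
        # empty on the very step an agent terminates.
        acts = batch_actions(len(decision_steps), action_dim)
        env.set_actions(behavior_name, ActionTuple(continuous=acts))
        env.step()
    env.close()
```

Each iteration answers exactly the agents that asked for a decision, however many (including zero) that happens to be.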

Personally, I found that the LLAPI might not be a good fit for multi-agent setups, which is why I wrote Peaceful Pie to control Unity directly from Python; see my signature.

Note that, in general, since your environment is typically episode-centric, if one agent dies and two are still alive, then you will be sent two decision-step requests at the same time as the terminal step.

What I personally found to be an issue is that one cannot distinguish between the following scenarios:
- agent dies, episode ends, three agents spawn (1 terminal, 3 decision); vs
- agent dies, episode continues, new agent spawns, two existing agents continue (1 terminal, 3 decision)

Yes, you can sort of hack around this if you are familiar with the exact scenario, but it feels kind of ... hacky... to me.

[quote=“hughperkins”, post:3, topic: 911267]
Note that, in general, since your environment is typically episode-centric, if one agent dies and two are still alive, then you will be sent two decision-step requests at the same time as the terminal step.

What I personally found to be an issue is that one cannot distinguish between the following scenarios:

  • agent dies, episode ends, three agents spawn (1 terminal, 3 decision); vs
  • agent dies, episode continues, new agent spawns, two existing agents continue (1 terminal, 3 decision)

Yes, you can sort of hack around this if you are familiar with the exact scenario, but it feels kind of … hacky… to me.
[/quote]
Thanks a lot. I think I have understood what you said. But my env actually cannot formally be called multi-agent. It's a 1v1 env: one agent is controlled by a neural network and the other by a fixed policy. I create many instances to speed up trajectory collection.

You said "if one agent dies, and two are still alive, then you will be sent two decision step requests at the same time as the terminal step." But in my tests, the common case is that one agent dies while two are still alive, and at that point decision_steps contains no agents while terminal_steps contains one agent. So I pass actions of shape [0 (agent_num), action_dim] to the env. My question is: when no agent requires an action and I call env.step() to advance the env, will the agents take all-zero actions, or keep the actions from the last step?
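To my understanding, an empty decision_steps means an empty action batch, not a batch of zero-valued actions: nothing is delivered to any agent on that step, and what agents do between decisions is governed on the C# side (e.g. the DecisionRequester's "Take Actions Between Decisions" option repeats the last action). A tiny numpy illustration of the shape described above (the action_dim value here is made up):

```python
import numpy as np

action_dim = 4    # hypothetical action size, for illustration only
n_requesting = 0  # decision_steps contained no agents this frame

# Shape (0, 4): an empty batch. env.step() would advance the simulation,
# but no agent receives an action from Python on this step.
actions = np.zeros((n_requesting, action_dim), dtype=np.float32)
print(actions.shape)  # (0, 4)
```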

[quote=“hughperkins”, post:2, topic: 911267]
The LLAPI is agent-centric, not episode-centric. Whenever an agent needs an action, it asks you, and you have to give that agent a decision at that time.

Personally, I found that the LLAPI might not be a good fit for multi-agent setups, which is why I wrote Peaceful Pie to control Unity directly from Python; see my signature.
[/quote]

I have also watched your videos on YouTube; Peaceful Pie looks convenient for exchanging data between Unity and Python. I have several questions about it. In your videos there is only one agent in the scene; is it possible to create several instances in one scene to speed up trajectory collection and training? Also, in your video you train in the Editor; is it possible to train with a built (published) Unity scene?

Thanks a lot.