I’m learning about how PPO reinforcement learning works when training a model. Specifically, I’m looking at the HummingBird example from ImmersiveLimit, from this link:
Now I have some questions and hope someone can help me with these:
How do the three scripts Flower, FlowerArea and HummingBirdAgent transfer data? (input, output, data processing)
How does PPO reinforcement learning apply to this example?
After training with PyTorch, I can’t use my new .onnx files for the agent; it says “is_continous_action” is missing, which didn’t happen when the original .nn file was used.
What version of the ML-Agents package and Python are you using? You might be on an incompatible version combination; you can see the compatible versions on the main ML-Agents page.
Yeah, if you have the latest mlagents you should use the latest C# package. There was a big version jump with the 2.0.0-pre3 package, which makes it incompatible with prior versions of the Python code.
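If you’re not sure what is actually installed, here is a quick way to check from Python before comparing against the compatibility table in the docs. This only uses the standard library (importlib.metadata needs Python 3.8+); the package names are the PyPI names:

```python
# Print the Python and package versions so they can be checked against the
# ML-Agents compatibility table.
import sys
from importlib.metadata import version, PackageNotFoundError

print("Python:", sys.version.split()[0])
for pkg in ("mlagents", "mlagents-envs", "torch"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "is not installed")
```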
Thank you, I have another question: to build a new model, you must have a dataset for the training. But I don’t see any code lines or any part of this learning process anywhere in Unity (the HummingBirdAgent script). Or is it in Python?
I have done some digging in the mlagents Python API package but haven’t found anything yet.
Can you show me the code that generates the dataset? I’m desperately looking for it to finish my final exam. My teacher keeps asking about it but I can’t find it anywhere.
I’ve seen many ML tutorial videos and understand that there is no “dataset”, but according to my teacher there must be something in the Python API that uses what it has collected (observations, rewards) to choose the agent’s actions. I haven’t found the code that shows that transfer (observations, rewards → actions) anywhere in Python. I think it’s kind of hard.
Are you trying to work from Unity directly (C#)? Maybe you haven’t looked at the Python API (not the gym wrapper).
Look at the docs > Colab examples > Example 01:
```python
from mlagents_envs.environment import UnityEnvironment

# Connect to the Unity environment. file_name=None attaches to the Editor
# (press Play after running this); pass a built executable path otherwise.
env = UnityEnvironment(file_name=None)
env.reset()

# One behavior per agent type; grab its name and spec.
behavior_name = list(env.behavior_specs)[0]
spec = env.behavior_specs[behavior_name]

for episode in range(3):
    env.reset()
    decision_steps, terminal_steps = env.get_steps(behavior_name)
    tracked_agent = -1  # -1 indicates not yet tracking
    done = False  # For the tracked_agent
    episode_rewards = 0  # For the tracked_agent
    while not done:
        # Track the first agent we see if not tracking
        # Note: len(decision_steps) = number of agents that requested a decision
        if tracked_agent == -1 and len(decision_steps) >= 1:
            tracked_agent = decision_steps.agent_id[0]
        # Generate an action for all agents
        action = spec.action_spec.random_action(len(decision_steps))
        # Set the actions
        env.set_actions(behavior_name, action)
        # Move the simulation forward
        env.step()
        # Get the new simulation results
        decision_steps, terminal_steps = env.get_steps(behavior_name)
        if tracked_agent in decision_steps:  # The agent requested a decision
            episode_rewards += decision_steps[tracked_agent].reward
        if tracked_agent in terminal_steps:  # The agent terminated its episode
            episode_rewards += terminal_steps[tracked_agent].reward
            done = True
    print(f"Total rewards for episode {episode} is {episode_rewards}")

env.close()
```
Here you can see the actions being chosen at random and fed to the agents at `env.set_actions()`, and at each step you can check the observations through `decision_steps.obs`.
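To make the (observations → actions) step more concrete, here is a rough sketch of what replaces `random_action()` once you have a policy of your own. The `choose_actions` helper below is made up for illustration and just outputs zeros; during real training a neural network sits in its place, and PPO adjusts that network’s weights using the rewards that come back:

```python
import numpy as np
from mlagents_envs.base_env import ActionSpec, ActionTuple, DecisionSteps

def choose_actions(decision_steps: DecisionSteps, action_spec: ActionSpec) -> ActionTuple:
    """Map a batch of observations to a batch of actions.

    Placeholder policy: reads the observations and returns zero actions.
    In real training, a neural network maps observations to actions here.
    """
    obs = decision_steps.obs[0]  # first observation tensor, shape (n_agents, obs_size)
    n_agents = obs.shape[0]
    continuous = np.zeros((n_agents, action_spec.continuous_size), dtype=np.float32)
    return ActionTuple(continuous=continuous)

# Inside the while-loop above, replace the random action with:
#     action = choose_actions(decision_steps, spec.action_spec)
#     env.set_actions(behavior_name, action)
```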
For a more complicated example, that’s Example 02 with a custom neural network (I’m also stuck, or slowly advancing, at that step; I’m currently trying to modify the custom network to produce multiple outputs and train my own environment). Let me know if this helped.
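On the multiple-outputs part, the usual PyTorch pattern is a shared body with one head per output, for example action means plus a value estimate. This is only a generic sketch with made-up sizes, not the Example 02 network:

```python
import torch
import torch.nn as nn

class MultiHeadPolicy(nn.Module):
    """Shared encoder with two heads: continuous action means and a value estimate."""

    def __init__(self, obs_size: int, action_size: int, hidden: int = 128):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_size, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.action_head = nn.Linear(hidden, action_size)  # one output per action dimension
        self.value_head = nn.Linear(hidden, 1)              # scalar state-value estimate

    def forward(self, obs: torch.Tensor):
        features = self.body(obs)
        return self.action_head(features), self.value_head(features)

# Sizes here are examples only; in practice read the observation and action
# sizes from the behavior spec (e.g. spec.action_spec.continuous_size).
policy = MultiHeadPolicy(obs_size=10, action_size=5)
actions, values = policy(torch.zeros(4, 10))  # batch of 4 dummy observations
```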
Thank you, I did look in the code of the Python API, but only in trainer.py and optimizer.py; I didn’t check the file you mentioned. (directory: ml-agents\mlagents\trainers\ppo)
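For anyone landing on this thread with the same question: the heart of what lives in that ppo directory is the clipped surrogate update from the PPO paper, which is where the collected observations, actions and rewards (turned into advantages) actually change the network. Here is a stripped-down PyTorch sketch of that update, with toy numbers and illustrative names rather than the actual ML-Agents code:

```python
import torch

def ppo_policy_loss(new_log_probs: torch.Tensor,
                    old_log_probs: torch.Tensor,
                    advantages: torch.Tensor,
                    clip_eps: float = 0.2) -> torch.Tensor:
    """Clipped surrogate objective from the PPO paper, returned as a loss to minimize.

    new_log_probs: log-probabilities of the taken actions under the current policy
    old_log_probs: log-probabilities under the policy that collected the data
    advantages:    how much better each action turned out than expected (from rewards)
    """
    ratio = torch.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()

# Toy batch: in ML-Agents these values come from the experience buffer that the
# trainer fills with (observation, action, reward) data during rollouts.
new_lp = torch.tensor([-1.0, -0.5, -2.0], requires_grad=True)
old_lp = torch.tensor([-1.1, -0.6, -1.8])
adv = torch.tensor([0.5, -0.2, 1.0])
loss = ppo_policy_loss(new_lp, old_lp, adv)
loss.backward()  # gradients would flow into the policy network's parameters
```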