Hello!
I am learning to use ML-Agents together with PyTorch to study RL algorithms.
I followed the Google Colab tutorial "ML-Agents Q-Learning with GridWorld", and the script works well.
However, when I connect this code to my own Unity Editor (opened from Unity Hub), changing only the line
env = default_registry["GridWorld"].make()
to
env = UnityEnvironment(file_name=None)
it works for the first few steps, and then an error occurs like this:
GridWorld environment created.
Training step 1 reward -0.9999999776482582
Training step 2 reward -0.7777777603930898
Training step 3 reward -0.7777777579095628
Training step 4 reward -0.9999999776482582
Training step 5 reward -0.9999999776482582
KeyError                                  Traceback (most recent call last)
<ipython-input-...> in <module>
     34
     35 for n in range(NUM_TRAINING_STEPS):
---> 36     new_exp,_ = Trainer.generate_trajectories(env, qnet, NUM_NEW_EXP, epsilon=0.1)
     37     random.shuffle(experiences)
     38     if len(experiences) > BUFFER_SIZE:

<ipython-input-...> in generate_trajectories(env, q_net, buffer_size, epsilon)
     54         # Create its last experience (is last because the Agent terminated)
     55         last_experience = Experience(
---> 56             obs=dict_last_obs_from_agent[agent_id_terminated].copy(),
     57             reward=terminal_steps[agent_id_terminated].reward,
     58             done=not terminal_steps[agent_id_terminated].interrupted,

KeyError: 1
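From the traceback it looks like agent id 1 shows up in terminal_steps, but there is no entry for it in dict_last_obs_from_agent, so indexing with it raises the KeyError. A guard like the one below inside generate_trajectories (just a sketch, reusing the variable names from the Colab code) stops the crash, but it feels like it only hides the real problem:

# Hypothetical guard inside generate_trajectories (names taken from the Colab code):
# skip terminated agents that never had a decision step recorded, so there is
# no last observation stored for them yet.
for agent_id_terminated in terminal_steps:
    if agent_id_terminated not in dict_last_obs_from_agent:
        continue  # no stored observation for this agent, skip it
    # ... build last_experience exactly as in the tutorial ...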
I thought that at the beginning of each step, right after env.reset() is called, env.get_steps() should not return anything yet, but with the example GridWorld environment it still returns steps even after a reset.
I wonder if this is because the example environment has a script that drives the steps on its own, but I can't find where to turn that off and make it a pure environment to be trained. Does anyone know how to use these example environments from Python?
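To check this, I ran a quick probe right after reset (a sketch; it assumes GridWorld is the only behavior in the scene, so I just take the first behavior name):

env.reset()
behavior_name = list(env.behavior_specs)[0]
decision_steps, terminal_steps = env.get_steps(behavior_name)
# Which agents already have steps right after the reset?
print("decision step agents:", list(decision_steps.agent_id))
print("terminal step agents:", list(terminal_steps.agent_id))

This is how I noticed that the environment already returns steps right after a reset.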
Here is some of the code:
from mlagents_envs.registry import default_registry
from mlagents_envs.environment import UnityEnvironment
import matplotlib.pyplot as plt
import random
import time
import torch
from typing import List
%matplotlib inline
# Create the GridWorld Environment from the registry
#env = default_registry["GridWorld"].make()
env = UnityEnvironment(file_name=None)
print("GridWorld environment created.")
# Create a new Q-Network.
# (VisualQNetwork, as well as Experience, Buffer and Trainer used below,
# are defined in earlier cells of the Colab.)
qnet = VisualQNetwork((64, 84, 3), 126, 5)
experiences: Buffer = []
optim = torch.optim.Adam(qnet.parameters(), lr= 0.001)
cumulative_rewards: List[float] = []
# The number of training steps that will be performed
NUM_TRAINING_STEPS = 70
# The number of experiences to collect per training step
NUM_NEW_EXP = 1000
# The maximum size of the Buffer
BUFFER_SIZE = 10000
for n in range(NUM_TRAINING_STEPS):
    new_exp, _ = Trainer.generate_trajectories(env, qnet, NUM_NEW_EXP, epsilon=0.1)
    random.shuffle(experiences)
    if len(experiences) > BUFFER_SIZE:
        experiences = experiences[:BUFFER_SIZE]
    experiences.extend(new_exp)
    Trainer.update_q_net(qnet, optim, experiences, 5)
    _, rewards = Trainer.generate_trajectories(env, qnet, 100, epsilon=0)
    cumulative_rewards.append(rewards)
    print("Training step ", n+1, "\treward ", rewards)
env.close()
# Show the training graph
plt.plot(range(NUM_TRAINING_STEPS), cumulative_rewards)