How to use example environments in Python?

Hello!
I am learning to use ML-Agents together with PyTorch to study RL algorithms.
I followed the Google Colab tutorial “ML-Agents Q-Learning with GridWorld”, and the script works well.

However, when I connect this code to my own scene in the Unity Editor and change only the line
env = default_registry["GridWorld"].make()
to
env = UnityEnvironment(file_name=None)
it works for the first few steps, then an error occurs,
like this:

GridWorld environment created.
Training step 1 reward -0.9999999776482582
Training step 2 reward -0.7777777603930898
Training step 3 reward -0.7777777579095628
Training step 4 reward -0.9999999776482582
Training step 5 reward -0.9999999776482582


KeyError                                  Traceback (most recent call last)
in
     34
     35 for n in range(NUM_TRAINING_STEPS):
---> 36   new_exp,_ = Trainer.generate_trajectories(env, qnet, NUM_NEW_EXP, epsilon=0.1)
     37   random.shuffle(experiences)
     38   if len(experiences) > BUFFER_SIZE:

in generate_trajectories(env, q_net, buffer_size, epsilon)
     54   # Create its last experience (is last because the Agent terminated)
     55   last_experience = Experience(
---> 56     obs=dict_last_obs_from_agent[agent_id_terminated].copy(),
     57     reward=terminal_steps[agent_id_terminated].reward,
     58     done=not terminal_steps[agent_id_terminated].interrupted,

KeyError: 1

I find that at the beginning of each step, right after env.reset() is called, env.get_steps() shouldn't return anything yet, but when using the example GridWorld environment it still returns steps even after the reset.
I wonder whether that is because the example environment has a script that keeps making steps on its own, but I can't find where to turn it off to make it a pure environment for training. Does anyone know how to use these example environments with Python?
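For now I am working around the crash with a guard inside generate_trajectories, roughly at the lines shown in the traceback. This is only a sketch using the variable names from the Colab (terminal_steps, dict_last_obs_from_agent, dict_last_action_from_agent, Experience); I am assuming the action and next_obs fields match the Colab's Experience definition, and I am not sure this is the right fix:

for agent_id_terminated in terminal_steps:
    # Guard: only build the terminal Experience for agents that already have a
    # recorded last observation; otherwise skip this agent instead of raising KeyError.
    if agent_id_terminated not in dict_last_obs_from_agent:
        continue
    last_experience = Experience(
        obs=dict_last_obs_from_agent[agent_id_terminated].copy(),
        reward=terminal_steps[agent_id_terminated].reward,
        done=not terminal_steps[agent_id_terminated].interrupted,
        action=dict_last_action_from_agent[agent_id_terminated].copy(),
        next_obs=terminal_steps[agent_id_terminated].obs[0],
    )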

Here is some of the code:

from mlagents_envs.registry import default_registry
from mlagents_envs.environment import UnityEnvironment
from typing import List
import matplotlib.pyplot as plt
import random
import time
import torch
%matplotlib inline
# VisualQNetwork, Experience, Buffer, and Trainer are defined in earlier cells of the Colab.

# Create the GridWorld Environment from the registry
#env = default_registry["GridWorld"].make()
env = UnityEnvironment(file_name=None)
print("GridWorld environment created.")

# Create a new Q-Network.
qnet = VisualQNetwork((64, 84, 3), 126, 5)

experiences: Buffer = []
optim = torch.optim.Adam(qnet.parameters(), lr= 0.001)

cumulative_rewards: List[float] = []

# The number of training steps that will be performed
NUM_TRAINING_STEPS = 70
# The number of experiences to collect per training step
NUM_NEW_EXP = 1000
# The maximum size of the Buffer
BUFFER_SIZE = 10000

for n in range(NUM_TRAINING_STEPS):
  new_exp,_ = Trainer.generate_trajectories(env, qnet, NUM_NEW_EXP, epsilon=0.1)
  random.shuffle(experiences)
  if len(experiences) > BUFFER_SIZE:
    experiences = experiences[:BUFFER_SIZE]
  experiences.extend(new_exp)
  Trainer.update_q_net(qnet, optim, experiences, 5)
  _, rewards = Trainer.generate_trajectories(env, qnet, 100, epsilon=0)
  cumulative_rewards.append(rewards)
  print("Training step ", n+1, "\treward ", rewards)
 


env.close()

# Show the training graph
plt.plot(range(NUM_TRAINING_STEPS), cumulative_rewards)

It is possible for an Agent to both be terminated and request a decision at the same time: if it dies, reset is called, and then a decision is requested immediately.
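To see this in my setup, I check whether the same agent id shows up in both the decision steps and the terminal steps of a single env.get_steps() call. This is only a minimal sketch (it assumes a single behavior in the scene and just feeds empty actions to keep the simulation stepping), not part of the tutorial:

from mlagents_envs.environment import UnityEnvironment

env = UnityEnvironment(file_name=None)  # attach to the Editor, then press Play
env.reset()
behavior_name = list(env.behavior_specs)[0]
spec = env.behavior_specs[behavior_name]

for _ in range(20):
    decision_steps, terminal_steps = env.get_steps(behavior_name)
    overlap = set(decision_steps.agent_id) & set(terminal_steps.agent_id)
    if overlap:
        # These agents terminated and immediately requested a new decision
        # in the same simulation step.
        print("terminated and re-requesting:", overlap)
    if len(decision_steps) > 0:
        # Feed empty (all-zero) actions just to keep the environment stepping.
        env.set_actions(behavior_name, spec.action_spec.empty_action(len(decision_steps)))
    env.step()

env.close()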