I’ve been searching for a way to build my own Unity environment as a gym environment so I can run my Python code in a cloud notebook (Kaggle/Colab).
I’ve found several links referring to Unity gym wrappers; however, all of them are broken (even some recent ones from two months ago). Did they drop the feature, or change the documentation links?
- The low level api is documented here: ml-agents/docs/Python-LLAPI.md at develop · Unity-Technologies/ml-agents · GitHub
- The gym env is documented here: ml-agents/docs/Python-Gym-API.md at develop · Unity-Technologies/ml-agents · GitHub
However, I found them not flexible enough for my own purposes (for example, the LLAPI doesn’t tell you when episodes finish, and it locks the Unity Editor in between calls to step).
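For context, a bare LLAPI loop looks roughly like this (a minimal sketch, assuming a single behavior with continuous actions and dummy zero actions); episode ends only show up as entries in TerminalSteps, and the Editor stays frozen between calls to step():

    from mlagents_envs.environment import UnityEnvironment
    from mlagents_envs.base_env import ActionTuple
    import numpy as np

    env = UnityEnvironment(file_name=None)  # None = connect to the Editor in Play mode
    env.reset()
    behavior_name = list(env.behavior_specs)[0]
    spec = env.behavior_specs[behavior_name]

    for _ in range(1000):
        decision_steps, terminal_steps = env.get_steps(behavior_name)
        for agent_id in terminal_steps.agent_id:
            # the only episode-end signal: the agent appears in TerminalSteps
            print(f"agent {agent_id} done, reward {terminal_steps[agent_id].reward}")
        if len(decision_steps) > 0:
            zeros = np.zeros((len(decision_steps), spec.action_spec.continuous_size), dtype=np.float32)
            env.set_actions(behavior_name, ActionTuple(continuous=zeros))
        env.step()  # Unity (and the Editor) blocks until this call returns

    env.close()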
So I wrote my own approach, see
This is precisely what I was looking for, thanks! You have a new subscriber.
Hello, following up on this: I downloaded ml-agents release 21 and the latest stable-baselines3 available (which uses gymnasium). However, this causes an issue with the UnityToGymWrapper, since sb3 expects gymnasium.spaces.box.Box while the wrapper provides gym.spaces.box.Box. I tried the following:
import gymnasium as gym
However, it’s not doing the trick. I had to downgrade sb3 to 1.8.0, the latest version that still supports gym, but I’d like to move to newer versions since gym is no longer supported.
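Another route I’m considering is a thin adapter that re-exposes the wrapped Unity env with gymnasium spaces and the five-tuple step signature that sb3 ≥ 2.0 expects. This is only a sketch (the class name is mine, and it assumes plain Box observation/action spaces), not an official wrapper:

    import gymnasium
    import numpy as np

    class GymnasiumUnityAdapter(gymnasium.Env):
        """Hypothetical adapter: re-exposes UnityToGymWrapper (old gym API) for gymnasium/sb3 >= 2.0."""

        def __init__(self, unity_gym_env):
            self._env = unity_gym_env
            # rebuild the spaces as gymnasium spaces so sb3's type checks pass
            obs = unity_gym_env.observation_space
            act = unity_gym_env.action_space
            self.observation_space = gymnasium.spaces.Box(low=obs.low, high=obs.high, dtype=obs.dtype)
            self.action_space = gymnasium.spaces.Box(low=act.low, high=act.high, dtype=act.dtype)

        def reset(self, *, seed=None, options=None):
            obs = self._env.reset()
            return np.asarray(obs), {}

        def step(self, action):
            obs, reward, done, info = self._env.step(action)
            # old gym's single `done` maps to gymnasium's (terminated, truncated) pair
            return np.asarray(obs), float(reward), bool(done), False, info

        def close(self):
            self._env.close()

    # usage (sketch): env = GymnasiumUnityAdapter(UnityToGymWrapper(unity_env))

But I’d rather not maintain something like this myself if there is an officially supported path.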
This is the code that I’m using to train the agent:
import gym
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.side_channel.engine_configuration_channel import EngineConfigurationChannel
from mlagents_envs.envs.unity_gym_env import UnityToGymWrapper
import os
import warnings
warnings.filterwarnings('ignore')
import sys
import numpy as np
import time
import argparse
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.env_util import make_vec_env  # creation of parallel environments
from stable_baselines3.common.logger import configure
from stable_baselines3.common.noise import NormalActionNoise, OrnsteinUhlenbeckActionNoise
from stable_baselines3.common.callbacks import BaseCallback
from utils import ALGOS, SaveOnBestTrainingRewardCallback, linear_schedule
import wandb

np.random.seed(2)


def main(args):
    """
    :param args: (ArgumentParser) the input arguments
    """
    algo = args.algo
    model_class = ALGOS[algo]
    save_dir = os.path.join(os.path.dirname(__file__), '../results')
    res_dir = os.path.join(save_dir, args.res_dir)
    model_dir = os.path.join(res_dir, args.model_dir)
    logs_dir = os.path.join(save_dir, args.tensorboard_log)
    logger_dir = os.path.join(logs_dir, args.model_dir)
    if args.pretrained == 'True':
        model_dir_pretrain = os.path.join(res_dir, args.model_dir_pretrain)
    os.makedirs(res_dir, exist_ok=True)
    os.makedirs(logs_dir, exist_ok=True)
    os.makedirs(logger_dir, exist_ok=True)

    channel = EngineConfigurationChannel()
    env = UnityEnvironment(None, side_channels=[channel])
    # env = UnityEnvironment('built_scenes/UnityVolumeRendering', side_channels=[channel], base_port=5004)
    channel.set_configuration_parameters(time_scale=4)
    env = UnityToGymWrapper(env, uint8_visual=False, flatten_branched=False, allow_multiple_obs=False)
    env.reset()
    env = Monitor(env, logger_dir, allow_early_resets=True)
    # env = VecNormalize(env, norm_obs=True, norm_reward=True, clip_obs=10.)
    env = DummyVecEnv([lambda: env])

    logger = configure(logger_dir, ["stdout", "csv", "log", "tensorboard"])

    wandb.init(
        # set the wandb project where this run will be logged
        project="AgentTransl",
        name=args.model_dir,
    )

    # Setting the policy to "MlpPolicy" means that we are giving a state vector as input to our model.
    # There are only two other options:
    # - CnnPolicy, if you provide images as input;
    # - MultiInputPolicy, for handling multiple inputs.
    if algo == 'ppo':
        print(f'RL Algorithm: {model_class}')
        if args.pretrained == 'False':
            model = model_class("MlpPolicy", env, verbose=1)
            model.set_logger(logger)
            print('training')
            callback = SaveOnBestTrainingRewardCallback(check_freq=1000, log_dir=logger_dir)
            model.learn(total_timesteps=args.n_train_timesteps, callback=callback)
            model.save(model_dir)
            print('model saved')
            del model
            print('model deleted')
        else:
            model = model_class.load(model_dir_pretrain, env=env, verbose=1, seed=0)
            model.set_logger(logger)
            print('fine tuning model')
            model.learn(total_timesteps=args.n_train_timesteps, tb_log_name=args.model_dir)
            model.save(model_dir)
            print('new model saved')
            del model
            print('model deleted')
    elif algo == 'td3':
        print(f'RL Algorithm: {model_class}')
        n_actions = env.action_space.shape[-1]
        action_noise = NormalActionNoise(mean=np.zeros(n_actions), sigma=0.1 * np.ones(n_actions))
        if args.pretrained == 'False':
            model = model_class("MlpPolicy", env, action_noise=action_noise, verbose=1, tensorboard_log=logs_dir, seed=0)
            model.set_logger(logger)
            print('training')
            model.learn(total_timesteps=args.n_train_timesteps, tb_log_name=args.model_dir, log_interval=10)
            model.save(model_dir)
            print('model saved')
            del model
            print('model deleted')
        else:
            model = model_class.load(model_dir_pretrain, env=env, verbose=1, seed=0)
            model.set_logger(logger)
            print('fine tuning model')
            model.learn(total_timesteps=args.n_train_timesteps, tb_log_name=args.model_dir, log_interval=10)
            model.save(model_dir)
            print('new model saved')
            del model
            print('model deleted')

    # print score of the model
    env.close()
    print('training completed')


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Train agent on custom env')
    parser.add_argument('--algo', default='ppo', type=str, required=False, choices=list(ALGOS.keys()), help='RL Algorithm')
    parser.add_argument('--res_dir', type=str, help='Directory to save results')
    parser.add_argument('--model_dir', type=str, help='Directory to save model.zip')
    parser.add_argument('--policy', default='MlpPolicy')
    parser.add_argument('--tensorboard_log', type=str, help='Tensorboard log dir')
    parser.add_argument('--monitor', type=str, help='Monitor wrapper filename')
    parser.add_argument('--n_train_timesteps', default=200000, required=False, type=int, help='Maximum number of timesteps for training')
    parser.add_argument('--pretrained', type=str, default='False', required=False, help='Whether training should start from an existing model')
    parser.add_argument('--model_dir_pretrain', type=str, required=False, help='Directory to load the pretrained model')
    args = parser.parse_args()
    main(args)
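For reference, I launch the script roughly like this (the script name and directory names are placeholders):

    python train_agent.py --algo td3 --res_dir exp01 --model_dir td3_run1 --tensorboard_log tb_logs --n_train_timesteps 200000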
Please note that I’m using
env = UnityEnvironment(None, side_channels=[channel])
because with release 19 I was getting the following message to start the training (this is no longer happening with release 21):
[INFO] Listening on port 5004. Start training by pressing the Play button in the Unity Editor.
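My understanding from the docs is that the file_name argument picks the launch mode: None connects to the Editor, while a path launches a standalone build (which is what you’d need on a headless machine such as Kaggle). A rough sketch, with a placeholder build path:

    # connect to the running Editor: press Play when "Listening on port 5004" appears
    env = UnityEnvironment(file_name=None, side_channels=[channel])

    # or launch a standalone build headlessly (path is a placeholder)
    env = UnityEnvironment(file_name='builds/MyEnv.x86_64', side_channels=[channel],
                           no_graphics=True, worker_id=0)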
If anyone could help me understand what is causing these issues, or has encountered them before, I would REALLY appreciate the help! Also, please let me know if you need further details.
It seems like you might be encountering broken links or outdated information. Unity provides a framework called “ml-agents” (Machine Learning Agents) that enables integration with Unity environments for reinforcement learning. You can check the official GitHub repository for the latest documentation and resources:
GitHub Repository: ML-Agents
Ensure that you are referring to the latest documentation and follow the instructions there to set up your Unity environment as a gym environment for reinforcement learning in Python. If you encounter specific issues, the GitHub repository’s issue tracker can be a helpful resource for seeking assistance or reporting problems.
Hello @petroben, thank you for your reply! I’m already using ML-Agents and mlagents_envs, but I’m not interested in using the ML-Agents trainer implementations. With release 19 of ML-Agents I’m able to integrate the stable-baselines3 implementations of RL algorithms. The main issue is that new releases of stable-baselines3 no longer support gym, which has been replaced by gymnasium, and this causes an issue with the new release of ML-Agents.
You can look at my repository; there is a Kaggle notebook with full training, visualization of progress, and conversion to ONNX:
https://github.com/denisgriaznov/ReinforcementLearningSpyderWithUnityMLAgents
Hello, I am currently learning how to build my own reinforcement learning environment, and I hope to use my own Python reinforcement learning algorithms together with Unity ML-Agents for training. Could you suggest how to get started, or recommend any good learning resources? Thank you very much.