The key issue is that the RLlib model and ML-Agents expect different inputs/outputs. For example, with the script provided by RLlib (https://github.com/ray-project/ray/blob/master/rllib/examples/unity3d_env_local.py), training 3DBall in torch yields a model whose single output tensor holds the action in its first half and the std (for random exploration) in its second half, whereas ML-Agents needs an ONNX model that exposes the action, the deterministic action, and several other named outputs. Read the RLlib and ML-Agents source for the full input/output details. I also recommend https://netron.app/ for viewing an ONNX graph.
It is possible to edit an ONNX model directly in Python; see https://github.com/onnx/onnx/blob/main/docs/PythonAPIOverview.md and https://github.com/onnx/onnx/blob/main/docs/Operators.md. Here is my conversion example for a 3DBall torch model trained with the RLlib example script:
import onnx

torchmodel = onnx.load('torchmodel.onnx') #path to the rllib output model
graph = torchmodel.graph
graph.input.pop() #remove an unused input
graph.input[0].name = 'obs_0' #rename input
graph.node[0].input[0] = 'obs_0'
#slice the first half of the output as the deterministic action
starts = onnx.helper.make_tensor("starts", onnx.TensorProto.INT64, [1], [0])
ends = onnx.helper.make_tensor("ends", onnx.TensorProto.INT64, [1], [2])
axes = onnx.helper.make_tensor("axes", onnx.TensorProto.INT64, [1], [-1]) #the last dimension
graph.initializer.append(starts)
graph.initializer.append(ends)
graph.initializer.append(axes)
#constant outputs that ML-Agents expects but that are unused at inference time
version_number = onnx.helper.make_tensor("version_number", onnx.TensorProto.INT64, [1], [3])
memory_size = onnx.helper.make_tensor("memory_size", onnx.TensorProto.INT64, [1], [0])
continuous_actions = onnx.helper.make_tensor("continuous_actions", onnx.TensorProto.FLOAT, [2], [0,0])
continuous_action_output_shape = onnx.helper.make_tensor("continuous_action_output_shape", onnx.TensorProto.INT64, [1], [2])
graph.initializer.append(version_number)
graph.initializer.append(memory_size)
graph.initializer.append(continuous_actions)
graph.initializer.append(continuous_action_output_shape)
#add the Slice node
node = onnx.helper.make_node(
    'Slice',
    inputs=['output', 'starts', 'ends', 'axes'],
    outputs=['deterministic_continuous_actions'],
)
graph.node.append(node) #append after the last existing node
#clear old outputs and add the new ones
while len(graph.output):
    graph.output.pop()
actions_info = onnx.helper.make_tensor_value_info("deterministic_continuous_actions", onnx.TensorProto.FLOAT, shape=[])
graph.output.append(actions_info)
version_number_info = onnx.helper.make_tensor_value_info("version_number", onnx.TensorProto.INT64, shape=[])
graph.output.append(version_number_info)
memory_size_info = onnx.helper.make_tensor_value_info("memory_size", onnx.TensorProto.INT64, shape=[])
graph.output.append(memory_size_info)
continuous_actions_info = onnx.helper.make_tensor_value_info("continuous_actions", onnx.TensorProto.FLOAT, shape=[])
graph.output.append(continuous_actions_info)
continuous_action_output_shape_info = onnx.helper.make_tensor_value_info("continuous_action_output_shape", onnx.TensorProto.INT64, shape=[])
graph.output.append(continuous_action_output_shape_info)
onnx.checker.check_model(torchmodel)
onnx.save(torchmodel, 'mlagentmodel.onnx') #save the converted model; you can also check its outputs in Python with onnxruntime
A more elegant way would be to get the torch/tf model, modify its inputs/outputs, and then export it to ONNX the way ML-Agents does (https://github.com/Unity-Technologies/ml-agents/blob/main/ml-agents/mlagents/trainers/model_saver/torch_model_saver.py); or to bypass ML-Agents entirely and run the RLlib ONNX model with Barracuda. However, I haven't found a clean way to extract the torch/tf model from RLlib, and I have very little experience with C#, so I'd appreciate any help on this topic.