Loading ML-Agents Trained Frozen Graph Into TensorFlow

Hi, I am trying to load a neural network trained with ML-Agents back into Python to run inference. I assume the TensorFlow frozen graph is the best approach? However, I am running into some difficulty, since I am not sure what some of the tensors are.

By experiment, these appear to be the input and output tensors. I am not exactly sure what action_masks is, but TensorFlow keeps giving me an error if I do not feed it.

input0 = graph.get_tensor_by_name('prefix/vector_observation:0')
input1 = graph.get_tensor_by_name('prefix/action_masks:0')

output = graph.get_tensor_by_name('prefix/action:0')



My original neural network has 52 observations and the output has 2 branches with 3 possibilities each.

Input = [52 items]
Output = [ 0, 1 or 2 ] ; [ 0, 1 or 2 ] = [out1] ; [out2]

Vertical Movement = [0, 1, 2]      0 - no action   1 - forward     2 - backward
Horizontal Movement = [0, 1, 2]    0 - no action   1 - turn left   2 - turn right
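
For reference, the two-branch layout above could be decoded into readable labels with a small lookup. This is only an illustrative sketch: `BRANCH_LABELS` and `describe_action` are names made up here, not part of ML-Agents.

```python
# Hypothetical lookup for the two branches described above.
BRANCH_LABELS = {
    "vertical": ["no action", "forward", "backward"],
    "horizontal": ["no action", "turn left", "turn right"],
}


def describe_action(vertical, horizontal):
    """Map one index per branch (each 0, 1 or 2) to readable labels."""
    return (
        BRANCH_LABELS["vertical"][vertical],
        BRANCH_LABELS["horizontal"][horizontal],
    )


print(describe_action(2, 1))  # ('backward', 'turn left')
```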

This is the code I tried for running inference. I am not sure how to make sense of the action mask and the output.

# x, x1 and y are the vector_observation, action_masks and action tensors fetched above
with tf.Session(graph=graph) as sess:
    y_out = sess.run(y, feed_dict={
            x: [[0, 0, 30, 50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50, False, False, False, False, False, False,True,False,False,False, True, False, False]],
            x1:[[0,1,2,0,1,2]]
        })
array([[-1.6118095e+01, -1.5712630e+01,  1.1920928e-07, -1.6118095e+01,
       -1.7318338e-02, -4.0646248e+00]], dtype=float32)

Any idea how to make this work? Thanks!

Hi.
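Running trained models outside of Unity is not a supported feature. These models are made to work with the Unity Inference Engine, which is why they are hard to read from Python.
Action masks are for masking out some of the actions. The values can be zeros or ones. If you don’t want to mask anything, make this tensor all zeros or all ones (I do not remember which). Judging from your output, where both entries you fed a 0 came back at about -16.1, i.e. log(1e-7), it looks like 0 masks an action and 1 allows it, so all ones would be the one to try first.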
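
As a rough sketch of what such a mask could look like for 2 branches of 3 actions each: one flat row with one entry per action. The convention used here (1 = allowed, 0 = masked) is an assumption inferred from the output posted in the question, not something confirmed by the ML-Agents documentation, and `build_action_mask` is a hypothetical helper, not an ML-Agents function.

```python
def build_action_mask(branch_sizes, blocked=()):
    """Return one flat mask row: 1.0 = allowed, 0.0 = masked (assumed convention).

    blocked holds (branch_index, action_index) pairs to mask out.
    """
    mask = []
    for b, size in enumerate(branch_sizes):
        mask.extend(0.0 if (b, a) in blocked else 1.0 for a in range(size))
    return [mask]  # batch of one, like the feed_dict in the question


print(build_action_mask([3, 3]))                    # nothing masked
print(build_action_mask([3, 3], blocked={(0, 2)}))  # branch 0, action 2 masked
```

The resulting row would be fed to the action_masks tensor in place of `[[0,1,2,0,1,2]]`.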
Regarding the actions: for legacy reasons, the output corresponds to the log probabilities of the actions for each branch. Since you have 2 branches with 3 possibilities each, the first 3 numbers correspond to the log probabilities of the first branch and the last 3 to the second.
To sample from them, you need to exponentiate them and sample from the resulting multinomial distribution. For example:

array([[-1.6118095e+01, -1.5712630e+01,  1.1920928e-07, -1.6118095e+01, -1.7318338e-02, -4.0646248e+00]], dtype=float32)

means that the first branch has log probabilities: -1.6118095e+01, -1.5712630e+01, 1.1920928e-07
Note that exp(-1.6118095e+01) + exp(-1.5712630e+01) + exp(1.1920928e-07) ≈ 1
The second branch has log probabilities: -1.6118095e+01, -1.7318338e-02, -4.0646248e+00
Also note that exp(-1.6118095e+01) + exp(-1.7318338e-02) + exp(-4.0646248e+00) ≈ 1
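
The two sums can be checked numerically. Here is a minimal sketch in plain Python using the values from the question; `split_branches` is a helper name invented for this example.

```python
import math

# Flat output from the question: 2 branches x 3 actions.
log_probs = [-1.6118095e+01, -1.5712630e+01, 1.1920928e-07,
             -1.6118095e+01, -1.7318338e-02, -4.0646248e+00]
branch_sizes = [3, 3]


def split_branches(values, sizes):
    """Cut a flat list into consecutive chunks, one per branch."""
    chunks, start = [], 0
    for size in sizes:
        chunks.append(values[start:start + size])
        start += size
    return chunks


for branch in split_branches(log_probs, branch_sizes):
    probs = [math.exp(v) for v in branch]
    # Each branch should sum to 1 up to float32 rounding.
    print(probs, math.fsum(probs))
```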

If we look only at the first branch, -1.6118095e+01, -1.5712630e+01, 1.1920928e-07
becomes, after exponentiation, about 1.0e-7, 1.5e-7 and 1.0 (the tiny positive log probability is just float32 rounding), so for the first branch the selected action is 2 (with very high certainty).
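
Putting this together, sampling one action per branch could look roughly like the sketch below. It uses only the Python standard library; `sample_actions` and `greedy_actions` are names made up for this example, and the branch sizes `[3, 3]` are taken from the question.

```python
import math
import random


def sample_actions(log_probs, branch_sizes, rng=random):
    """Draw one action index per branch from flat log probabilities."""
    actions, start = [], 0
    for size in branch_sizes:
        weights = [math.exp(v) for v in log_probs[start:start + size]]
        # choices() normalizes the weights itself, so small rounding
        # errors in the sum do not matter.
        actions.append(rng.choices(range(size), weights=weights)[0])
        start += size
    return actions


def greedy_actions(log_probs, branch_sizes):
    """Pick the most likely action per branch instead of sampling."""
    actions, start = [], 0
    for size in branch_sizes:
        branch = log_probs[start:start + size]
        actions.append(branch.index(max(branch)))
        start += size
    return actions


log_probs = [-1.6118095e+01, -1.5712630e+01, 1.1920928e-07,
             -1.6118095e+01, -1.7318338e-02, -4.0646248e+00]
print(greedy_actions(log_probs, [3, 3]))  # [2, 1]
print(sample_actions(log_probs, [3, 3]))
```

Assuming `y_out` is the array returned by `sess.run` in the question, `sample_actions(list(y_out[0]), [3, 3])` would give one index per branch.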