Different training result when project upgraded to new ML-Agents version

I have used ML-Agents v0.6 for quite some time for my research. Recently, I wanted to use some functionality of the newer version, so I tried v0.10 on the old project. The problem is that after adjusting the parameters in the configuration file as described in the migration guide (`ml-agents/docs/Migrating.md` in the Unity-Technologies/ml-agents GitHub repo), the results are slightly different. For example, where a Vector Action element ranged from 0.04 to 0.06 under v0.6, the same element ranges from 0.02 to 0.08 under v0.10.

Below are my `config.yaml` files.

version 0.6

```

trainer: ppo
batch_size: 20480
beta: 5.0e-3
buffer_size: 204800
epsilon: 0.1
gamma: 0.6
hidden_units: 512
lambd: 0.9
learning_rate: 1.5e-3
max_steps: 3e6
memory_size: 256
normalize: false
num_epoch: 3
num_layers: 3
time_horizon: 3000
sequence_length: 64
summary_freq: 1000
use_recurrent: false 
use_curiosity: false
curiosity_strength: 0.01
curiosity_enc_size: 128
```

version 0.10

```
default:
  trainer: ppo
  batch_size: 20480
  beta: 5.0e-3
  buffer_size: 204800
  epsilon: 0.1
  hidden_units: 512
  lambd: 0.9
  learning_rate: 1.5e-3
  learning_rate_schedule: linear
  max_steps: 3e6
  memory_size: 256
  normalize: false
  num_epoch: 3
  num_layers: 3
  time_horizon: 3000
  sequence_length: 64
  summary_freq: 1000
  use_recurrent: false
  vis_encode_type: simple
  reward_signals:
    extrinsic:
      strength: 1.0
      gamma: 0.6
```

I am not sure what I'm doing wrong. Below are the TensorBoard curves for my two training runs, which look really similar to me. The orange one is v0.6 and the blue one is v0.10.

![](https://i.imgur.com/eZ3EmA8.png) 

Thank you in advance!

Hey @TnTonly - could you also provide your console logs, Python version, and C# version? We'll get this forwarded to the team for review.

I don’t know where to find the console logs. As for Python and C#, my Python version is Python 3.6.9 (Anaconda) and .NET version is 4.8.

Hi TnTonly, those training runs look pretty much the same. It's expected that there will be slight differences between the models; training is a stochastic operation, after all. Even rerunning the training with the same version of ML-Agents will produce a slightly different result! But your reward curves are pretty similar, so the performance of the agent shouldn't be all that different.
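To illustrate the stochasticity point, here is a minimal NumPy sketch (not ML-Agents code, just a toy example): two SGD runs that differ only in their random seed converge to nearly the same answer, but their final weights are not bit-identical.

```python
import numpy as np

def train(seed, steps=100, lr=0.1):
    """Fit y = 2x with minibatch SGD; the seed controls both the
    random initial weight and the sampled minibatches."""
    rng = np.random.default_rng(seed)
    w = rng.normal()  # random initialisation
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0, size=32)          # random minibatch
        grad = np.mean(2.0 * (w * x - 2.0 * x) * x)  # d/dw of MSE loss
        w -= lr * grad
    return w

w_a, w_b = train(seed=1), train(seed=2)
# Both runs land near the true weight 2.0, but the final values
# differ slightly, just like two otherwise-identical training runs.
print(w_a, w_b)
```

The same effect applies at ML-Agents scale: random weight initialisation, environment resets, and minibatch sampling all shift the final policy a little between runs, even with identical hyperparameters.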