Why am I getting the same integer values instead of random floats from the ContinuousActions vector?

Hello everyone, I was wondering if anyone has run into this situation… I’m trying to do some training with the ML-Agents Unity package, and I went through all of the guides. I set up a simple scenario in which I’m requesting 3 values of the Continuous type and no Discrete ones, as you can see here:

But for some reason I don’t understand, instead of getting ‘random’ values between -1 and 1, I keep getting 0, 1, and -1 about 99% of the time, right from the moment the game starts. Here are my console logs (note that the collapse option is enabled, so you can see how many times each number repeats) and the snippet of code I’m using for this simple test:
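In case it helps, the agent boils down to something like this (a rough sketch, since I posted the actual code as an image; the class name is just for illustration):

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using UnityEngine;

public class TestAgent : Agent
{
    public override void OnActionReceived(ActionBuffers actions)
    {
        // Behavior Parameters: 3 continuous actions, 0 discrete.
        // Each of these should be a float somewhere in [-1, 1].
        Debug.Log(actions.ContinuousActions[0]);
        Debug.Log(actions.ContinuousActions[1]);
        Debug.Log(actions.ContinuousActions[2]);
    }
}
```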


Yes, I am getting some random values in the first element (0) of the ActionBuffers; however, it only happens sometimes, and for my particular scenario it doesn’t really help, as the agent would basically never get to the point where it can collect a positive reward.

I’ve been trying to figure this out for hours with no luck… I did another test following a video I found on YouTube, trying to get a cube to move toward a specific position. That test needs 2 values, and there, every single time, I do get different ‘random’ values, as I’d expect… But I’m not doing anything differently! So how is this case different from the other one? Just for reference, this is how my other test (where I was simply trying out the library) looks:
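That other agent looks roughly like this (a sketch of the tutorial code, which I also posted as an image; the field names are illustrative):

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using Unity.MLAgents.Sensors;
using UnityEngine;

public class MoveToTargetAgent : Agent
{
    [SerializeField] private Transform target;
    [SerializeField] private float moveSpeed = 2f;

    public override void CollectObservations(VectorSensor sensor)
    {
        // Agent and target positions, 3 floats each.
        sensor.AddObservation(transform.localPosition);
        sensor.AddObservation(target.localPosition);
    }

    public override void OnActionReceived(ActionBuffers actions)
    {
        // Two continuous actions drive X and Z; both arrive in [-1, 1] here.
        float moveX = actions.ContinuousActions[0];
        float moveZ = actions.ContinuousActions[1];
        transform.localPosition += new Vector3(moveX, 0f, moveZ) * moveSpeed * Time.deltaTime;
    }
}
```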

And as I mentioned before, this one does return values between -1 and 1, and therefore the agent can actually learn how to get to the reward.

It got to the point where I even considered that my PC was somehow faulty or damaged, so I formatted my whole system and did some maintenance on it, but it keeps giving me the same 0, 1, and -1 values all the time. I also created a build and tried that; same exact results. I then tried the build on a different PC: same exact results!

I would highly appreciate anyone’s help! Honestly, at this point I don’t really know what else to try… It just isn’t making any sense to me :frowning:


Is this happening after training for some steps, or is it outputting those whole numbers immediately?

My initial guess would be some un-normalized observations blowing up the decision space.

It happens immediately. Your guess is actually right! Although I still don’t understand why… I was able to track the issue back to the observation vector. I’m currently sending 5 values: the object’s position and 2 velocity values, which are around 50 and 300. If I put 0.005 and 0.3 in those 2 slots instead of the raw values, it actually works. That doesn’t make any sense to me, because I don’t think the documentation specifies that these values have to be normalized. And even if that’s the case, why don’t the position values break this rule too? There I have values way bigger than 1, yet with those it does work; it seems to only start failing when there are values bigger than 1 from the fourth element onwards.
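For anyone who runs into this later, the fix amounts to scaling the observations before adding them, something like this (a sketch; the divisors are just the rough maxima the values can reach, and the field names are illustrative):

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using UnityEngine;

public class NormalizedAgent : Agent
{
    private float velocityA;   // raw value around 50
    private float velocityB;   // raw value around 300

    public override void CollectObservations(VectorSensor sensor)
    {
        sensor.AddObservation(transform.localPosition); // 3 values
        // Divide by the expected maximum so each value lands roughly in [-1, 1].
        sensor.AddObservation(velocityA / 50f);
        sensor.AddObservation(velocityB / 300f);
    }
}
```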

Nice, glad you could get it working!

Check out the docs here to read a bit about it: https://github.com/Unity-Technologies/ml-agents/blob/develop/docs/Learning-Environment-Design-Agents.md

An easy intuition for this is to (inaccurately) view the policy model as a signal multiplier: if your observation signal is 300, how many times can it be multiplied by random weights before one of the float values goes full NaN on you? With a huge range of input values, the network produces a huge range of outputs, which are then clamped to [-1, 1], meaning you will always get -1 or 1. My guess is the 0s are NaNs.
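As a toy illustration of the clamping part (just the arithmetic, not what ML-Agents actually does internally):

```csharp
using UnityEngine;

public class ClampDemo : MonoBehaviour
{
    void Start()
    {
        // Once the raw network output saturates past ±1, the clamp
        // only ever lets you see the extremes; a NaN sails straight through.
        foreach (float raw in new[] { 0.3f, 42f, -300f, float.NaN })
        {
            Debug.Log($"{raw} -> {Mathf.Clamp(raw, -1f, 1f)}"); // 0.3, 1, -1, NaN
        }
    }
}
```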