Hi,
I am training an environment similar to SoccerTwos, and the mean reward always stays at zero while ELO decreases constantly. Is this normal behavior? The reward scheme is the same as in the soccer example (i.e. AddGroupReward(1) for one team and AddGroupReward(-1) for the other). However, my training gives a Mean Reward of zero and a Mean Group Reward that is very small (below 0.1), even after 10M iterations.
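
To make the reward scheme concrete, here is a toy Python sketch of the bookkeeping as I understand it. It is not my actual environment code and does not use the ML-Agents API: the team names, the `play_episodes` helper, and the random choice of the scoring team are all made up for illustration. It only assumes that each goal gives +1 to one group and -1 to the other, and that no per-agent reward is ever assigned.

```python
import random


def play_episodes(num_episodes, seed=0):
    """Toy model of a +1/-1 group reward scheme (hypothetical, not ML-Agents)."""
    rng = random.Random(seed)
    # Group rewards: only these are ever modified, mirroring AddGroupReward(1 / -1).
    group_reward = {"blue": 0.0, "purple": 0.0}
    # Per-agent rewards: never assigned in this scheme, so they stay at zero.
    individual_reward = {"blue": 0.0, "purple": 0.0}

    for _ in range(num_episodes):
        # Assumption: the scoring team is picked at random in this toy model.
        scorer = rng.choice(["blue", "purple"])
        conceder = "purple" if scorer == "blue" else "blue"
        group_reward[scorer] += 1.0
        group_reward[conceder] -= 1.0

    return group_reward, individual_reward


if __name__ == "__main__":
    group, individual = play_episodes(1000)
    print("group rewards:", group)
    print("individual rewards:", individual)
```

In this toy model the per-agent reward never moves from zero and the two group rewards always sum to zero, so I am not sure whether the numbers I see are simply a consequence of the reward design or a sign that training is failing.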