MeanReward for Self-Play POCA Learner

Hi,

I am training an environment similar to SoccerTwos and have a problem: the mean reward always stays at zero while the ELO decreases constantly. Is this normal behavior? The reward strategy is the same as in the soccer example (i.e. AddGroupReward with +1/-1). However, my training results in a mean reward of zero and a mean group reward that is a very small fraction (below 0.1), even after 10M iterations.

Hey, did you happen to find a solution yet? I am making a 4-player, turn-based card game with teams of 2 vs 2. During training, my ELO increases with a positive mean group reward, but at step 200000, when the team swap happens, it starts decreasing rapidly with a negative mean group reward.
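
For anyone reading along, here is a minimal sketch of the textbook Elo update, just to show how the rating is tied to wins and losses. This is not taken from the ML-Agents source: the function names, the K-factor of 16 and the starting rating of 1200 are all assumptions chosen for illustration, and the trainer may use different constants.

```python
def expected_score(rating, opponent_rating):
    """Expected result (0..1) for `rating` against `opponent_rating` under standard Elo."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - rating) / 400.0))


def update_elo(rating, opponent_rating, result, k=16.0):
    """Return the new rating after one game.

    result: 1.0 for a win, 0.5 for a draw, 0.0 for a loss.
    k: hypothetical K-factor; the value used by any real trainer may differ.
    """
    return rating + k * (result - expected_score(rating, opponent_rating))


if __name__ == "__main__":
    rating = 1200.0  # assumed starting rating
    # Two wins followed by three losses against a fixed 1200-rated opponent.
    for result in [1.0, 1.0, 0.0, 0.0, 0.0]:
        rating = update_elo(rating, 1200.0, result)
        print(round(rating, 1))
```

Under this formula the rating only falls when the learning team loses more often than its current rating predicts, so a steadily falling ELO alongside a negative mean group reward would be consistent with the learning team losing most games against the stored opponent snapshots.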