PPO and Q-Learning

I have been reading about these two algorithms, and I understood that PPO works with continuous actions and continuous observations. So if my agent uses discrete values, is it not using PPO? Is there a way to use Q-Learning in ML-Agents?

PPO also works with discrete action spaces and discrete observation spaces. Q-learning is currently not implemented in ML-Agents.
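
To illustrate why discrete actions are not a problem for PPO: the clipped objective only needs the probability the policy assigns to the action that was taken, so a categorical (softmax) policy works just as well as a continuous one. Below is a minimal sketch for a single sample. It is not the ML-Agents implementation; the function names, the `epsilon` default, and the single-sample form are assumptions made for illustration.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability for each discrete action."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ppo_clip_objective(new_logits, old_logits, action, advantage, epsilon=0.2):
    """Clipped surrogate objective for one sample with a discrete policy.

    Hypothetical helper for illustration only: `action` is the index of the
    discrete action taken, `advantage` is its estimated advantage.
    """
    new_p = softmax(new_logits)[action]  # probability under the updated policy
    old_p = softmax(old_logits)[action]  # probability under the data-collecting policy
    ratio = new_p / old_p
    clipped = max(1.0 - epsilon, min(1.0 + epsilon, ratio))
    # PPO takes the more pessimistic of the unclipped and clipped terms.
    return min(ratio * advantage, clipped * advantage)

# Three discrete actions; the updated policy strongly prefers action 1.
value = ppo_clip_objective([0.0, 10.0, 0.0], [0.0, 0.0, 0.0], action=1, advantage=1.0)
print(value)
```

With a positive advantage, the probability ratio here is far above `1 + epsilon`, so the clipped term limits how much this one sample can push the policy.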
