Obtain intrinsic curiosity reward (ICM) after training

Hi,

I’ve trained a Unity ML-Agents agent using two rewards: an extrinsic reward and an intrinsic curiosity reward (the ICM module, see docs). My question: how can I extract both rewards, per step, after training? In other words, if I now run the trained agent in my environment, how can I log the intrinsic and extrinsic reward it obtains with each action?

Edit: I came across the “DecisionStep” object in the documentation, but that only seems to return a single scalar “reward”, whereas I’d need separate scalars for the extrinsic and intrinsic reward.
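For reference, this is roughly how I’m reading that value with the low-level Python API. A minimal sketch assuming mlagents_envs and an environment running in the Editor; the exact API surface may differ between releases:

```python
from mlagents_envs.environment import UnityEnvironment

# Connect to an environment running in the Unity Editor
# (press Play in the Editor once this call is waiting).
env = UnityEnvironment(file_name=None)
env.reset()

behavior_name = list(env.behavior_specs)[0]
decision_steps, terminal_steps = env.get_steps(behavior_name)

# Both step collections expose a single per-agent reward array --
# there is no separate field for the curiosity component.
print(decision_steps.reward)
print(terminal_steps.reward)

env.close()
```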

Many thanks for your help!
Christian

Rewards are only active during training; they serve no purpose after training. The curiosity module is part of the trainer, and it does not exist in the trained model.

The Barracuda inference model used after training does not need or use rewards. There is no harm in keeping the reward systems and giving the trained agents rewards, but the rewards are ignored.

Note: If you want to see the values after training, you can run ML-Agents in inference mode using the --resume and --inference flags.
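For example (the config file path and run id here are just placeholders; use whatever you used for the original training run):

```bash
mlagents-learn config/trainer_config.yaml --run-id=my_trained_run --resume --inference
```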

Many thanks @ChillX! Three follow-up questions:

  1. Is Barracuda the standard inference engine, or would we have to substitute the standard model with Barracuda in order to access the rewards with the --resume / --inference flags?
  2. If the curiosity module is only available during training, does the “reward” field in DecisionStep only represent the extrinsic reward?
  3. Would an alternative to querying the reward be to set the learning rate to 0 and pretend that the agent is still training?

Many thanks!

  1. During training, PyTorch is the neural-network engine. When running a trained model in Unity (for inference), the neural-network engine is Barracuda.
    The NN model is created by the ML-Agents PyTorch-based implementation. Once the ONNX model exported by the Python-side trainer is deployed in Unity for inference, it is executed by the Barracuda engine (developed by Unity).

Barracuda is what enables self-contained, cross-platform deployment of the Unity simulation / game without having to bundle PyTorch and lots of Python dependencies.

  2. The agent reward assigned in each decision step is the extrinsic reward. Curiosity is an internal mechanism in the Python-side trainer which adds its own reward based on the discrepancy between the game state its forward model predicts and the game state the agent actually observes.

  3. Yes, that would work as well. Actually, setting the learning rate to 0 would work better than what I suggested.
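A rough sketch of what that could look like in the trainer config. The behavior name and everything except the learning rates are placeholders, and I have not verified that this freezes every update path:

```yaml
behaviors:
  MyAgent:                  # placeholder behavior name
    trainer_type: ppo
    hyperparameters:
      learning_rate: 0.0    # freeze policy / value updates
    reward_signals:
      extrinsic:
        strength: 1.0
        gamma: 0.99
      curiosity:
        strength: 0.02
        gamma: 0.99
        learning_rate: 0.0  # freeze the ICM networks as well
```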

Note: there is little to no value in trying to extract the outputs of the curiosity module.
The curiosity module is basically a reward scheme that keeps looking for discrepancies between the actual game state and the game state predicted by its model. Whenever it finds a discrepancy, it creates a reward to encourage the model to explore that discrepancy.

It is a bit like: “Hey, I thought that when I kick the ball it would go straight forward, but instead it went left. Agent, now you go off and investigate why it went left. Regardless of whether that is beneficial or not, I (curiosity) would like you (the agent) to explore this path, just in case there is a future reward down it.”
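If it helps to see the idea in code, here is a toy version of that reward scheme. This is not the actual ML-Agents implementation (which works on learned embeddings of the observations); it just illustrates the “reward the prediction error” idea:

```python
import numpy as np

def toy_curiosity_reward(predicted_next_state: np.ndarray,
                         actual_next_state: np.ndarray,
                         strength: float = 0.02) -> float:
    """Reward the agent in proportion to how wrong the forward model was.

    A large prediction error means unfamiliar territory, so the intrinsic
    reward is larger there, nudging the agent to keep exploring it.
    """
    prediction_error = float(np.mean((predicted_next_state - actual_next_state) ** 2))
    return strength * prediction_error
```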

Thanks @ChillX! I understand that this would typically be of little use, but we’re dealing with a research project that investigates further uses of this reward. :slight_smile: I’m now facing the next challenge: how to obtain the per-step reward during training, rather than the accumulated reward over an episode? See the follow-up topic: Log per-step intrinsic curiosity reward (ICM) during training