On masking actions

Is it better to mask an action to prevent an agent from taking it, or to teach the agent not to take it through negative rewards?

It usually depends on the situation. Sometimes a move is simply impossible (for example, in a match 3 environment). Attempting an invalid move brings the agent back to the same state, and it will usually try the same invalid move again (since it is starting from the same state). In this case, I recommend masking that action.
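
To make the masking idea concrete, here is a minimal sketch in plain Python. It is not taken from any particular library: the function names `masked_action_probs` and `sample_action`, and the idea of passing a list of booleans as the mask, are assumptions made for illustration. Invalid actions get probability zero, so they can never be sampled, and the remaining probabilities are renormalized.

```python
import math
import random


def masked_action_probs(logits, valid_mask):
    """Softmax over the policy scores, with masked actions forced to probability 0.

    `logits` is a list of raw scores, `valid_mask` a list of booleans of the
    same length (True means the action is allowed). Both names are hypothetical.
    """
    if len(logits) != len(valid_mask):
        raise ValueError("logits and valid_mask must have the same length")
    if not any(valid_mask):
        raise ValueError("at least one action must be valid")
    # Subtract the largest valid score for numerical stability.
    top = max(score for score, ok in zip(logits, valid_mask) if ok)
    weights = [math.exp(score - top) if ok else 0.0
               for score, ok in zip(logits, valid_mask)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_action(logits, valid_mask, rng=random):
    """Sample an action index; masked actions are never returned."""
    probs = masked_action_probs(logits, valid_mask)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Example: action 1 is an impossible move in the current state.
probs = masked_action_probs([1.0, 2.0, 3.0], [True, False, True])
action = sample_action([1.0, 2.0, 3.0], [True, False, True], random.Random(0))
```

With this approach the agent never wastes a step on a move that would leave the state unchanged, which is the situation described above.
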
Using penalties is tricky because the agent might become too scared to use the move after some training, so you will need to find the "ideal" penalty amount. Penalties are useful in locomotion tasks, for example, because they act as an energy constraint and can force the agent "not to move too much".
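
As a rough illustration of why the penalty amount matters, here is a small sketch of an energy penalty for a locomotion-style task. The function `shaped_reward`, the squared-action energy term, and the coefficient values are all assumptions chosen for the example, not a recommended setting.

```python
def shaped_reward(task_reward, action, energy_coef=0.01):
    """Task reward minus a penalty proportional to the squared action magnitude.

    `action` is a list of continuous control values; `energy_coef` is the
    hypothetical penalty weight that has to be tuned.
    """
    energy = sum(a * a for a in action)
    return task_reward - energy_coef * energy


# Moving forward earns a task reward of 1.0 but costs energy; standing still earns 0.
moving = [1.0, 1.0]
idle = [0.0, 0.0]

# Small penalty: moving is still clearly better than standing still.
small_move = shaped_reward(1.0, moving, energy_coef=0.01)
small_idle = shaped_reward(0.0, idle, energy_coef=0.01)

# Large penalty: standing still now scores higher, so the agent may learn not to move at all.
large_move = shaped_reward(1.0, moving, energy_coef=1.0)
large_idle = shaped_reward(0.0, idle, energy_coef=1.0)
```

The second case is the "too scared" failure mode: the penalty outweighs the benefit of acting, so the useful behavior disappears along with the wasteful one.
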

Thank you @vincentpierre !