Hey there and welcome to the forum,
An interesting question indeed. I've given this some thought and I'm not really sure it's answerable unless someone has already tried it (which I highly doubt, since there are only a few questions about ML-Agents on here).
So I'll just leave you my thoughts on this:
I wonder a bit why you want to take this approach. Let's assume we have one model that takes in observations, makes a decision, and executes an action from, let's say, the action set [A1, A2, A3, B1, B2, B3, C1, C2, C3]. In this action set (let's call it the A-Set) we have 9 different actions, but there are always 3 actions that are similar to each other. This should simulate having one "base action" where we can pick one of 3 similar variants to get the best performance.
This is now our baseline.
Whether this approach works cannot be determined without more knowledge of the problem at hand. It also assumes that the "similar actions" can even be defined in such a way.
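Just to make the baseline concrete, here is a tiny Python sketch (my own names, not ML-Agents API) of how such a flat 9-action branch maps back onto "base action + variant":

```python
# Illustrative sketch, not ML-Agents code: one flat discrete action
# branch of size 9, where each of 3 base actions has 3 variants.
# Flat index = base_index * 3 + (variant - 1).

BASE_ACTIONS = ["A", "B", "C"]
VARIANTS_PER_BASE = 3

def decode(action_index):
    """Map a flat action index (0-8) back to (base, variant)."""
    base = BASE_ACTIONS[action_index // VARIANTS_PER_BASE]
    variant = action_index % VARIANTS_PER_BASE + 1
    return base, variant

# e.g. decode(4) gives ("B", 2)
```

The point is just that the single-model baseline already contains the "choose base action, then choose how to do it" structure implicitly, only flattened into one decision.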
Now let us assume we split this up. There is one model M1, which takes observations and decides on the action set [A, B, C]. Depending on the action chosen, you then want a different model to find the best possible way to execute it. The problem I see here is that this second model also needs an observation space. You either have to 1) include the action chosen by model 1 in the observation space, or 2) train one model per choosable action from model 1.
The first one does not really seem viable to me. In my experience, models do not have an easy time learning completely different behaviour based on the switch of a single number in the observation.
The second one would imply training one model per set of "sub-actions". I'm not sure that is a good idea, especially since a change in the observations would require retraining every model you have.
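To show what option 1 would look like in practice, here is a small sketch (a hypothetical helper of mine, not anything from ML-Agents): M1's chosen base action gets appended one-hot to the observation the second model sees, so one sub-model can in principle serve all three base actions.

```python
# Illustrative sketch of option 1 (hypothetical, not ML-Agents API):
# append a one-hot encoding of M1's choice to the raw observation
# before feeding it to the single sub-model.

NUM_BASE_ACTIONS = 3  # A, B, C

def augment_observation(obs, base_action_index):
    """Append a one-hot encoding of M1's base-action choice."""
    one_hot = [0.0] * NUM_BASE_ACTIONS
    one_hot[base_action_index] = 1.0
    return list(obs) + one_hot
```

Note the observation size stays fixed regardless of which base action was picked, which is exactly why the sub-model has to learn very different behaviour from the flip of those few extra inputs.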
So in the end there are pros and cons to all options. Putting everything into one model certainly "bloats" it: there is a lot to learn and a lot of actions to choose from, but I think you could still get the most consistent behaviour across all actions this way.
The split could lead to better behaviour for certain actions and could improve training times, since each model does not have to learn as much in one go. At the same time, as I said, changes to the environment could be harder to handle here.
Whether either option improves "debuggability" is hard to say, I think.
Either way, I'd suggest the following: start with a reduced set of actions and one model. See if that works. Then add one more action and watch how training times and overall performance change. Perhaps additive learning is a good idea here: let it learn actions A1, A2, A3 first, then move on to the B actions in another session. I think this was called "lessons" (curriculum learning) in ML-Agents.
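The additive idea above could be sketched roughly like this (again a hypothetical helper, not the actual ML-Agents curriculum config): each lesson unlocks one more group of actions, and the agent only ever sees the unlocked set.

```python
# Rough sketch of additive/curriculum-style unlocking (hypothetical,
# not the ML-Agents curriculum API): lesson 0 allows only the
# A-actions, lesson 1 adds the B-actions, and so on.

LESSON_ACTIONS = [
    ["A1", "A2", "A3"],
    ["B1", "B2", "B3"],
    ["C1", "C2", "C3"],
]

def allowed_actions(lesson):
    """All actions unlocked up to and including the given lesson."""
    unlocked = []
    for actions in LESSON_ACTIONS[: lesson + 1]:
        unlocked.extend(actions)
    return unlocked
```

In actual ML-Agents you would express something like this via the curriculum settings in the trainer config, or via action masking on the agent side, but the shape of the idea is the same.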
Then I'd also suggest sharing some more info about your project. What are the observations? What is the agent supposed to learn? What are the actions? Is my assumption about "choosing the best way to do action X" correct?
Most of the people I've talked with about ML-Agents tend to lose sight of the fact that choosing the right observations can be more crucial than anything else.
Let me know what you think. I hope this helps in some way.