How do I set up the behavior's vector action if the agent has, for example, a dynamic number of limbs, where each limb is controlled by one branch (muscle relax/contract, 0-1f)?
I need the agent to coordinate between all the available limbs to complete a task.
You’ll need to commit to a fixed number of actions and observations, at least for now. If you want to generalize training for a variable number of limbs, you can define an action space large enough for the maximum number of limbs and give the agent some feedback about which limbs are present. This could be a simple one-hot encoded observation per optional limb. Your code would then just ignore the action values for non-existent limbs.
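To make the idea concrete, here is a minimal, engine-agnostic sketch in plain Python rather than actual ML-Agents code. Everything in it is an assumption made for illustration: `MAX_LIMBS`, `build_observation`, and `apply_actions` are hypothetical names, a missing limb is represented as `None`, and the "is this limb present" feedback is modeled as a single 0/1 flag per slot.

```python
from typing import Dict, List, Optional, Sequence

MAX_LIMBS = 4  # hypothetical upper bound on limbs; fixes the action/observation size


def _check_slots(limb_states: Sequence[Optional[float]]) -> None:
    # The agent may have fewer limbs than MAX_LIMBS, but never more.
    if len(limb_states) > MAX_LIMBS:
        raise ValueError("more limbs than MAX_LIMBS slots")


def build_observation(limb_states: Sequence[Optional[float]]) -> List[float]:
    """Return a fixed-size observation: [present_flag, state] for every slot.

    limb_states[i] is the limb's current state, or None if slot i has no limb.
    """
    _check_slots(limb_states)
    obs: List[float] = []
    for i in range(MAX_LIMBS):
        state = limb_states[i] if i < len(limb_states) else None
        if state is None:
            obs.extend([0.0, 0.0])  # absent limb: flag 0, padded state
        else:
            obs.extend([1.0, state])  # present limb: flag 1, real state
    return obs


def apply_actions(
    actions: Sequence[float], limb_states: Sequence[Optional[float]]
) -> Dict[int, float]:
    """Map a fixed-size action vector to commands for the limbs that exist.

    Action values for empty slots are ignored; the rest are clamped to [0, 1].
    """
    _check_slots(limb_states)
    if len(actions) != MAX_LIMBS:
        raise ValueError("action vector must have MAX_LIMBS entries")
    commands: Dict[int, float] = {}
    for i, state in enumerate(limb_states):
        if state is None:
            continue  # no limb in this slot, so drop the action value
        commands[i] = min(1.0, max(0.0, actions[i]))
    return commands
```

For example, with `limbs = [0.2, None, 0.7]`, `build_observation(limbs)` always returns 8 values regardless of how many limbs exist, and `apply_actions([0.5, 0.9, 1.4, 0.3], limbs)` only produces commands for slots 0 and 2. In Unity the same pattern would live in the agent's observation and action callbacks, with the sizes in the behavior parameters set to match the maximum.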