How best to control which states are expanded in the planner?

Hello!

I am having trouble making the planner search breadth-first. In my example, only 2 of the 5 possible states at each level have their subplans completed. The selection seems to be driven by the highest estimated reward, but it pains me to know that if the planner would just expand all the states resulting from the first level, it would find an action with a large immediate reward.

This also pushes the planner to have a depth of 1000+, which is useless in my case.

So far I've tried:

  • Balancing out the immediate rewards to keep the cumulative rewards from ballooning.

  • Changing the DC plan settings: raising the state expansion budget, capping the plan size, and setting the selection job to parallel instead of sequential.

  • Changing the execution settings.

Any (hopefully deterministic) advice?

Cheers!



What worked for me was to create a reward estimator that returns wider bounded values from the Estimate function. If I understand correctly, any branch whose reward falls outside the min/max bounds of the reward estimator is discarded, so widening those bounds makes the planner evaluate more branches. But this was on an earlier version, so your mileage may vary.
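
In case it helps, here is a minimal sketch of what I mean, assuming the preview package's `ICustomCumulativeRewardEstimator<TStateData>` interface and the `StateData` type the package generates for your plan. The interface and namespace names have changed between package versions, so check against whatever you have installed:

```csharp
using Unity.AI.Planner;
using Generated.AI.Planner.StateRepresentation; // generated for your plan; exact namespace may differ

// Returns deliberately wide cumulative-reward bounds so that no branch's
// interval falls entirely below another branch's interval, which is what
// gets a branch discarded from the search.
public struct WideBoundsEstimator : ICustomCumulativeRewardEstimator<StateData>
{
    public BoundedValue Estimate(StateData state)
    {
        // BoundedValue(lowerBound, averageEstimate, upperBound),
        // matching the [-100, 0, 100] triple discussed in this thread.
        return new BoundedValue(-100f, 0f, 100f);
    }
}
```

If I remember right, you then select this estimator in your plan definition in place of the default one.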


Thank you @JvanOpstal; by using the Default Cumulative Reward Estimator with [-100, 0, 100] bounds, and keeping all my actions' rewards in the [-3, 3] range (instead of scaling them with relevant trait values), I was able to get breadth-first expansions.
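
For anyone landing here later, this is roughly what I mean by not scaling the rewards with trait values (a hypothetical helper with made-up names; only the clamping idea matters). A plan a few dozen steps deep then cumulates at most a few tens of reward, comfortably inside the ±100 estimator bounds:

```csharp
using UnityEngine;

// Hypothetical helper: clamp each action's reward to a small fixed range
// instead of scaling it by trait values, so the cumulative reward along a
// branch stays well inside the estimator's [-100, 100] bounds.
public static class RewardUtility
{
    const float kRewardCap = 3f;

    public static float CappedReward(float rawReward)
    {
        return Mathf.Clamp(rawReward, -kRewardCap, kRewardCap);
    }
}
```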

I am, however, still hazy on the whole Bounded Value concept. From the VacuumRobot example project I had the impression that the Reward Estimator gave a ballpark guess of what a state could yield, not hard bounds outside of which the planner stops searching.

Any articles or wiki pages (besides the API docs) you know of that could clear this up?

Thank you! :)

I'm not sure there's a good page about this for the planner yet, so I went looking elsewhere and found a paper that I think describes the idea (I've only skimmed it):


So, each branch's lower/upper bounds are pessimistic/optimistic predictions of the cumulative reward reachable through future states. And the reason my states (3, 4, 5) were not being expanded was that their upper bounds were lower than the lower bounds of states (1, 2), making them strictly worse.

And by setting estimated bounds much wider than the plan's cumulative reward, those states were still worse on average but no longer strictly worse, so they were evaluated.
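
In case it helps anyone else, here is the pruning rule as I understand it, as a toy snippet (illustrative only, with made-up numbers; these are not the package's actual types):

```csharp
using System;

// Toy model of the idea: each branch carries pessimistic/optimistic bounds
// on the cumulative reward reachable through it.
struct Bounds
{
    public float Lower, Upper;
    public Bounds(float lower, float upper) { Lower = lower; Upper = upper; }
}

static class PruningDemo
{
    // A branch is strictly worse when even its optimistic upper bound
    // cannot reach another branch's pessimistic lower bound.
    static bool StrictlyWorse(Bounds candidate, Bounds best)
        => candidate.Upper < best.Lower;

    static void Main()
    {
        var state1 = new Bounds(40f, 90f); // one of the two expanded states
        var state4 = new Bounds(10f, 35f); // one of the three ignored states
        Console.WriteLine(StrictlyWorse(state4, state1)); // True -> never expanded

        // Widening the estimated bounds makes the intervals overlap again:
        // state4 is still worse on average, but no longer strictly worse.
        var state4Widened = new Bounds(10f - 100f, 35f + 100f);
        Console.WriteLine(StrictlyWorse(state4Widened, state1)); // False -> expanded
    }
}
```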

Cool, thanks! :)