Sustainably chopping wood

Hey everyone,

I am trying to create an Agent that accesses the state of a forest and then decides wether to chop a tree or not. The general setup is that each tree in the forest spawns new trees with a certain probability, so over time the forest is fully stacked.

The Agent gets as Input the number of trees in the forest and how much wood it has generated as of yet.

It gets rewarded if it chops a tree and there are still trees remaining in the forest and it gets punished if it chops the last tree. (I also tried punishing it a little if it does not chop a tree in every step, but this yielded the same result.)

The behaviour I expect would be: That the agent chops wood if there is more than a single tree and that it does not if there is just one tree left.

However so far it always ends up in a policy, where it just does not want to chop at all. (Quite the druid…)

Any suggestions on how to proceed? Or is this generally a problem that can not be solved that well by reinforcement leraning?

Also here is my config file:

behaviors:
  Chopper:
    trainer_type: ppo
    hyperparameters:
      batch_size: 10
      buffer_size: 100
      learning_rate: 3.0e-4
      beta: 5.0e-4
      epsilon: 0.2
      lambd: 0.99
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: false
      hidden_units: 128
      num_layers: 3
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
      curiosity:
        strength: 0.02
        gamma: 0.99
        encoding_size: 256
        learning_rate: 3.0e-4

    max_steps: 100000
    time_horizon: 1000
    summary_freq: 1000
    threaded: true

Any help is greatly appreciated!

Cheers

Hi. In my humble opinion, this is definitely a simple enough problem for the agent to be able to solve. However the situation is still a little vague. Could you please elaborate a little? What kind of observations are you providing the agent with? Are you sure that the agent is exposed to the objects it needs to interact with? Also what are the measurements for the reward system?

Hi!
Sorry for my lage respone.

Sensordata:

  • number of trees in the forest
  • average age of the trees (0 - 100)
  • amount of wood in storage

Actions:

  • Chop a tree (The oldest tree is picked automatically)
  • Don’t chop

Rewards:

  • If a tree is chopped get age/100 as reward (This means if an old tree is chopped it gets a reward of almost 1)
  • If the last tree is chopped get a reward of -1

Based on your Sensor data, I am still unclear as to whether or not the agent has any idea that the forest even exists at a certain location. if possible, can you share your project so I can take a look?

No the agent is no physical entity. Its an abstract “decision maker”. At least that is what I want it to be. Yep will share! Just need to get home.