Reading these posts (“Convolutional Neural Networks without using an image render” and “Can I change the number of channels for a visual observation?”), I wonder if it’s possible to encode 3 spatial dimensions + labels in a similar manner. My agent scans its surroundings, creating a voxel map with each voxel having a label/value representing things like “walkable”, “obstacle” etc. Could convolution help with handling the thousands of data points that are being generated this way? Thanks!
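To make the idea concrete, here is a rough sketch of the kind of observation I have in mind (grid size and label values are just placeholder assumptions):

```python
import numpy as np

# Hypothetical voxel map around the agent: 3 spatial dimensions + label channels.
# Label values are placeholders: 0 = unknown, 1 = walkable, 2 = obstacle.
GRID = (16, 16, 16)   # assumed x, y, z extent of the scan volume
NUM_LABELS = 3

voxel_labels = np.random.randint(0, NUM_LABELS, size=GRID)

# One-hot encode the labels so each voxel becomes a small channel vector,
# giving an observation of shape (16, 16, 16, 3) = 12288 values.
obs = np.eye(NUM_LABELS, dtype=np.float32)[voxel_labels]
print(obs.shape)  # (16, 16, 16, 3)
```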
Hi @mbaske ,
There are two limiting factors to this right now:
- The mlagents trainer only supports 2D convolutions right now. 1D and 3D should be fairly easy to add, but we haven’t had a chance to yet (and don’t have a good example to show them off).
- Barracuda inference currently has a hard limit of 4 dimensions (NHWC), but they’re actively working on relaxing this to allow more dimensions. No ETA on when this will be available though.
I’ll make sure the requests for 3D (and 1D) convolutions are logged in our tracker.
Thank you @celion_unity !
@celion_unity can you expand on this some more? Do you mean that a 3-layer (RGB) image cannot be passed to the CNN within MLAgents? I would consider this a 3D observation (RGB layer x width x height).
@mbaske
That is something that definitely should work. In terms of implementing this in ml-agents at the moment: is the environment ‘truly 3D’, in the sense that there’s a lot of variation in all three dimensions, or would an equivalent of a minimap work just as well? I’m just wondering if you could circumvent the current limitations by pushing the same data as a 2D image.
I want my agent to explore and map the rooms of a building. Rooms can be on different levels, so that would be 3D data. But I can think of ways to simplify this, perhaps by providing detailed observations for the current level (2D), but only rough data for the rest of the building, something like “60% of the levels below were mapped”.
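As a rough sketch of that simplification (resolution, label values and the per-level summary are made-up assumptions), everything could be packed into an ordinary 2D multi-channel image:

```python
import numpy as np

# Hypothetical encoding: the current level as a detailed 2D grid,
# the rest of the building reduced to a single summary statistic.
H, W = 32, 32          # assumed top-down resolution per level
NUM_LABELS = 3         # placeholders: 0 = unknown, 1 = walkable, 2 = obstacle

def encode_observation(current_level, other_levels):
    """current_level: (H, W) int labels; other_levels: list of (H, W) label grids."""
    # Detailed channels for the current level (one-hot over the labels).
    detailed = np.eye(NUM_LABELS, dtype=np.float32)[current_level]      # (H, W, 3)

    # Rough data for the other levels, e.g. the fraction of voxels already mapped,
    # broadcast into one extra channel so the whole observation stays a 2D image.
    fraction_mapped = np.mean([np.mean(lvl != 0) for lvl in other_levels]) if other_levels else 0.0
    summary = np.full((H, W, 1), fraction_mapped, dtype=np.float32)     # (H, W, 1)

    return np.concatenate([detailed, summary], axis=-1)                 # (H, W, 4)

obs = encode_observation(
    current_level=np.random.randint(0, NUM_LABELS, size=(H, W)),
    other_levels=[np.random.randint(0, NUM_LABELS, size=(H, W)) for _ in range(2)],
)
print(obs.shape)  # (32, 32, 4)
```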
Have you used minimap-type visual observations successfully before? So far, I wasn’t able to get good results with that; for more details, please see https://discussions.unity.com/t/782749/5
Not personally, but I know a few examples where this worked, even in production. A good example is SEED, where they experimented with that approach in Battlefield 1: “Teaching AI-agents to Play Battlefield 1”.
The “minimap” approach mimics partial observability of the environment and at the same time simplifies A LOT what the agent is getting as input, compared to purely visual input.
Sorry, I should have been more precise. RGB images use the conv2d operator in TensorFlow (example usage). At inference time, Barracuda tensors currently only support indexing by up to 4 dimensions; for RGB images, these dimensions are the agent index, image height, image width, and color channel (NHWC), respectively.
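As a small illustration (TF2-style Keras API, example shapes only, not the actual trainer code), a conv2d over an NHWC batch looks like this:

```python
import tensorflow as tf

# A batch of 8 RGB observations laid out NHWC:
# (agent index, image height, image width, color channel).
images = tf.random.uniform((8, 84, 84, 3))

# conv2d slides its kernel over the two spatial dimensions only.
conv2d = tf.keras.layers.Conv2D(filters=16, kernel_size=3, padding="same")
features = conv2d(images)
print(features.shape)  # (8, 84, 84, 16)
```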
@mbaske 's request was to support 3 spatial dimensions + channels, which would require the conv3d operator, and additional support in Barracuda.
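For comparison, a sketch of what a 3D observation would need (again just example shapes): conv3d takes a 5-dimensional NDHWC tensor, one dimension more than Barracuda currently supports.

```python
import tensorflow as tf

# A 3D observation adds a depth dimension: NDHWC = (batch, depth, height, width, channels).
# Five tensor dimensions, which is what exceeds Barracuda's current 4-dimension limit.
voxels = tf.random.uniform((8, 16, 16, 16, 3))

conv3d = tf.keras.layers.Conv3D(filters=16, kernel_size=3, padding="same")
features = conv3d(voxels)
print(features.shape)  # (8, 16, 16, 16, 16)
```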
conv1d operations are also a reasonable request; that’s what OpenAI uses in their Hide-and-seek example for raycast observations.
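A sketch of that idea for raycast observations (the number of rays and values per ray are made up):

```python
import tensorflow as tf

# Hypothetical raycast observation: 64 rays, each with a few values
# (e.g. hit distance plus a small one-hot tag), treated as (length, channels).
rays = tf.random.uniform((8, 64, 4))  # (batch, rays, values per ray)

# conv1d slides its kernel along the ray dimension, so neighbouring rays
# are combined locally, much like neighbouring pixels in conv2d.
conv1d = tf.keras.layers.Conv1D(filters=16, kernel_size=5, padding="same")
features = conv1d(rays)
print(features.shape)  # (8, 64, 16)
```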
Just for posterity, this is logged as MLA-1220 in our tracker.
@mbaske Sir, there is no scene file provided for the explorer drone. Can you share it, please?
Hello sir,
I suppose you are no longer working for Unity now that ML-Agents development is effectively dead, but is there a way to use 3D CNNs?
With kind regards