Convolutional Neural Networks without using an image render

Is it possible to use the CNN without supplying an RGB image? I have a grid I'm trying to represent with different states, and I'm wondering if it's possible to use the CNN without creating a camera render.

Hi,
There's a related thread here: https://forum.unity.com/threads/can-i-change-the-number-of-channels-for-a-visual-observation.832906/

You should be able to follow the example there to create a 3-D observation (height x width x channels). If you have different states, you'll probably want one channel per state, so you can think of each "pixel" as the one-hot encoding of that state.
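
If it helps, here's a rough, untested sketch of what a custom grid sensor along those lines could look like. It's based on the ISensor / ObservationWriter interface from the 1.x releases of the package (older versions use WriteAdapter and slightly different signatures, so check your version), and names like GridSensor and the fields are just placeholders:

using Unity.MLAgents.Sensors;

// Sketch only: exposes a grid of discrete cell states as a
// (height x width x channels) observation, one channel per state,
// so the trainer treats it like a visual observation and runs it
// through the CNN without any camera render.
public class GridSensor : ISensor
{
    private readonly int[,] m_Grid;    // cell state indices, 0..numStates-1
    private readonly int m_NumStates;  // number of channels
    private readonly int[] m_Shape;    // height x width x channels

    public GridSensor(int[,] grid, int numStates)
    {
        m_Grid = grid;
        m_NumStates = numStates;
        m_Shape = new[] { grid.GetLength(0), grid.GetLength(1), numStates };
    }

    public int[] GetObservationShape() => m_Shape;

    public int Write(ObservationWriter writer)
    {
        int height = m_Shape[0], width = m_Shape[1];
        for (var h = 0; h < height; h++)
        {
            for (var w = 0; w < width; w++)
            {
                // One-hot: 1 in the channel matching the cell state, 0 elsewhere.
                for (var c = 0; c < m_NumStates; c++)
                {
                    writer[h, w, c] = (m_Grid[h, w] == c) ? 1f : 0f;
                }
            }
        }
        return height * width * m_NumStates;
    }

    public byte[] GetCompressedObservation() => null;
    public SensorCompressionType GetCompressionType() => SensorCompressionType.None;
    public void Update() { }
    public void Reset() { }
    public string GetName() => "GridSensor";
}

You'd then hook it up to the Agent through a small SensorComponent subclass that returns it from CreateSensor() (the exact hook depends on which ML-Agents version you're on).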

@celion_unity I'm currently writing an array of observations where each spot in the array is represented by a one-hot encoded value as vector obs, i.e. [0,1,0,2,3,1] -> [0000, 0100, 0000, 0010, 0001, 0100]. Will this approach not be effective for training? Are you saying I should switch to a separate array for each value type so I can somehow pass it into the visual observation? [0,1,0,0,0,0][0,0,0,1,0,0][0,0,0,0,1,0]

Sorry, maybe I was assuming too much from your original question :)

What I meant was: if you had a 2D grid representing a tic-tac-toe game with 3 states (empty, X, O), you'd want to represent a board like this (I hope it displays correctly)

.|.|X
-----
.|O|.
-----
.|.|X

and you would encode it as something like this

100|100|010
-----------
100|001|100
-----------
100|100|010

where "100" means a 1 in the first channel and 0 in the others.

In order for the observations to work well in a CNN, you want the convolution to happen on the "spatial" dimensions, and you want to do the encoding on the "channel" dimension.
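
In code (just to illustrate the layout, not any particular ML-Agents API), that board would turn into a 3 x 3 x 3 float array roughly like this:

// Channel 0 = empty, 1 = X, 2 = O.
int[,] board =
{
    { 0, 0, 1 },   // . | . | X
    { 0, 2, 0 },   // . | O | .
    { 0, 0, 1 },   // . | . | X
};

var obs = new float[3, 3, 3];   // height x width x channels
for (var h = 0; h < 3; h++)
{
    for (var w = 0; w < 3; w++)
    {
        obs[h, w, board[h, w]] = 1f;   // one-hot on the channel dimension
    }
}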

I can't quite tell from your example, but if your data doesn't have some 2D spatial structure, trying to use a CNN probably won't work well for training.