Creating Tensors on the GPU?

In TensorFlow and PyTorch you can create tensors directly on the GPU.

Some people might find this useful. Using the ExecuteOperatorOnTensor sample, I created a little helper function like so:

    TensorFloat CreateTensorOnGPU(TensorShape shape, float[] data)
    {
        // create a CPU tensor over the float array, then copy it to the GPU
        // (s_Ops is the GPU-backend Ops instance set up in the sample)
        using TensorFloat input = new TensorFloat(shape, data);
        return s_Ops.Copy(input) as TensorFloat;
    }

It seems to work: calling this lots of times, GPU memory goes up while RAM stays fairly constant (just a small spike as you create the float array).

This might be useful if you have a lot of weights or data on your hard drive and you want to transfer it to the GPU but you don’t have much RAM.

Someone let me know if this function is totally wrong! :smiley:

Edit: OK, I’ve just seen there is already a function to do this, UploadToDevice(ITensorData), so I could use:

    TensorFloat CreateTensorOnGPU(TensorShape shape, float[] data)
    {
        // create a CPU tensor, then move its data to a GPU compute buffer
        TensorFloat input = new TensorFloat(shape, data);
        input.UploadToDevice(new ComputeTensorData(shape));
        return input;
    }

Interestingly, though, the first method seems to produce a lower RAM spike.

BTW, I tried my method of taking a really big ONNX file and turning all the weights into inputs with a Python script, then putting this small ONNX with no weights into Unity, then using the top function to push the weights into the inputs one by one, bypassing the RAM. It worked very well, with virtually zero RAM used either in the editor or at runtime. :+1:
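
A minimal sketch of that last step, assuming the stripped model exposes its weights as inputs named "weight_0", "weight_1", … and that each weight is stored as a raw little-endian float32 array in its own .bin file (the names and file layout here are my own conventions, not from the script):

    using System;
    using System.Collections.Generic;
    using System.IO;
    using Unity.Sentis;

    // Uploads each weight file to the GPU (via the helper above) and binds it
    // to the matching model input. The returned tensors must stay alive until
    // after worker.Execute(); dispose them afterwards.
    List<Tensor> PushWeightsToGPU(IWorker worker, TensorShape[] shapes)
    {
        var uploaded = new List<Tensor>();
        for (int i = 0; i < shapes.Length; i++)
        {
            // only one weight's float array is in RAM at a time
            byte[] bytes = File.ReadAllBytes($"weight_{i}.bin");
            var data = new float[bytes.Length / sizeof(float)];
            Buffer.BlockCopy(bytes, 0, data, 0, bytes.Length);

            TensorFloat t = CreateTensorOnGPU(shapes[i], data);
            worker.SetInput($"weight_{i}", t);
            uploaded.Add(t);
        }
        return uploaded;
    }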

1 Like

Hi @yoonitee, do you mind sharing the Python script that takes a large ONNX model, turns all the weights into inputs, and produces the tiny ONNX file? It sounds quite practical for large models.

1 Like

Sure, I put them here: GitHub - pauldog/FastOnnxLoader: Loads in onnx files with less RAM. I was using it first with ONNX Runtime instead of Unity, so you’d have to write your own C# script to load the weights, but that’s quite simple as the weight files just store giant arrays. You also need to modify the names of the input and output files in the script. (You also need to do a “pip install onnx” to get the onnx library.)

The downside is that Unity can’t optimise the model in this form. So there are pros and cons to this method.

BTW, another thing with this method is that you can use whatever compression/quantization algorithm you like to store the weights on disk. For example, to save disk space you could store the weights as 16-bit floats and decompress them to 32-bit floats at load time, provided you had a fast way of doing that.
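
A minimal sketch of the 16-bit idea, using Unity’s Mathf.HalfToFloat to widen each value (the file name and raw little-endian ushort layout are my assumptions):

    using System.IO;
    using UnityEngine;

    // Reads a file of packed 16-bit floats and widens them to 32-bit floats.
    float[] LoadHalfWeights(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);
        var result = new float[bytes.Length / 2];
        for (int i = 0; i < result.Length; i++)
        {
            // reassemble the 2-byte half, then convert it to a 4-byte float
            ushort half = (ushort)(bytes[2 * i] | (bytes[2 * i + 1] << 8));
            result[i] = Mathf.HalfToFloat(half);
        }
        return result;
    }

The resulting float[] can then be fed straight into the helper above.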

1 Like

You can do:

    Tensor tensor = new TensorFloat(shape, data);
    ComputeTensorData.Pin(tensor, clearOnInit: false);

We do have a constructor that takes a ComputeTensorData directly, but that is internal:
https://docs.unity3d.com/Packages/com.unity.sentis@1.0/manual/access-tensor-data-directly.html

If you want to use the ops, you can do:

    tensor = op.NewTensor(shape, DataType.Float, AllocScope.LayerOutput);

or

    tensor = allocator.Alloc(shape, DataType.Float, AllocScope.LayerOutput);

This creates an empty tensor, which you then pin to the GPU:

    ComputeTensorData.Pin(tensor, clearOnInit: false);
1 Like

Note that with the ModelLoader/ModelWriter classes you can store/load model weights as you wish:

    LoadModelDesc(...)
    CustomLoadModelWeights(...)

with CustomLoadModelWeights needing to set each model.constants[l].weights to the data you need:

        for (var l = 0; l < model.constants.Count; ++l)
            model.constants[l].weights = weightArray;
2 Likes

@alexandreribard_unity Please correct me if I’m wrong, but ModelWriter is still not a public class, i.e. inaccessible.

Ah, you are correct, we’ll fix it in a patch.
In the meantime you can click “Serialize To StreamingAssets” in the UI.

1 Like

Thank you! For small models it works OK, but for larger models (>2 GB) it blocks the UI indefinitely and can ultimately crash the Editor (e.g. on macOS) due to insufficient app memory.

We just published Sentis 1.1.1-exp.2 which makes the ModelWriter.Save method public. Let us know if this works for you.

1 Like

@gilescoope Thank you once again! Yes, ModelWriter.Save() is now public and works properly for small models. The issue with larger models still remains, though, and can lead to a Unity Editor crash due to insufficient memory. I suppose either some temporary objects are not disposed properly, or the GC needs to be called from time to time to clean up.

1 Like

Hi @roumenf,

It’s due to the Unity asset pipeline not cleaning up loaded objects. We can’t do much at the moment to fix it, but you can work around it by saving the model to the StreamingAssets folder, either by code or by clicking “Serialize To StreamingAssets” in the Inspector window.
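
The by-code route is a sketch like this, assuming you already have the Model loaded from its ModelAsset (the file name is just an example):

    using System.IO;
    using Unity.Sentis;
    using UnityEngine;

    // serialize the loaded model into StreamingAssets so it can be loaded from disk at runtime
    Model model = ModelLoader.Load(modelAsset);
    ModelWriter.Save(Path.Combine(Application.streamingAssetsPath, "model.sentis"), model);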

@liutaurasvysniauskas_unity Thank you for the info and for the suggestion! When I use the “Serialize To StreamingAssets” button, the result is the same: the system reports insufficient app memory.

1 Like

Is this the same issue as this one: Memory Leak importing ONNX? - #3
Maybe someone on the “Unity Asset Pipeline Team” knows how to fix it. I don’t know if such a team exists :grinning: but if it does, they’d probably know how to fix it. Otherwise maybe there is a clever way to bypass the asset pipeline altogether. IDK.

One possibility, if the pipeline can’t be patched, might be a standalone app or command-line utility that converts the ONNX into a streaming asset. Or alternatively a button in Unity where you enter a filename for an ONNX file and it converts it to a streaming asset. This might fix one part of the RAM problem, if not all of it. I don’t know what the best solution would be.

1 Like

So, we’ve debugged quite a bit into this.
The issue is that Unity keeps memory allocated during asset import cached.
We’ve reduced that amount to a minimum, but the model is essentially loaded twice.
The only way around this is to use Serialize To StreamingAssets or ModelWriter.Save.
Of course this uses memory, so if your computer has limited memory you’ll run into insufficient app memory, or your computer will start paging.
The workaround is to make sure that your computer is using as little RAM as possible when you save your model.

Here are my numbers. They seem to back up @roumenf:

Model Asset size: 1300MB

First we just run the project, doing nothing but including the model as a public asset:


ModelLoader.Load takes no additional RAM.

At this point there is 5.4GB of free RAM. You’d be surprised if saving a 1.3GB model took all of this, but it did:

ModelWriter.Save() time taken: 10 minutes (running in the editor).

If a 1GB model takes over 6GB of RAM to save, presumably a 2GB model will need over 12GB of RAM. That’s more than most people have. For a 7GB model such as the smallest Llama, we would need about 42GB of RAM, and it would take over an hour.

1 Like

Which Sentis version are you running?

The latest. I think ModelWriter.Save is only available in the latest, right?

BTW, on another topic: LoadModelDesc() and LoadModelWeights() don’t seem to have versions that work with the streaming asset, i.e. where you pass in a filename. Hopefully a version of LoadModelDesc() would be able to get the model desc from the streaming asset without loading the whole thing into RAM. That would be quite useful, I think. It would also be useful for developers to have full control by streaming in individual weights from the streaming asset one at a time. Perhaps something like: StreamModelWeight(path, “weight-name”, CompressionType, GPU). Something for the future maybe. :slightly_smiling_face:

1 Like

Experimental Faster Model Save Hack :rocket::last_quarter_moon_with_face: Up to 60x faster

@roumenf I came up with a hack that reduces the save time from 10 minutes down to 10 seconds! :sunglasses::+1: It also uses about 80% less RAM. No idea if it will work for all models. Try it out if you like.
The gist of it is that it saves the big block of memory holding the weights into a separate file in one go.
So now you have two files: model.sentis and model.weights. No Python hacking needed.
It is a bit hacky (this is not how the API is designed to be used), so use at your own risk!
It assumes all the constants point to the same block of weight memory. Maybe this is not always true, but it shouldn’t be hard to alter.

Save the Model

    // load the model from the asset
    model = ModelLoader.Load(onnx);

    // the weights of all the constants seem to point to the same massive block of memory
    NativeTensorArray weights = model.constants[0].weights;

    // delete the weight pointer from each constant
    for (int i = 0; i < model.constants.Count; i++)
    {
        model.constants[i].weights = null;
    }

    // save the weights to a separate file (4 bytes per element, assuming float32)
    using (BinaryWriter writer = new BinaryWriter(File.Open("model.weights", FileMode.Create)))
    {
        writer.Write(weights.AsReadOnlySpan<byte>(weights.Length * 4));
    }

    // save the model, now without its weights
    ModelWriter.Save("model.sentis", model);

Load The Model
(Unfortunately this is inefficient in terms of RAM, since it has to load into a buffer first… maybe someone can improve on this? I don’t know if you can load from the hard disk straight into a NativeTensorArray.)

    // read the weights into a buffer
    byte[] buffer = File.ReadAllBytes("model.weights");

    // load the model with blank weights
    Model model2 = ModelLoader.Load("model.sentis");

    // get a reference to the blank weights
    NativeTensorArray weights = model2.constants[0].weights;

    // copy the buffer into the weights block, then release the buffer
    NativeTensorArray.BlockCopy(buffer, 0, weights, 0, buffer.Length);
    buffer = null;

It’s just a proof of concept. :bulb:
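
One way the load side might be improved (an untested sketch): stream the file in fixed-size chunks and copy each chunk into the weight block at an offset, so only a small scratch buffer sits in RAM at once. This reuses the weights reference from the snippet above and assumes the fourth argument of NativeTensorArray.BlockCopy is a destination byte offset, as the call above suggests (it would still be subject to the int32 size limit):

    using (FileStream stream = File.OpenRead("model.weights"))
    {
        var chunk = new byte[64 * 1024 * 1024]; // 64MB scratch buffer
        int offset = 0;
        int read;
        while ((read = stream.Read(chunk, 0, chunk.Length)) > 0)
        {
            // copy this chunk into the weight block at the current byte offset
            NativeTensorArray.BlockCopy(chunk, 0, weights, offset, read);
            offset += read;
        }
    }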

Conclusion
I think the reason the API’s save method takes so long and uses so much RAM may be that it manipulates lots of NativeTensorArrays. Perhaps they are laid out in memory in a way that fragments it, so big blocks of memory can no longer fit and memory usage just expands. That is just a WILD guess :laughing: as I have no idea how it’s implemented!

2 Likes

@yoonitee Thank you very much! :clap: I’m going to try your hack and provide feedback.

1 Like

That code won’t work :slight_smile:
Constants might have different weight buffers, and you might bust the int32 length limit…
I’ll test out the split; maybe FileStream gets slow with a large array.

2 Likes