[Idea] Unity with C# to GPU power!

What if Unity made it easier to write parallel processing code for the CPU or GPU in C#?

Some of you will say no way or it can’t be done!

But it can!

CUDAfy .Net - “allows easy development of high performance GPGPU applications completely from the Microsoft .NET framework. It’s developed in C#.”

Example of use here - w8isms: GPU Performance Tests

Cool Tech, write C# code that is cross compiled to the GPU!
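
To give a flavour, a CUDAfy kernel is just an ordinary static C# method marked with an attribute, roughly like this (written from memory of the CUDAfy samples, so treat the exact names and signatures as approximate):

using Cudafy;
using Cudafy.Host;
using Cudafy.Translator;

public static class VectorAdd
{
    // Kernel: plain C# that CUDAfy translates to CUDA or OpenCL.
    [Cudafy]
    public static void Add(GThread thread, float[] a, float[] b, float[] c)
    {
        int i = thread.blockIdx.x * thread.blockDim.x + thread.threadIdx.x;
        if (i < c.Length)
            c[i] = a[i] + b[i];
    }

    public static void Run(float[] a, float[] b, float[] c)
    {
        CudafyModule km = CudafyTranslator.Cudafy();          // translate the [Cudafy] methods
        GPGPU gpu = CudafyHost.GetDevice(eGPUType.Cuda, 0);   // or eGPUType.OpenCL
        gpu.LoadModule(km);

        float[] devA = gpu.CopyToDevice(a);
        float[] devB = gpu.CopyToDevice(b);
        float[] devC = gpu.Allocate<float>(c);

        gpu.Launch(c.Length / 256, 256).Add(devA, devB, devC); // grid size, block size
        gpu.CopyFromDevice(devC, c);
        gpu.FreeAll();
    }
}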

OK, Unity would need to make it so you could just tag a region of code for parallelism (for me at least), but imagine what people with real programming skills could do once they can tap the power of their GPUs.

That’s unless Microsoft is developing a .Net to GPU technology???

What would you code in Unity if you could unleash the power of your GPU?

5 Likes

Isn’t your GPU already busy rendering stuff?

3 Likes

Good point, but a modern gaming device has both a CPU and a GPU, so why not take advantage of both?
A simple scenario would be: you give your CPU a lot of processing to do and your GPU is left waiting on the CPU.

But if the task can be done faster in parallel on your GPU, then your CPU could load the task onto the GPU while it works out what is needed for the next frame. The GPU finishes, and the CPU passes it the rendering task and picks up the results.

A bit simplistic, but isn’t this the direction the industry is going?

Don’t take my word for it; check the DICE Frostbite game engine industry lectures.

What could be better for game engines and game developers than having two processors: one great for serial tasks and small multi-threaded jobs, the other good at massively parallel tasks? Or, on mobile or an APU, both in a single chip.

And being able to access both with a single language.
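
For reference, here is roughly what that hand-off looks like with Unity’s existing compute APIs today (a sketch only: the kernel name “CSMain”, the buffer name “results” and the thread-group size of 64 are placeholders of mine, and GetData blocks, so a real version would read the results back a frame or two later):

using UnityEngine;

public class OffloadExample : MonoBehaviour
{
    public ComputeShader heavyTask;   // assumed asset with a "CSMain" kernel writing to "results"
    ComputeBuffer results;
    const int count = 65536;
    float[] output;

    void Start ()
    {
        results = new ComputeBuffer (count, sizeof(float));
        output = new float[count];
    }

    void Update ()
    {
        int kernel = heavyTask.FindKernel ("CSMain");
        heavyTask.SetBuffer (kernel, "results", results);
        heavyTask.Dispatch (kernel, count / 64, 1, 1);   // GPU starts crunching

        // ...CPU works out what is needed for the next frame here...

        results.GetData (output);                        // stalls until the GPU is done
    }

    void OnDestroy ()
    {
        results.Release ();
    }
}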

This would definitely be cool. It would have many uses for leveraging GPU power.

1 Like

Compute shaders are as easy as it gets, though they’re DX11 only.

LOL, come on. Why should you have to learn shader programming just to write a subroutine that could run quicker on the GPU?

What makes you think that’s not already happening?

2 Likes

Mac support!

Expanding on this (talking mostly about PC): most gamers and/or developers max out GPU time before they max out CPU time. Where possible, visual stuff is typically cranked up to the point where the system is only just managing an appropriate frame rate, and this usually puts more pressure on the GPU than the CPU. So, in the use case of a game or highly visual application, moving more stuff to the GPU, which is already under high pressure, in order to reduce load on the CPU, which is usually under less pressure, doesn’t make sense.

Exceptions to this are stuff that work really well on the GPU that would bog down a CPU, or less visual apps where the GPU isn’t under particularly high load.

2 Likes

I’m not aware of any plan for this feature within Unity or C#. Does Unity have this on their development roadmap?

Then why did game developers want/need AMD’s Mantle or iOS’s Metal to overcome the performance bottleneck between CPU and GPU?

And how come Nvidia are always showing off these amazing tech demos where they simulate fluids and galaxies with their latest GPU?

Because they’re tech demos. They’re meant to look fancy. Would they have got you this excited if they showed you a number crunching benchmark which just printed a few lines of text on the screen? I suspect not.

How many people buy GPUs based on their ability to crunch data for reports or compress video quickly? Some, but not nearly as many as gamers buy to push more pixels on bigger screens for newer games. :wink:

I could be wrong, but I think that’s more about bus bandwidth (“draw calls”) than computational speed. No one number tells the whole story of a system’s performance.

Does this work in Unity? I mean, if I add the required assembly to a Unity C# project, does it just work, like in .NET?
I am excited.

It’s not that difficult. It’s pretty much required for some Kinect 2 projects where you’re working with the raw buffers off the sensors in real time.

Very basic example: drawing and randomly updating the positions of 350,000 structs, each with two Vector3 fields, at 60+ fps.

using UnityEngine;
using System.Collections;

public class BufferExample : MonoBehaviour
{
    public Material material;
    ComputeBuffer buffer;
    const int count = 350000;  //number of vertices to generate
    const float size = 5.0f;
    Vert[] points;

    struct Vert
    {
        public Vector3 position;  //self explanatory
        public Vector3 color;
    }

    void Start ()
    {
        buffer = new ComputeBuffer (count, sizeof(float) * 6, ComputeBufferType.Default);  // stride is 6 floats (24 bytes): one Vert = 2 Vector3s
        points = new Vert[count];
        Random.seed = 0;
        for (int i = 0; i < count; i++)  //make 350,000 verts with random color and position
        {
            points[i] = new Vert();
            points[i].position = new Vector3();
            points[i].position.x = Random.Range (-size, size);
            points[i].position.y = Random.Range (-size, size);
            points[i].position.z = Random.Range (-size, size);

            points[i].color = new Vector3();
            points[i].color.x = Random.value > 0.5f ? 0.0f : 1.0f;
            points[i].color.y = Random.value > 0.5f ? 0.0f : 1.0f;
            points[i].color.z = Random.value > 0.5f ? 0.0f : 1.0f;
        }
        buffer.SetData (points); //set the buffer data
    }

    void FixedUpdate(){
        for (int i = 0; i < count; i++)
        { 
            points[i].position.x = Random.Range (-size, size);  //slow to do random in update, just example
            points[i].position.y = Random.Range (-size, size); 
        }
        buffer.SetData (points);
    }

    void OnPostRender (){  // note: this script needs to be on a camera for OnPostRender to fire
        material.SetBuffer ("buffer", buffer);  // bind the buffer before setting the pass
        material.SetPass (0);
        Graphics.DrawProcedural (MeshTopology.Points, count, 1); 
    }

    void OnDestroy ()
    {
        buffer.Release ();
    }
}

The “scary” compute shader:

#pragma kernel CSMain
StructuredBuffer<float> buffer1;
RWStructuredBuffer<float> buffer2;

[numthreads(8,1,1)]
void CSMain (uint id : SV_DispatchThreadID)
{
    uint count, stride;
    buffer2.GetDimensions(count, stride); 
    buffer2[id] = buffer1.Load(id);
}
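
(Side note: the C# example above never actually dispatches this kernel. For completeness, dispatching it would look roughly like this, assuming the .compute asset is assigned to a ComputeShader field and buffer1/buffer2 are float ComputeBuffers of matching size; the class and field names are placeholders:)

using UnityEngine;

// Hypothetical dispatcher for the CSMain kernel above.
public class CopyDispatcher : MonoBehaviour
{
    public ComputeShader copyShader;   // assign the .compute asset in the Inspector
    ComputeBuffer buffer1, buffer2;
    const int count = 1024;

    void Start ()
    {
        buffer1 = new ComputeBuffer (count, sizeof(float));
        buffer2 = new ComputeBuffer (count, sizeof(float));

        int kernel = copyShader.FindKernel ("CSMain");
        copyShader.SetBuffer (kernel, "buffer1", buffer1);
        copyShader.SetBuffer (kernel, "buffer2", buffer2);
        copyShader.Dispatch (kernel, count / 8, 1, 1);   // matches [numthreads(8,1,1)]
    }

    void OnDestroy ()
    {
        buffer1.Release ();
        buffer2.Release ();
    }
}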

And a basic shader to show the position/color:

Shader "Custom/BufferExample/BufferShader"
{
    SubShader
    {
        Pass
        {
            ZTest Always Cull Off ZWrite Off
            Fog { Mode off }

            CGPROGRAM
            #include "UnityCG.cginc"
            #pragma target 5.0
            #pragma vertex vert
            #pragma fragment frag

            struct Vert
            {
                float3 position;
                float3 color;
            };

            uniform StructuredBuffer<Vert> buffer;

            struct v2f
            {
                float4  pos : SV_POSITION;
                float3 col : COLOR;
            };

            v2f vert(uint id : SV_VertexID)
            {
                Vert vert = buffer[id];
                v2f OUT;
                OUT.pos = mul(UNITY_MATRIX_MVP, float4(vert.position, 1));
                OUT.col = vert.color;
                return OUT;
            }
            float4 frag(v2f IN) : COLOR
            {
                return float4(IN.col,1);
            }
            ENDCG
        }
    }
}
3 Likes

I don’t think so, as CUDAfy also needs the Visual Studio C++ compiler, since it converts your C# code to CUDA or OpenCL code. For Unity this would probably work more like how Unity builds for iOS, where Unity generates an iOS/Mac project that is then compiled for Mac or iOS.

There are also additional dependencies, e.g. the CUDA SDK, that Unity games would need to include.

@Imbarns Nice, but what if Unity developed a CUDAfy-like technology where you could instead write something more like this:

using UnityEngine;
using System.Collections;
using UnityEngine.GPU; // ideal GPU enabler

public class BufferExample : MonoBehaviour
{
    public Material material;   
    const int count = 350000;  //number of vertices to generate
    const float size = 5.0f;
    Vert[] points;

    [GPU DATA]
    Vert[] gpuPoints;
   
    struct Vert
    {
        public Vector3 position;  //self explanatory
        public Vector3 color;
    }

    void Start ()
    {
        points = new Vert[count];
        Random.seed = 0;
        for (int i = 0; i < count; i++)  //make 350,000 verts with random color and position
        {
            points[i] = new Vert();
            points[i].position = new Vector3();
            points[i].position.x = Random.Range (-size, size);
            points[i].position.y = Random.Range (-size, size);
            points[i].position.z = Random.Range (-size, size);
            points[i].color = new Vector3();
            points[i].color.x = Random.value > 0.5f ? 0.0f : 1.0f;
            points[i].color.y = Random.value > 0.5f ? 0.0f : 1.0f;
            points[i].color.z = Random.value > 0.5f ? 0.0f : 1.0f;
        }
        [GPU]
        gpuPoints = points; // triggers a loading of the data onto the GPU
        [END GPU]
    }

    void FixedUpdate(){
        for (int i = 0; i < count; i++)
        {
            points[i].position.x = Random.Range (-size, size);  //slow to do random in update, just example
            points[i].position.y = Random.Range (-size, size);
        }
        [GPU] // triggers the generation of GPU code that is triggered from fixed Update
        for (int i = 0; i < count; i++)
        {
            gpuPoints[i].position *= points[i].position;
        }
        [END GPU]
    }

    void OnPostRender (){
        Graphics.DrawProcedural (MeshTopology.Points, count, 1);
    }
}

This is only pseudo code to give you an idea of what could be developed by Unity.

Note: You would probably still need your shader to draw the data.

3 Likes

If I used CUDAfy to build a DLL or assembly, and it generates the OpenCL and CUDA GPU code, could I then use the DLL and code with Unity?

And if Unity is moving over to IL2CPP, could they add a GPU feature set to ease GPU programming?

The big problem I see is that it only targets CUDA capable cards. If it were to happen, and I’m not sure if it ever would as the benefit to Unity likely wouldn’t outweigh the cost of development, they’d need to build it as an agnostic API that supported CUDA / Mantle / Metal and would still fall back to CPU only if the capabilities didn’t exist on the target hardware.
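
You can at least hand-roll the CPU-fallback part today, something like this (a sketch only: the shader field, the kernel name and the doubling loop are placeholders, and the buffer set-up for the GPU path is omitted for brevity):

using UnityEngine;

public class GpuOrCpu : MonoBehaviour
{
    public ComputeShader shader;       // placeholder compute shader with a "CSMain" kernel
    float[] data = new float[4096];

    void Update ()
    {
        if (SystemInfo.supportsComputeShaders)
        {
            int kernel = shader.FindKernel ("CSMain");
            shader.Dispatch (kernel, data.Length / 64, 1, 1);   // GPU path (buffer binding omitted)
        }
        else
        {
            for (int i = 0; i < data.Length; i++)
                data[i] = data[i] * 2.0f;                       // plain C# fallback
        }
    }
}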

Ahh, no, actually that’s just its name; it supports CUDA and OpenCL as well (see link for details): https://cudafy.codeplex.com/

Note that OpenCL is supported on a range of ATI and mobile platforms (see link for OpenCL-compatible hardware): Conformant Products - The Khronos Group Inc

Hey Unity’s WebGL builds could also have a WebCL option! :wink:

UT Guys, Do it. Pleaaaase. :frowning:

1 Like

What if the game is 2D? It would require less rendering power, so the GPU is left largely unused in such situations.

1 Like