CUDA + Unity-OpenGL interop on vertex/index buffers

Hi,

I’m trying to improve the performance of a method we use for generating mesh data in an external plugin. We generate triangle data on the fly using CUDA and update the vertex and index buffers on a Mesh. The current approach is a bit circuitous: it fills in the mesh data in VRAM, does a cudaMemcpy to temporary main-memory buffers provided by our Unity app, and then Unity takes that main-memory copy and updates the vertex and index buffers for the mesh, thereby uploading it back to VRAM. This round trip is costing us up to 5 ms per frame, and I’d like to eliminate it by copying directly into the corresponding vertex and index buffers.

Due to additional dependencies, we are limited to OpenGLCore for the time being (though isolated testing on D3D11 isn’t off the table), so I tried calling GetNativeVertexBufferPtr and GetNativeIndexBufferPtr on the mesh, whose buffers were both allocated and expanded as needed to fit the potentially high vertex count. Since it’s GLCore, I get GLuints back, which I presume are the names of the mesh’s VBO and IBO. This is where I’m hitting a snag. Back in my plugin, when I try to register the resource (in the hope of getting a mapped pointer), cudaGraphicsGLRegisterBuffer(...) reports that the values returned by GetNativeVertexBufferPtr and GetNativeIndexBufferPtr are not valid names for any buffer resource.
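For reference, the register/map sequence I’m attempting looks roughly like the sketch below. It is a minimal illustration, not our actual plugin code: the function name copyDeviceDataIntoGLBuffer and its parameters are hypothetical, it assumes the native pointer from Unity carries the GLuint buffer name directly, and it assumes the call runs on a thread where Unity’s GL context is current (for example from a render-thread plugin callback), which I have not confirmed is the case in our setup.

```cuda
#ifdef _WIN32
#include <windows.h>           // GL/gl.h needs the Win32 types on Windows
#endif
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>
#include <cuda_gl_interop.h>   // cudaGraphicsGLRegisterBuffer, GLuint

// Hypothetical helper: registers a GL buffer with CUDA, maps it, copies
// `bytes` of device-side data into it, then unmaps and unregisters.
// Returns false if any CUDA call fails or the buffer is too small.
bool copyDeviceDataIntoGLBuffer(void* unityNativePtr,
                                const void* devSrc,
                                size_t bytes)
{
    // Assumption: under OpenGLCore the native pointer holds the GLuint name.
    GLuint name = static_cast<GLuint>(reinterpret_cast<uintptr_t>(unityNativePtr));

    cudaGraphicsResource_t res = nullptr;
    cudaError_t err = cudaGraphicsGLRegisterBuffer(
        &res, name, cudaGraphicsRegisterFlagsWriteDiscard);
    if (err != cudaSuccess) {
        // This is the call that fails for me.
        std::fprintf(stderr, "register failed: %s\n", cudaGetErrorString(err));
        return false;
    }

    bool ok = false;
    err = cudaGraphicsMapResources(1, &res, 0);
    if (err == cudaSuccess) {
        void*  devDst     = nullptr;
        size_t mappedSize = 0;
        err = cudaGraphicsResourceGetMappedPointer(&devDst, &mappedSize, res);
        if (err == cudaSuccess && bytes <= mappedSize) {
            // Device-to-device copy: the data never touches main memory.
            err = cudaMemcpy(devDst, devSrc, bytes, cudaMemcpyDeviceToDevice);
            ok  = (err == cudaSuccess);
        }
        // Unmap so OpenGL can use the buffer again.
        cudaGraphicsUnmapResources(1, &res, 0);
    }

    cudaGraphicsUnregisterResource(res);
    return ok;
}
```

In a real plugin the registration would normally be done once per buffer and only the map/copy/unmap steps repeated each frame, since registering is comparatively expensive. It is all in one function here only to keep the sketch short.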

Has anybody tried anything similar, or is there something I’m missing that I’d need to do before retrieving these resources?

Did you check these examples on native plugin rendering?

Already tried those. They don’t really get at the core of the problem. At the very least, using some of the code from that sample did confirm for me that all the handles point to something zero-sized, irrespective of which handle it is. In any case, since our original data is already on the GPU, generated by CUDA kernels, the approach used in the samples doesn’t help with avoiding the round trip between VRAM and main memory.