Is it possible to do ray-casting from a parallel thread?

Hello.

I am integrating a camera control algorithm in a Unity project and, for performance reasons, I would like to execute it in a separate thread, or possibly more than one. This would allow me to:

  1. have the camera control algorithm running in parallel with the rendering, and
  2. exploit all the computational power of the machine, e.g. 4 or 8 cores.

While the first might be accomplished with coroutines, the second cannot (but correct me if I'm wrong). To my knowledge the Unity API is not thread-safe; however, since I only need to perform ray-casting, there would be no writing to the Unity data structures.

Assuming that ray-casting is handled by the physics part of the engine, is it possible to query the physics world from a separate thread without Unity raising exceptions?

Thank you
Tommaso

Are you sure you need the performance of parallel processing? Have you actually profiled your code and determined that one core wasn’t enough?

And to my knowledge, coroutines are cooperative; they won't execute in parallel.

Current i7 processors have 4 cores but fake 8 at halved frequency through a technology called HyperThreading. Of course you can disable HyperThreading in the BIOS to get 4 full-speed cores, but that's not very convenient in general. The only way to exploit all the computational power, regardless of the number of cores, is to develop with threads. Besides, some operations can be parallelized very naturally, and this is my case, i.e. calculate and update a camera every now and then, possibly quickly, while the game runs on its own.

Yes, and we discovered that the algorithm spends 90% of its time raycasting. However, as I said, the solution would be to do it from a parallel thread (this works in release, but in debug Unity raises exceptions).

Honestly, I don’t know. I guess unless someone knowledgeable responds soon, you might just have to whip up a quick test project to see if you can.

That doesn't answer the question. Is the performance acceptable? If 90% of the time the algorithm takes is raycasting but your game still runs at 80 FPS (for example), why bother? If it's actually impacting real-world performance, then you should worry about it. But before attempting parallelization I would first look at alternatives to whatever algorithm you're using (perhaps you've done this).

Oh sorry.

Well yes, we tried to run this algorithm between one frame and the next, and the performance was nowhere near real-time. Of course we could fill our algorithm with coroutines and yield statements, but that doesn't look very nice and, besides, it is not parallel.

As I understand it, multi-threading in Unity is touch-and-go at best. To raycast from another thread you would need to synchronize it with the main game loop to prevent it trying to access PhysX before it’s been updated. Trying to do it this way is probably more effort than it’s worth, if it’s possible at all, and you run the risk of random inexplicable crashes if you port your project to another platform at a later stage.

Can you optimize or simplify your code so that you can run it in the main thread? Since it's camera control, you probably don't need to update it every frame. Consider using coroutines, running your raycasts only every x frames, or spreading the raycasts out over several frames. You can also try to run the raycasts in a sequence that allows you to bail out early, before running all of them.
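
For example, here is a minimal sketch of the "every x frames / spread over frames" idea. The class, field names and budget values are just placeholders, not anything from your project:

```csharp
using System.Collections;
using UnityEngine;

// Hypothetical sketch: spread a batch of camera raycasts over several frames
// instead of firing them all in one Update.
public class SpreadRaycasts : MonoBehaviour
{
    public int raysPerFrame = 10;   // per-frame budget, tune to taste
    private Vector3[] directions;   // directions you want to test (filled elsewhere)

    void Start()
    {
        // Placeholder data so the sketch is self-contained.
        directions = new Vector3[100];
        for (int i = 0; i < directions.Length; i++)
            directions[i] = Random.onUnitSphere;

        StartCoroutine(RaycastBatch());
    }

    IEnumerator RaycastBatch()
    {
        while (true)
        {
            for (int i = 0; i < directions.Length; i++)
            {
                RaycastHit hit;
                if (Physics.Raycast(transform.position, directions[i], out hit, 100f))
                {
                    // feed hit.point / hit.collider into the camera logic here
                }

                // After every 'raysPerFrame' casts, hand the frame back to Unity.
                if ((i + 1) % raysPerFrame == 0)
                    yield return null;
            }
            yield return null; // wait at least one frame before the next full pass
        }
    }
}
```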

Depending on the complexity of your objects, you could consider implementing your own simplified raycasting routine (simplified enough to get the job done, or perhaps reduce the number of “real” Physics.Raycast method invocations on the main thread). Your own custom raycast could be easily made to run on a background thread.
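
As a rough illustration of that idea, here is a sketch of a stripped-down ray-vs-sphere check against plain copied data. Everything here is hypothetical (the names, the sphere-only shape, the fire-and-forget callback); the important point is that it touches no Unity objects, only structs, so it can run on a worker thread as long as you snapshot the data on the main thread first:

```csharp
using System.Threading;
using UnityEngine;

// Plain data describing an obstacle; Vector3 is a struct, so copies of it
// are safe to hand to another thread.
public struct SimpleSphere
{
    public Vector3 Center;
    public float Radius;
}

public static class SimpleRaycaster
{
    // True if the ray (origin, normalized dir) hits the sphere at t >= 0.
    public static bool RaySphere(Vector3 origin, Vector3 dir, SimpleSphere s)
    {
        Vector3 oc = origin - s.Center;
        float b = Vector3.Dot(oc, dir);
        float c = Vector3.Dot(oc, oc) - s.Radius * s.Radius;
        float disc = b * b - c;
        if (disc < 0f) return false;        // ray misses the sphere entirely
        return -b + Mathf.Sqrt(disc) >= 0f; // farthest intersection is in front of the origin
    }
}

public static class BackgroundQuery
{
    // Fire-and-forget: 'snapshot' must be copied on the main thread beforehand.
    // Note that 'onDone' runs on the worker thread, not the main thread.
    public static void QueryAsync(Vector3 origin, Vector3 dir,
                                  SimpleSphere[] snapshot, System.Action<bool> onDone)
    {
        new Thread(() =>
        {
            bool hit = false;
            foreach (var s in snapshot)
                if (SimpleRaycaster.RaySphere(origin, dir, s)) { hit = true; break; }
            onDone(hit);
        }).Start();
    }
}
```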

EDIT: how many physics colliders are in your scene? Are they simplified (cubes) or complex (mesh) colliders? How many raycasts are you sending out per frame?

The point is that Unity won’t like it, even if I manage all the synchronization.

This is a solution I have considered, but performance-wise it won't help (it's still locked to the main thread).

But then I'd need access to the Unity scene graph again. Or are you suggesting I clone the Unity scene into some data structure of my own that can be used in another thread?

I can't make assumptions about the number of objects or the number of ray-casts per frame, since these depend, respectively, on the size/type of the scene and on the kind of camera control problem I am trying to solve.

You will have to clone. Unity's functionality and data are a big no-no for your thread, or it will blow up in your face.

But for raycasts there is no way around it anyway: PhysX runs in the engine thread, and accessing it is not allowed unless you implement a second physics engine for your purpose through a plugin or the like.

Yes.

How can increasing the speed of the code in your main loop not help performance?

You have to make assumptions, or at least set limits. There is no general camera control solution - it depends on the specific scene’s requirements. Multiple threads are not a magical solution to performance issues. You need to reduce the amount of processing that your camera control performs.

Would you be willing/able to post some of your code so that we can have a look at it, and perhaps make some more specific suggestions?

Not quite.

When developing a single core, they realized that certain execution units don't get used every cycle. For example, one cycle may use floating-point math while another is integer-based.*

They found that, by adding a relatively small amount of 'management' logic, they could get a big improvement by letting a second thread use these otherwise idle sections of the core.

Why is any of this relevant?

Simple: the threads do not 'halve' the frequency, nor will disabling HyperThreading give you a 200% speed boost. If it did, they would simply do that instead, because threading tends to be hard for most programmers.

Anywho:

I believe the deal with Unity is that all the engine stuff is not thread-safe, so the only way to thread ray casting would basically be to build your own ray-cast engine and then figure out how to keep the two in sync.

(*Not 100% sure about the exact instruction combinations; this is for illustration only.)

The code cannot be simplified because it relies on a global optimization method. Unfortunately this is a university research project; I would share it if only I could, but I can't. The point is that it was originally developed to compute static cameras, and we're now extending it to the dynamic case. Using threads could benefit performance and help it run in real time, and it is much simpler than filling the algorithm with yield statements.

I was just oversimplifying, but you’re right.

If it cannot be simplified due to something global, then I would rethink the importance of the global method and whether it really optimizes enough.
Also, line casts are already rather well optimized (PhysX uses highly optimized k-d trees and similar spatial partitioning structures to reduce the amount of work it has to do).

Anyway, with where you are heading and the focus you have, using something that gives you in-depth control might be more favorable for your project than the productivity of Unity, as you are seeking control on a level you won't get with Unity…
From personal experience, if you want to get up to speed quickly, Irrlicht + irrEdit or similar would be worth a look, as you can get going rather fast while still having full access to every aspect and the full sources.

And there is nothing complex about yield. All it does is split a function apart and reschedule the remainder to run in a later time slice, when a future update round identifies it as due to resume.
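
A minimal illustration of that, assuming a standard MonoBehaviour:

```csharp
using System.Collections;
using UnityEngine;

// Everything after a 'yield' is rescheduled and resumed by Unity later on.
public class YieldExample : MonoBehaviour
{
    // Unity runs Start as a coroutine when it returns IEnumerator.
    IEnumerator Start()
    {
        Debug.Log("Runs in frame N");
        yield return null;                      // remainder resumes next frame
        Debug.Log("Runs in frame N+1");
        yield return new WaitForSeconds(0.5f);  // remainder resumes ~0.5 s later
        Debug.Log("Runs roughly half a second later");
    }
}
```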

That's a pity, but fair enough. :)

I assumed you were trying to write a game. I imagine your code is complex, but I don’t think multi-threading is going to be simpler in this case.

We may release it once it is fully working. :)

Not with Unity, at least. A task that is inherently bound to Unity's internal data cannot be parallelized without losing more performance than you gain.

You can only parallelize tasks whose data you can decouple completely from Unity, and if you can do that, you might as well not use Unity at all, since you don't gain much anymore once you have to send out meshes for collision data generation, run external physics, etc.

Pretty much. You could better manage how you determine what to check in your raycasts. Suppose you managed your own spatial tree (say, an R-tree) and none of your objects had physics colliders. Perhaps you could use that R-tree to filter out thousands of items that are too far away to bother checking. You don't need to "clone" your scene, just register its spatial aspects. There would be no problem running your custom data structure in a background thread, hopefully reducing the amount of work needed on the main thread.
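
Something along these lines, as a rough sketch only: a flat list with a distance filter stands in for a real R-tree, and all the names are made up. The point is that the snapshot holds plain data, refreshed on the main thread, so a worker thread can query it without touching the Unity API:

```csharp
using System.Collections.Generic;
using UnityEngine;

public class SpatialRegistry : MonoBehaviour
{
    struct Entry { public Vector3 Position; public float Radius; }

    private readonly List<Entry> snapshot = new List<Entry>();
    private readonly object gate = new object();

    public Transform[] tracked;   // objects whose spatial data we mirror

    void Update()
    {
        if (tracked == null) return;

        // Refresh the snapshot on the main thread, the only place allowed
        // to read Transform.
        lock (gate)
        {
            snapshot.Clear();
            foreach (var t in tracked)
                snapshot.Add(new Entry { Position = t.position, Radius = 1f });
        }
    }

    // Safe to call from a background thread: it only reads the copied data.
    public int CountWithin(Vector3 center, float range)
    {
        lock (gate)
        {
            int count = 0;
            foreach (var e in snapshot)
                if ((e.Position - center).sqrMagnitude <= range * range)
                    count++;
            return count;
        }
    }
}
```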

Similarly, you could use a built-in Unity mechanism for determining which objects are shown (there are messages in the API that fire when a GameObject becomes visible to a camera) and only attach or enable colliders while they are shown. I guess what I'm getting at is: if your bottleneck is making so many raycasts, try reducing the number of raycasts or reducing the complexity of the scene so the raycasts are faster.
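
A quick sketch of the visibility part, using the OnBecameVisible / OnBecameInvisible messages (the GameObject needs a Renderer for Unity to send them; treat this as an illustration, not a drop-in solution):

```csharp
using UnityEngine;

// Keep the collider disabled, so raycasts skip it, unless the object is
// currently visible to some camera.
[RequireComponent(typeof(Collider))]
public class ColliderWhenVisible : MonoBehaviour
{
    private Collider col;

    void Awake()
    {
        col = GetComponent<Collider>();
        col.enabled = false;
    }

    void OnBecameVisible()   { col.enabled = true;  }
    void OnBecameInvisible() { col.enabled = false; }
}
```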