[Solved] How to calculate rotation matrices

I don’t know why the rotation I calculated is incorrect.
The two GameObjects in the image below have the same rotation.
The GameObject on the left uses Unity’s Transform, and the GameObject on the right uses my own calculated rotation matrix.
[Image: upload_2019-5-15_11-52-18.png]

Reproduce it

  • git clone the GitHub repository huangwwmm/Unity-Meteorites
  • open Scene “Assets_Test\Test.unity”
  • Please re-enable SphereDisperseMesh component of GameObject Meteorites. A if it is not displayed correctly.

Related code
Matrix calculation in SphereDisperseMesh.compute

float4x4 mat_local_s = {
    s.x, 0, 0, 0,
    0, s.y, 0, 0,
    0, 0, s.z, 0,
    0, 0, 0, 1
};
// rotation z
float4x4 mat_local_rz = {
    cos(r.z), sin(r.z), 0, 0,
    - sin(r.z), cos(r.z), 0, 0,
    0, 0, 1, 0,
    0, 0, 0, 1
};
// rotation x
float4x4 mat_local_rx = {
    1, 0, 0, 0,
    0, cos(r.x), -sin(r.x), 0,
    0, sin(r.x), cos(r.x), 0,
    0, 0, 0, 1
};
// rotation y
float4x4 mat_local_ry = {
    cos(r.y), 0, sin(r.y), 0,
    0, 1, 0, 0,
    - sin(r.y), 0, cos(r.y), 0,
    0, 0, 0, 1
};
// position
float4x4 mat_local_t = {
    1, 0, 0, t.x,
    0, 1, 0, t.y,
    0, 0, 1, t.z,
    0, 0, 0, 1,
};
float4x4 mat_local_tr = mul(mat_local_t, mul(mat_local_ry, mul(mat_local_rx, mat_local_rz)));
float4x4 mat_local_trs = mul(mat_local_t, mul(mat_local_ry, mul(mat_local_rx, mul(mat_local_rz, mat_local_s))));
_MeshStates[threadIdx].MatM = mul(_GlobalState[0].MatM, mat_local_tr);
_MeshStates[threadIdx].MatMVP = mul(_GlobalState[0].MatMVP, mat_local_trs);

The rotation of the right GameObject is set in StartRendering() of SphereDisperseMesh.cs

m_MeshStates[iRole].LocalPosition = Vector3.zero;
m_MeshStates[iRole].LocalRotation = new Vector3(0, 0, 30);
m_MeshStates[iRole].LocalScale = Vector3.one;

Is there a reason you calculate the matrices yourself? Did you check that the order in which the x, y, and z rotations are applied has not changed?

If anything, you would be better off using quaternions.

I need to render tens of thousands of copies of the same model in the scene, so I use GPU instancing for performance. That means I need to calculate MatMVP myself to transform the vertices in the vertex shader.

v2f vert(appdata_custom v)
{
    v2f o;
    o.instanceID = v.instanceID;
    o.pos = mul(_MeshStates[v.instanceID].MatMVP, v.vertex);
    o.uv = v.texcoord.xy;
    o.normal = v.normal;
    return o;
}

My scene structure is as follows:

  • Meteorites (GameObject)
      • Model1 (not GameObject)
          • vertex1
          • vertex2
          • vertex3

I need to transform the vertices to projection space.

If you want to translate and rotate many thousands of objects, you are better off looking into ECS/DOTS. That is what it is designed for. For example, check the Megacity demo and other ECS-based demos.

A compute shader is much faster than ECS. But how should the matrices be calculated?


Are you positive?
You need to transfer data back and forth between the GPU and the CPU, and that is slow.
What makes you think ECS is slower than the GPU?

Matrices are nothing more than multiplications of values.
But the sin / cos calls could definitely be optimized, as they are otherwise quite expensive to compute.

You can check in ECS how a quaternion and a position are combined through matrix multiplication.

What you are basically doing is keeping the GPU busy, leaving not much room for other GPU rendering.
Then you have other graphical effects on top.
And the CPU, with its multiple cores, will be mostly idle.

I need to transform tens of millions of vertices per frame. A CPU has only a dozen cores, but a GPU has thousands of cores. ECS is unlikely to be faster than a compute shader.
Rendering also needs the GPU, so with ECS I would have to transfer data from the CPU to the GPU every frame. With a compute shader, I only need to transfer data from the CPU to the GPU at initialization time.


Instead of speculating, have you actually seen any of the ECS demos? By all means, they show exactly what you need, whether it is 1k, 10k, or 100k moving objects along with complex logic.

More than 100k: about 50,000k. Do you really understand what I want to do?


From the initial post, I thought you wanted something up to 100k.
However, you still didn’t answer my previous question.

But what you really want is some particle behaviour, which means you need some scripted motion logic and particle interactions, not mesh colliders, if that is a requirement at all.

What exactly are you trying to achieve?
A fluid simulation, asteroids, or some other kind of particles?

What algorithm are you planning to use to control the particle motion?

I want a group of meteorites in space, about 50 million vertices in total. They can all be calculated on the GPU because they move at random and don’t need any interaction.

[Image: upload_2019-5-18_1-7-58.png]
There are 50,000 models, with about 1000 vertices per model. I now calculate it on the GPU, which takes less than 1 millisecond per frame.

Not sure if it will be of any use, but you can try that

But for such a high count, you want to simplify the math as much as you can by linearizing.
Unless you are doing some simulation, moving millions of particles for a game is not practical.
That is why tricks and illusions are applied in game dev to reduce the cost of computation.

Here is the thing.
You don’t need to do that at all.
You just move particles.
Use LOD: when getting close to an asteroid,
bring in a simplified model, and if very close, the full model.
This way, you never need to move millions of vertices.

LOD can solve this problem, but my goal is to learn how to use compute shaders.

Going back to the original topic:

I think your rotation matrices have their negated sin component wrong, at least for the X and Y. Try swapping which sin() has the - on those two.


Seems like @bgolus has a valid point.
Check the minus signs in your individual rotation matrices,
mat_local_rx and mat_local_ry.

Btw, a rotation is a 3x3 matrix, not 4x4.
You can multiply the final result with position or scale if needed,
so you can save a few multiplications.

Also, make sure you have the right order of rotation multiplication.
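
To make the sign and ordering questions concrete, here is a minimal plain C# sketch (no Unity dependency) that builds the three axis rotations as 3x3 matrices and composes them as Ry * Rx * Rz, the same order used for mat_local_tr. It assumes the column-vector convention of mul(matrix, vector) and the Z, then X, then Y Euler order that Unity documents for Quaternion.Euler. The class and helper names (RotationSketch, RotX, RotY, RotZ, Mul) are made up for illustration and are not from the project. Under these assumptions the negated sin sits in the first row for Z, the second row for X, and the third row for Y.

```csharp
using System;

static class RotationSketch
{
    // Axis rotations for column vectors (v' = M * v), angles in radians.
    static double[,] RotX(double a) => new double[,] {
        { 1, 0, 0 },
        { 0, Math.Cos(a), -Math.Sin(a) },
        { 0, Math.Sin(a),  Math.Cos(a) } };

    static double[,] RotY(double a) => new double[,] {
        {  Math.Cos(a), 0, Math.Sin(a) },
        { 0, 1, 0 },
        { -Math.Sin(a), 0, Math.Cos(a) } };

    static double[,] RotZ(double a) => new double[,] {
        { Math.Cos(a), -Math.Sin(a), 0 },
        { Math.Sin(a),  Math.Cos(a), 0 },
        { 0, 0, 1 } };

    // Plain 3x3 matrix product a * b.
    static double[,] Mul(double[,] a, double[,] b)
    {
        var r = new double[3, 3];
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++)
                for (int k = 0; k < 3; k++)
                    r[i, j] += a[i, k] * b[k, j];
        return r;
    }

    static void Main()
    {
        double deg2Rad = Math.PI / 180.0;
        // Euler angles in degrees, as they would be typed into a Transform.
        double x = 0, y = 0, z = 30;

        // Z is applied first, then X, then Y, so Z sits rightmost in the product.
        double[,] r = Mul(RotY(y * deg2Rad), Mul(RotX(x * deg2Rad), RotZ(z * deg2Rad)));

        // The first column is the rotated unit X axis.
        // For a 30 degree Z rotation it should be roughly (0.866, 0.5, 0).
        Console.WriteLine($"{r[0, 0]:F3} {r[1, 0]:F3} {r[2, 0]:F3}");
    }
}
```

Comparing the printed column against the first column of the shader matrix for the same angles is a quick way to spot a flipped sign or a swapped multiplication order.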

ECS is super fast … compared to more traditional CPU programming paradigms. Compared to even a mid tier consumer GPU it’s probably a few orders of magnitude slower for doing this kind of thing. Modern consumer CPUs are in the 15~100 GFLOP range for peak performance utilizing all cores (which is going to be rare in real world scenarios). Even the crazy top end Xeon and Threadripper CPUs top out around 160 GFLOP. Consumer GPUs are in the 2000 ~ 14,000 GFLOP range, and can absolutely saturate almost 100% of all their “cores” with a well written compute shader.
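
As a rough back-of-the-envelope check of those numbers, the plain C# sketch below estimates the arithmetic needed to transform 50 million vertices per frame and compares it with the peak figures quoted above. The cost of 28 floating point operations per vertex (one 4x4 matrix times a float4: 16 multiplies and 12 adds) and the 60 frames per second target are assumptions for illustration, not measurements. It ignores memory bandwidth and everything else a frame has to do.

```csharp
using System;

static class ThroughputEstimate
{
    static void Main()
    {
        const double vertices = 50e6;       // vertex count mentioned in this thread
        const double flopsPerVertex = 28;   // assumption: 16 multiplies + 12 adds per 4x4 * float4
        const double framesPerSecond = 60;  // assumption: target frame rate

        double gflopPerSecond = vertices * flopsPerVertex * framesPerSecond / 1e9;
        Console.WriteLine($"Required: {gflopPerSecond} GFLOP/s");

        // Peak figures quoted in the post above, in GFLOP/s.
        // Real workloads usually reach only part of the peak.
        double[] peaks = { 15, 100, 2000, 14000 };
        foreach (double peak in peaks)
            Console.WriteLine($"Peak {peak} GFLOP/s: uses {gflopPerSecond / peak:P1} of peak");
    }
}
```

Under these assumptions the transform alone is on the order of what a consumer CPU can deliver at peak, while it is a small fraction of a consumer GPU's peak.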


I tried swapping the sin signs, but it is still wrong.

I fixed it. The reason was that I passed the angle to the shader in degrees instead of radians.

m_MeshStates[iRole].LocalRotation = new Vector3(0, 0, 30) *  Mathf.Deg2Rad;
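
For anyone hitting the same problem: sin and cos in HLSL, like Math.Sin and Math.Cos in C#, expect radians, so an angle of 30 is read as 30 radians. The small plain C# sketch below (no Unity dependency; the class name is made up for illustration) shows the difference. The conversion factor is the same value as Mathf.Deg2Rad.

```csharp
using System;

static class DegreesVsRadians
{
    static void Main()
    {
        double degrees = 30.0;
        double radians = degrees * Math.PI / 180.0; // same factor as Mathf.Deg2Rad

        // Passing 30 directly is interpreted as 30 radians (about 1719 degrees).
        Console.WriteLine(Math.Cos(degrees)); // cosine of 30 radians
        Console.WriteLine(Math.Cos(radians)); // cosine of 30 degrees, close to 0.866
    }
}
```

An alternative would be to convert inside the shader with HLSL's radians() intrinsic, but converting once on the CPU, as in the fix above, avoids repeating the work per instance.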