GPU instancing is much slower than normal instantiating.

I’ve been trying to create grass for a while now and recently learned about GPU instancing. However, when I implemented it and created ~100k grass objects, the game ran at about 20 fps, whereas plain Instantiate gave me about 40 fps. It might be that my GPU is a lot weaker than my CPU (a Ryzen 5 5600 with a GT 1030), but that shouldn’t make THAT big a difference, right? I’m very new to GPU instancing, so there could also be something wrong with my code.

Creating grass using instantiate (~40fps):

    RaycastHit hit;
    void Start()
    {
       spawnGrass();      
    }
    void spawnGrass()
    {
        for (int minValZ = 0 - chunkSize, maxValZ = 0, rowsGenerated = 0, a = 0; rowsGenerated < chunkAmount; rowsGenerated++, maxValZ += chunkSize, minValZ += chunkSize, a+= chunkAmount)
        {
            for (int minValX = 0 - chunkSize, maxValX = 0, chunksGenerated = 0, grassCounter = 0; chunksGenerated < chunkAmount; maxValX += chunkSize, minValX += chunkSize, chunksGenerated++, grassCounter += grassDensity)
            {
                for (int i = 0; i < Instances; i++)  // instances = 10
                {
                    Vector3 grassPos = new Vector3(UnityEngine.Random.Range(chunkStart.x + maxValX, chunkStart.x + minValX), 35, UnityEngine.Random.Range(chunkStart.y + maxValZ, chunkStart.y + minValZ));
                    Vector3 dir = new Vector3(0, -1, 0);
                   
                    if (Physics.Raycast(grassPos, dir, out hit))
                    {
                        if (hit.point.y > grassRange)
                        {
                           
                            GameObject grassClone = Instantiate(GrassLOD1, hit.point, Quaternion.Euler(0, UnityEngine.Random.Range(0, 360), UnityEngine.Random.Range(87, 93)));
                            grassClone.transform.localScale = scale;
                        }
                    }
                }
            }
        }
    }

GPU instanced grass (~20fps):

private List<List<Matrix4x4>> Batches = new List<List<Matrix4x4>>();
    RaycastHit hit;
    void Start()
    {
        spawnGrass();
    }
    private void Update()
    {
        RenderBatches();
    }
    void spawnGrass()
    {
        for (int minValZ = 0 - chunkSize, maxValZ = 0, rowsGenerated = 0, a = 0; rowsGenerated < chunkAmount; rowsGenerated++, maxValZ += chunkSize, minValZ += chunkSize, a+= chunkAmount)
        {
            for (int minValX = 0 - chunkSize, maxValX = 0, chunksGenerated = 0, grassCounter = 0; chunksGenerated < chunkAmount; maxValX += chunkSize, minValX += chunkSize, chunksGenerated++, grassCounter += grassDensity)
            {
                for (int i = 0; i < Instances; i++)
                {
                    Vector3 grassPos = new Vector3(UnityEngine.Random.Range(chunkStart.x + maxValX, chunkStart.x + minValX), 35, UnityEngine.Random.Range(chunkStart.y + maxValZ, chunkStart.y + minValZ));
                    Vector3 dir = new Vector3(0, -1, 0);
                   
                    if (Physics.Raycast(grassPos, dir, out hit))
                    {
                        if (hit.point.y > grassRange)
                        {
                            int addedMatrices = 0;
                            Batches.Add(new List<Matrix4x4>());
                            if (addedMatrices < 1000)
                            {
                                Batches[Batches.Count - 1].Add(Matrix4x4.TRS(hit.point, Quaternion.Euler(rotation), scale));
                                addedMatrices += 1;
                            }
                            else
                            {
                                Batches.Add(new List<Matrix4x4>());
                                addedMatrices = 0;
                            }
                        }
                    }
                }
            }
        }
    }
    private void RenderBatches()
    {
        foreach (var Batch in Batches)
        {
            for (int i = 0; i < mesh.subMeshCount; i++)
            {
                Graphics.DrawMeshInstanced(mesh, i, Materials[i], Batch);
            }
        }
    }
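
A side note on the batching loop above, offered as a sketch rather than a definitive fix. As posted, `Batches.Add(new List<Matrix4x4>())` runs on every raycast hit and `addedMatrices` is reset to 0 each time, so every batch appears to end up holding a single matrix. That would mean one `DrawMeshInstanced` call per blade per submesh. `Graphics.DrawMeshInstanced` accepts up to 1023 instances per call, so one option is to collect all matrices into a flat list first and split it afterwards. The helper below is hypothetical (`InstanceBatcher` and `Split` are names made up for illustration) and uses only plain C# collections, so it works with `Matrix4x4` or any other element type.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Unity: groups instance data into
// batches small enough for a single Graphics.DrawMeshInstanced call.
public static class InstanceBatcher
{
    // Graphics.DrawMeshInstanced draws at most 1023 instances per call.
    public const int MaxInstancesPerCall = 1023;

    // Splits a flat list into consecutive batches of at most maxPerBatch items.
    public static List<List<T>> Split<T>(IReadOnlyList<T> items,
                                         int maxPerBatch = MaxInstancesPerCall)
    {
        if (items == null) throw new ArgumentNullException(nameof(items));
        if (maxPerBatch <= 0) throw new ArgumentOutOfRangeException(nameof(maxPerBatch));

        var batches = new List<List<T>>();
        List<T> current = null;

        for (int i = 0; i < items.Count; i++)
        {
            // Start a new batch only when there is none yet or the current one is full.
            if (current == null || current.Count == maxPerBatch)
            {
                current = new List<T>(Math.Min(maxPerBatch, items.Count - i));
                batches.Add(current);
            }
            current.Add(items[i]);
        }
        return batches;
    }
}
```

In `spawnGrass` this would mean adding every `Matrix4x4.TRS(...)` result to one `List<Matrix4x4>` and then assigning `Batches = InstanceBatcher.Split(allMatrices);` once at the end (`allMatrices` is an assumed name). With ~100k blades that gives roughly a hundred batches per submesh instead of ~100k.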

And don’t render grass like that. Here is a good video on how to do it properly:

Directly instantiating millions of grass blades is overkill and only works for tiny areas.

Use a combination of pre-batched models and instantiation to get the best performance.

Setting aside the optimal way to render grass, here is some advice on your comparison of Instantiate vs. GPU instancing:

First, to clarify some terminology: even regularly instantiated objects can use GPU instancing. If they have a compatible shader, such as the standard shaders, and you enable GPU instancing on the material, the native Unity rendering pipeline will attempt to GPU-instance those objects too. Sometimes they are split into different batches for various reasons. Also, with URP or HDRP, if you’re using the SRP Batcher, it will take precedence over GPU instancing.

The Graphics API bypasses a lot of the native Unity rendering work, notably culling. That’s where I think you should look first to explain the difference. In your DrawMeshInstanced example, every instance is submitted to the GPU each frame, no matter where the camera is. The instantiated objects, by contrast, are automatically frustum culled when they are out of view. So use the Frame Debugger to compare what is actually being sent to the GPU in each of your scenarios.
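
If the missing culling does turn out to be the difference, one way to get some of it back is to test each batch against the camera frustum yourself before drawing it. The following is a minimal sketch under stated assumptions, not a definitive implementation: `CulledGrassRenderer`, `AddBatch` and the padding value are made up for illustration, each batch is assumed to hold at most 1023 matrices that are spatially close together (for example one terrain chunk), and `Camera.main` is assumed to be the rendering camera.

```csharp
using System.Collections.Generic;
using UnityEngine;

// Hypothetical component: draws instanced grass batches and skips
// any batch whose bounding box lies outside the camera frustum.
public class CulledGrassRenderer : MonoBehaviour
{
    public Mesh mesh;
    public Material[] materials; // one per submesh, GPU instancing enabled on each

    private class Batch
    {
        public readonly List<Matrix4x4> Matrices = new List<Matrix4x4>();
        public Bounds Bounds;
    }

    private readonly List<Batch> batches = new List<Batch>();
    private readonly Plane[] frustumPlanes = new Plane[6];

    // Call once per chunk with that chunk's matrices (at most 1023 of them).
    public void AddBatch(List<Matrix4x4> matrices)
    {
        if (matrices == null || matrices.Count == 0) return;

        var batch = new Batch();
        batch.Matrices.AddRange(matrices);

        // Column 3 of a TRS matrix holds the instance position.
        Vector3 first = matrices[0].GetColumn(3);
        batch.Bounds = new Bounds(first, Vector3.zero);
        foreach (Matrix4x4 m in matrices)
        {
            Vector3 position = m.GetColumn(3);
            batch.Bounds.Encapsulate(position);
        }

        // Assumed padding so blades at the edge of the box are not clipped.
        batch.Bounds.Expand(2f);
        batches.Add(batch);
    }

    private void Update()
    {
        Camera cam = Camera.main;
        if (cam == null || mesh == null) return;

        GeometryUtility.CalculateFrustumPlanes(cam, frustumPlanes);

        foreach (Batch batch in batches)
        {
            // Skip the draw calls entirely when the batch is out of view.
            if (!GeometryUtility.TestPlanesAABB(frustumPlanes, batch.Bounds)) continue;

            for (int i = 0; i < mesh.subMeshCount; i++)
            {
                Graphics.DrawMeshInstanced(mesh, i, materials[i], batch.Matrices);
            }
        }
    }
}
```

This culls whole batches rather than individual blades, so it only helps when each batch covers a compact area. A batch whose instances are scattered across the whole terrain will almost always intersect the frustum and be drawn anyway.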