Exploration of a custom diffuse RTGI "approximation" for OpenGL ES 2.0 and weak machines

Hello,

I have been conceptualizing a model of slow RTGI. Originally I was just looking to have mock time-of-day shadows and volumetric hair rendering hacks, but exploring those two subjects made me realize I could reuse the ideas to get some GI approximation. I want to explore many implementations to see how they translate visually, and how useful they can be.

There is one key concept: MAGIC :stuck_out_tongue: (Mapping Approximation of Global Illumination Compute). Basically we store surfel-like elements in a texture, compute their lighting, then distribute their data to each other through "visibility textures" that store the addresses of other visible surfels, queried relative to a point. It forms a cyclic graph that propagates lighting recursively. It turns the whole problem into a sampling issue: the simplest implementation only needs two texture fetches per ray (address, then data), that's 4 rays on OpenGL ES 2.0.
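
To make that double fetch concrete, here is a minimal sketch of what one ray costs (the sampler names and the 0.0 miss convention are placeholders, not final):

// Minimal sketch of one MAGIC ray: one fetch into the visibility texture to get the
// address of the surfel seen in that direction, then one fetch into the surfel data.
sampler2D _Visibility;   // RG = lightmap address of the visible surfel, 0.0 = miss
sampler2D _SurfelLight;  // lighting currently stored on each surfel

float3 SampleOneRay(float2 visibilityUV)
{
    float2 address = tex2D(_Visibility, visibilityUV).rg; // fetch 1: who do we see?
    if (address.x + address.y == 0.0)
        return float3(0.0, 0.0, 0.0);                     // miss: sky / far field handled later
    return tex2D(_SurfelLight, address).rgb;              // fetch 2: what light does it carry?
}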

Both textures are approximations of the geometry, and there are many ways to translate these ideas; it can be as accurate as we want, given various practical trade-offs, like precomputing quality visibility offline. The main limitation is that all the geometry to update must "more or less" fit in a single texture (there are ways around that). It's also mostly environment-to-environment lighting (there are ways around that too, using UV probes that let dynamic objects sample the lightmap), and since it's designed for weak machines, expect rough results. It has, however, the benefit that we can spread the compute over many frames, async from the frame rate. It also renders to a texture, so it's effectively "baked" for free once nothing changes.

The first implementation I want to try is MAGICAL :roll_eyes: (MAGIC Applied by Lightprobe), where the visibility is stored in box-projected address probes. It's a solution that doesn't need offline baking and is compatible with procedural generation. There are multiple ways to implement it, but I wanted a way to place the lightprobes automatically; researching that, I found a way to do a voxelization pass on OpenGL ES 2.0, storing the occupancy of cells in bits. OpenGL ES 2.0 doesn't have bit testing, but I found a way to do that without an expensive LUT.
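
For reference, the bit test can be done with pure float math, something along these lines (a sketch, assuming the occupancy byte is stored as a 0–255 value in a single channel):

// Sketch: test bit `bitIndex` (0..7) of an occupancy byte stored as a 0..255 float,
// using only floor/divide (no integer ops, no LUT), which is OpenGL ES 2.0 friendly.
// (A texture channel read as 0..1 would need to be multiplied by 255 first.)
float HasBit(float occupancyByte, float bitIndex)
{
    float shifted = floor(occupancyByte / exp2(bitIndex)); // drop the lower bits
    return shifted - 2.0 * floor(shifted * 0.5);           // mod 2: 1.0 if the bit is set
}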

This implementation of MAGICAL has expected shortcomings, most notably because it ruthlessly approximates the environment through box projection; as seen in the schema below, it can be wildly inaccurate.


The sample here misses an obvious occlusion; worse, due to the visibility of the cubemap, the ray actually BENDS and hits geometry instead of the sky … There are probably ways to slightly mitigate that … with offline baking of bent cones (which disqualifies procedural generation), or by designing environments that match the limitation (which on weak machines would be better than nothing). We could also spend cycles raymarching the depth stored in the cubemap to catch false positives, but then it's limited to the current cubemap; if the ray escapes the visibility we should probably hash again into another visibility, and the technique is no longer cheap by those standards … There are probably other ways to reduce the approximation error.

We just hope it's good enough for the target machines, and for artistic renderings that don't want their visuals to devolve into flat ambient with harsh shadows (for PCG), or that want time-of-day updates to feel more lively. As it is async and renders to a texture, it can also bake ambience and lighting at initialization time (it can be a preprocess before the level) and let objects skip computing lighting in their shader, using only sampling. That probably makes it fast enough, if it's good enough, even if devoid of anything fancy.

The recipe is as follows.

  • I must ensure I can correctly generate data from the geometry into a lightmap-unwrapped texture, which will store the surfels (albedo color, world normal, world position, shadow masking).
  • That surfel data will be used to create a lightmap G-buffer in which direct lighting will be computed.
  • I'll try to place probes manually first, that is, designing a test environment around a grid of probe positions. Voxelization will be tested at a later date, as a way to automate placement. Ideally each pixel of the lightmap can hash the cell it will query data from, or store it in an index channel (up to 256). The difficulty is how to manage the indices of pixels inside a non-empty cell, as a voxel cell is orders of magnitude bigger than a pixel.
  • Each probe must render the UV lightmap projection of the scene, with the UV color being the address of each point, creating a visibility map of the surfels. I need a way to encode miss rays (not part of the scene; probably by reserving 0.0). The lightmap normal will be used to compute the sampling rays over its hemisphere, which will be spread over time with a cycle counter.
  • The cubemap will also store the depth of each point of the geometry, to compute an attenuation.
  • I need to see if I can project a temporary 6-face cubemap texture into an atlas of 2D octahedron-mapped cubemaps, to make it easier on the target machine (see the octahedron sketch after this list).
  • I need to figure out if I can correctly accumulate lighting (direct and sampled) into a target GI lightmap that will be sampled by objects.
  • Test adding a far-field cubemap (address 0.0) for samples that go beyond the region of the lightmap (miss rays). It's not a skybox per se: it stores the surrounding scene lighting (maybe other scene tiles). If those scenes update their GI and are then captured by the far field, they effectively transmit their GI lighting to this scene; that's a small mitigation of the locality of the lightmap.
  • In theory, we could evaluate the full BRDF on hit samples to get higher quality light; I'll avoid it at the beginning of this implementation for simplicity. It also increases the number of samples needed (querying the albedo and normal means 2 more fetches), which is limiting on OpenGL ES 2.0.
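
For the octahedron step, the encode is roughly the standard octahedron fold; here is a sketch of what PackNormalToOct amounts to (the exact helper in my include may differ), with the decode being its inverse:

// Sketch of the standard octahedron encode: fold a unit direction onto a [-1,1] square.
// The output must be remapped to [0,1] before being written into the atlas.
float2 PackNormalToOct_Sketch(float3 n)
{
    n /= (abs(n.x) + abs(n.y) + abs(n.z));        // project onto the octahedron
    if (n.z < 0.0)                                // fold the lower hemisphere over the upper one
    {
        float2 signs = float2(n.x >= 0.0 ? 1.0 : -1.0, n.y >= 0.0 ? 1.0 : -1.0);
        n.xy = (1.0 - abs(n.yx)) * signs;
    }
    return n.xy;
}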

This first implementation of MAGICAL will be limited in precision:

  • To a lightmap texture of 256x256, because I will use 8-bit UVs for the probes. GI lightmaps are traditionally very low frequency; Enlighten's precomputed GI recommends about 1 m per pixel.
    There are actually 5 RGB maps (albedo, normal, world position, direct light, GI accumulation), roughly equivalent in area to one 512 map plus one 256 map. It remains to be seen whether direct lighting and accumulation need to be merged. Keeping a separate direct lighting map is useful to avoid losing data during updates and to avoid recomputing the light to inject; it acts as a kind of cache, I think? EDIT: The count is probably 6 RGB maps, as we probably need double buffering on the color accumulation.
  • To a cubemap atlas of 2048x2048 with 256 cubemaps (each cubemap being 128x128, arranged in a 16x16 grid); the capturing cubemap will sample the scene at 64x64 per face.
  • The first example will probably not use all 256 cubemap slots; that's probably overkill.

Assuming we need all the slots, the upper bound for the initialization update is 256 x 6x64x64 = 6,291,456 pixels; that's EXACTLY one 2048² texture plus two 1024² ones, but we project back into the atlas which is just 2048², the problem being that it needs up to 1536 renders to fill all slots! And we are oversampling by half! Faces of size 32 would undersample instead, and I'm not sure how well projecting to an octahedron conserves accuracy, since the data must be exact because they are addresses. Fortunately we can do the renders one at a time, spread over time, and once it's done we don't need to update unless the scene changes; if we move, we would only need to update the edges.

Then we need to assess whether the visual artifacts are severe, see if we can evolve the solution to take care of them, or if we need to design around them, which is a price to pay for doing GI on weak machines.

My workstation has a GT 705; for reference it's a bit weaker than a Wii U, and it's the weakest of the current GeForce line. My laptop has an even weaker Radeon R2, and my mobile phone is a Logicom Tab 741 with a Mali-400 MP1, an 800x480 screen and 512 MB of RAM (expecting 32 MB for actual comfortable use; I'm a madman, I want to do an open world on that!).

I just need to wait until I've reinstalled Unity (had some issues) and GitHub … :smile:

8 Likes

https://github.com/Neoshaman/Mapping-Approximation-of-Global-Illumination-Compute/tree/master/MAGIC
I started making the prototype a while ago, but I have some questions about custom render textures; there isn't as much documentation and as many examples as I expected from the community:

Right now my brain isn't conceptualizing a solution :face_with_spiral_eyes: it feels like it should be simple though. I should spread this computation over many frames, which is kind of a problem because that's de facto 256 frames! That's roughly 5 s; I'm not sure waiting 5 s every time we want to rebuild the atlas is great. Not a problem on startup, but come on! We want the full course meal … eventually :stuck_out_tongue:

I also figured out I should know more about bandwidth issues; I realized I don't have a complete grasp of the implications.

Anyway, my brain is currently foggy; I can't visualize the code I need to do even that right now, I can only work on stuff I'm already in the habit of doing. Well, I hope the prototype can help me land a job somewhere.

1 Like

It's working as intended now. It takes ~800 ms to generate the ~1500 views on the GT 705; this card is roughly the power of a Switch, but weaker!

Octahedron mapping to atlas

Cubemap UV view to atlas

Next is generating the lightmap G-buffer (LMGB), which means I have to do a bit of level creation to have something distinct, so I can also test whether the unwrapping shader is correctly set up.

3 Likes

Man, you've been wanting to do this since forever. How does it feel to finally get cracking on this concept?

3 Likes

It feels fun. I just realized that what was holding me back was simply being underfed (missing two thirds of what's needed); I would have done it earlier otherwise. Now I'm also working on optimizing a basic diet to meet the daily targets, instead of just buying whatever I can. I'm not there yet, still compiling data.

Anyway. Right now I'm planning the layout and overthinking it. To do the LMGB I need a level design, then to unwrap it. Back-of-the-envelope calculations:

  • Each cubemap tile is 128² px (16,384 pixels, i.e. rays).
  • The basic implementation is limited by the UVs being 8 bits, so I can only index a size of 256, which means the LMGB is limited to that size.
  • Currently I'm just laying the 256 probes out on a 16x16 grid naively for the test, so the level must take that into account, that is, walls must be placed around the probes.
  • Since the LMGB is limited to 256² in size, we have 65,536 surfel points to unwrap to. We are surface limited: the more complex the level, the more surfaces share those surfels, that is, fewer surfels per surface. We need to consider level design carefully.
  • If the level is just a ground plane encompassing the grid, we get 16² pixels per probe (256 points).
  • I'm trying to find a simple way to get shadow masking; ideally it would be real time, but I don't want to code and debug a shadowmap. I'm thinking of baking shadows, then using analytical projections of primitives, but that might not actually save time.
  • I'm not implementing the far field yet.

BRB, I need to look at ProBuilder tutorials.

1 Like

About shadow masking: despite not yet encoding depth in the cubemap, I realized I could just check light visibility using the lightprobe data, comparing the position of the light against the stored depth. Using multiple samples I could make all lights area lights too, but for now a binary check will do. It saves me from doing a regular shadowmap pass.

My problem now is to find a strategy to encode depth; currently I think 16 bits is the minimum, 8-bit depth is way too coarse. But the probes currently use RG for the UV data and B for sky/far-field masking, so I have only one channel left. Since B is basically binary, I would probably reserve a value (like 0 or max) to encode the far field instead, freeing a channel.
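
One option (a sketch, assuming the far-field flag gets folded into a reserved depth value so both B and A are free) is the classic two-channel packing that Unity already ships as EncodeFloatRG/DecodeFloatRG in UnityCG.cginc:

#include "UnityCG.cginc"

// Sketch: store a [0,1) normalized distance in two 8-bit channels (B = coarse, A = fine),
// reserving the pair (1,1) for "far field / sky" so the old binary flag is not lost.
float2 PackProbeDepth(float dist01, float isFarField)
{
    return isFarField > 0.5 ? float2(1.0, 1.0)
                            : EncodeFloatRG(min(dist01, 0.99)); // keep the top value free for the flag
}

float UnpackProbeDepth(float2 ba, out float isFarField)
{
    isFarField = (ba.x > 0.995) ? 1.0 : 0.0; // encoded depths never reach the reserved value
    return DecodeFloatRG(ba);
}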

1 Like

Hey, after some hardware misadventures (the dev machine, i.e. the laptop, is dying) here is a quick update:

The black stuff in the scene and game views is the unwrap shader test; the rainbow texture on the bottom right is the array of cubemaps capturing the scene UVs correctly.

So what's next is to:

  • Properly bake the LMGB data.
  • Write the direct lighting shader.
  • Write the GI accumulation.
  • Write the object lighting.
1 Like

This is pretty much GIA, but realtime…

I'm a bit skeptical of the data, I don't see sRGB options in the texture parameters, but at least it correctly unwraps the LMGB. Models should be made to the texture size specification, but for now the pixel UV bleeding will do >.>

Let's try to write the proper code for the direct light baking and then correct from there.

I was a bit stressed that Unity seemingly didn't have a structure (in the built-in pipeline) to query lights; after 3 days of reading the whole scripting reference line by line, I finally found the relevant part … Unity - Scripting API: Object.FindObjectsOfType

        Debug.Log(FindObjectsOfType<Light>().Length);

seems to confirm it does the job, thankfully

… at least I don't have to manually redo a whole lighting structure :face_with_spiral_eyes:

I'm doing meshless lighting by having all the mesh data baked into textures, so I can't just use the automatic culling Unity does, and I probably need to manually reimplement all the light types too :frowning: Though I'm open to alternatives …

SO I have been investigating what it means to do direct lighting in LMGB space and the overall implications for optimization.

The naive method would simply be to get the light list and do one pass per light over the LMGB. A naive optimization would be to separate each light type into its own list and pass that as a single array; it seems we are guaranteed 128 float4 uniforms at worst, which is roughly 64 directional/point lights per pass.
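
As a sketch, that single pass over the LMGB would look something like this (all names and the attenuation are placeholders; fixed loop bound for weak targets):

// Sketch of a single direct-lighting pass over the LMGB using packed light arrays.
// The _LMGB_* samplers and _Light* arrays are placeholders for whatever layout I end up with.
sampler2D _LMGB_Position;   // world position per surfel
sampler2D _LMGB_Normal;     // world normal per surfel (stored 0..1, remapped below)
sampler2D _LMGB_Albedo;
float4 _LightPosRange[64];  // xyz = position, w = range
float4 _LightColor[64];     // rgb = color * intensity
float  _LightCount;

float4 DirectLightLMGB(float2 lightmapUV)
{
    float3 p = tex2D(_LMGB_Position, lightmapUV).xyz;
    float3 n = tex2D(_LMGB_Normal,   lightmapUV).xyz * 2.0 - 1.0;
    float3 albedo = tex2D(_LMGB_Albedo, lightmapUV).rgb;
    float3 lit = float3(0.0, 0.0, 0.0);
    for (int l = 0; l < 64; l++)                 // fixed bound, early out on the real count
    {
        if (l >= (int)_LightCount) break;
        float3 toLight = _LightPosRange[l].xyz - p;
        float  dist    = length(toLight);
        float  atten   = saturate(1.0 - dist / _LightPosRange[l].w);
        lit += _LightColor[l].rgb * saturate(dot(n, toLight / dist)) * atten;
    }
    return float4(albedo * lit, 1.0);
}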

But since the goal is to do a coarse GI approximation, I have to account for occlusion, basically shadows. That's where things get a bit costly, i.e. rasterizing and storing shadowmaps. We can separate those into two major maps, directional and local lights (spot and point), the idea being that local lights have less range and need less precision, so we can store two local shadowmaps in a single map by packing them as 16 bits (2 channels) each, which could potentially also allow testing two shadowmaps in a single operation.

But if we go back to the basic idea of MAGIC, it's that we reduce GI to solving the lighting of a single geometric point through its hemisphere visibility integration, the intervisibility between points making up a light graph.

So is there a way to do away with costly shadowmaps? For each geometric point we consider every visible light's contribution, so by definition we don't get non-visible lights; shadows would be implicit in the visibility, dependent on the quality of the visibility structure.

I alluded to that by adding a depth buffer to the (approximate) visibility structure; it's "theoretically" not necessary given that the LMGB already has the positions of the geometry, so we could compare the sampled position to the light and infer occlusion. BUT the depth buffer was basically an optimization, as we wouldn't need to sample the position map. BUT given that the visibility structure (cubemap) we use is a very coarse approximation (that actually bends light), shadows would probably be discontinuous; they would mostly qualify as a coarse occlusion factor, it wouldn't produce quality shadows.

The direct light map is an optimization; the goal is to separate the light computation from the integration. For a geometric point, integrating means sampling surfel data from the LMGB: each sample means sampling the PVS (indirection) plus 3 or 4 LMGB data maps (albedo, position, normal, baked shadow), and running a light computation for each sample, which limits the number of samples possible in a single pass. Given that each geometric point will already have computed its own light, we can decouple the computation on each surfel, then just sample the result (reduced to 2 samples); it also decouples the light update from the GI update. It's kind of like the light prepass methodology.

The main issue with this optimization is that it encodes light per geometric point and doesn't have any concept of occlusion; that means occluded points WILL get light, and this will break the GI effect. We can use the PVS as occlusion, that is, comparing the light position (or direction, for directionals) to the position of the PVS sample in that light direction (for local lights; skybox/far-field occlusion for directionals), obtaining an occlusion factor.
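
In code that occlusion factor would look roughly like this (a sketch; the helper names are placeholders, and the probe UV is assumed to come from octahedron-encoding the direction toward the light):

// Sketch: occlusion of a local light using the PVS, as described above.
// probeUV = atlas coordinate for the direction from the surfel toward the light.
float LocalLightOcclusion(float3 surfelPos, float3 lightPos,
                          sampler2D probeAtlas, sampler2D lmgbPosition, float2 probeUV)
{
    float2 address = tex2D(probeAtlas, probeUV).rg;          // surfel seen toward the light
    if (address.x + address.y == 0.0)
        return 1.0;                                          // probe sees sky: nothing in the way
    float3 hitPos      = tex2D(lmgbPosition, address).xyz;   // position of the blocker candidate
    float  blockerDist = distance(surfelPos, hitPos);
    float  lightDist   = distance(surfelPos, lightPos);
    return (blockerDist + 0.05 < lightDist) ? 0.0 : 1.0;     // blocker closer than the light -> occluded
}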

I also looked at tiled and clustered lighting optimizations, to see if they are applicable here. These techniques take advantage of spatial locality, in screen and world space, to discriminate which light is applied to which pixel, based on a spatial partition. There are two ways to think about it for LMGB lighting: in world space using voxel groups, or in lightmap space. See: A Primer On Efficient Rendering Algorithms & Clustered Shading.

BUT lightmap space doesn't offer the proper spatial coherence. Given that we use surfels, the lightmap layout is really just a convenient one for the PVS used in this particular implementation (rasterizing the indirection into cubemaps); each surfel is addressed absolutely through the indirection texture, they are technically independent of each other and could be packed using any other method (than cubemap rasterization). Also, lightmap layouts are coherent at the surface level, but not at the geometry level: surfaces can be close in lightmap space but not in world space. Local light volumes imply world-space coherence.

Voxel space allows more coherence: basically you store a list of all non-empty voxels, then for every non-empty voxel you keep track of its light list. For every geometric point, you hash the position to get the key of the voxel and test only the lights in that list. The problem is that lighting is done at the pixel level, so each pixel would have a different light list and list length; since we treat all pixels at the same time, it doesn't solve much. We could then use tiles in lightmap space, but then the same coherence problem remains, this time in complexity rather than spatially, though they are linked, as spatially close pixels would share similar light lists.

Since geometric points are independent, we could pack them coherently to take advantage of tiles, but then we lose the simplicity of the rasterization with cubemaps as the PVS, which operates at the surface scope. Same problem if we try a method that prioritizes pixels in view space. A different packing structure, other than cubemap rasterization of geometric lightmap addresses, would probably help (MAGIC instead of MAGICAL), but finding one that is faster and more efficient (than cubemap rasterization) at run time rather than as offline packing, especially on weak machines or for PCG, has yet to be defined.

Given that the method favors geometric surfaces, emissive surfaces are trivial to add: we can just rasterize the emissive channel directly into the direct light map, and the intensity will be picked up when the GI update computes. Using emissives might be the optimal way to add some local light, and they are area lights by default. If we want the emissives to be dynamic, it's probably a good idea to render them into another map sampled at the same time as the direct light, for the cost of another sample, or to pack them as special surfaces. Given that the lightmap and PVS resolutions are fixed, it's good to remember that the more surfaces there are, the more pixels they have to share, and that the PVS might not actually pick up the full extent of a surface, potentially leading to some waste.

For the prototype it's unlikely I'll try anything complex. I anticipate that directional light would mostly be singular (the sun) and could probably be rolled into the skybox evaluation during the GI update. I'll probably just do the occlusion factor and not bother with accurate shadows for local lights, and local lights might just be the emissive lightmap (no need to render direct light?). Then we will evaluate the value of the result artistically.

Things are getting a bit harsh on my side; if anyone is interested in the result, a modest financial contribution can go a long way toward accelerating the dev. I'd also be happy with material contributions.

Why not set up a Patreon or Ko-fi?

It's not ready anyway, I don't have the proof of concept yet. It was mostly because many people try to leech code away in DMs, so I put that there to let them know they can help monetarily if they want to see progress, at which point they disappear lol

Anyway,
I'm stuck debugging the atlas octahedron cubemap, which is basically the true last challenge, and I don't understand yet what's wrong:

It seems I select the right part, but I don't know how to check for bad offsets and sizing. I tried using a 16x16 plain-color map for the hash position, which seems correct; the box projection seems to work, but the sampling seems off :frowning:
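
For context, BoxProjectVector comes from MAGIC.cginc (not shown here); it is essentially the usual box projection, a rough sketch of the idea being:

// Sketch of standard box projection: intersect the sample direction with the cell's AABB
// and re-aim the lookup direction from the cube center to that intersection point.
float3 BoxProjectVector_Sketch(float3 worldPos, float3 dir, float3 cubeCenter, float3 boxMin, float3 boxMax)
{
    float3 firstPlane  = (boxMax - worldPos) / dir;      // distances to the max planes
    float3 secondPlane = (boxMin - worldPos) / dir;      // distances to the min planes
    float3 furthest    = max(firstPlane, secondPlane);   // exit distance per axis
    float  dist        = min(min(furthest.x, furthest.y), furthest.z);
    float3 hit         = worldPos + dir * dist;          // where the ray exits the box
    return hit - cubeCenter;                             // direction to look up in the cubemap
}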

Shader "MAGIC/BoxTEST"
{
    Properties
    {
        _MainTex ("Cubemap Atlas", 2D) = "white" {}
    }
    SubShader
    {
        Tags { "RenderType"="Opaque" }
        LOD 100
        Cull Off
        Pass
        {
            CGPROGRAM
            #pragma vertex vert
            #pragma fragment frag
            #include "UnityCG.cginc"
            #include "MAGIC.cginc"

            struct d
            {
                float4 vertex    : POSITION;
                float2 uv        : TEXCOORD1;
                fixed4 color    : COLOR;
                fixed3 normal   : NORMAL;
            };

            struct v2f
            {
                float4 vertex    : POSITION;
                float4 wpos     : TEXCOORD1;
                fixed4 color    : COLOR;
                fixed3 wnormals : NORMAL;
            };
            //------------------------------------
          
            v2f vert (d v)
            {
                v2f o;
            
                //vertex world position
                o.wpos = mul(unity_ObjectToWorld, v.vertex);

                //vertex screen position
                o.vertex = UnityObjectToClipPos(v.vertex);

                //normal to world normal
                o.wnormals =UnityObjectToWorldNormal(v.normal);

                //o.vertex = UnWrapToScreenSpace(float2 v.uv, float4 v.vertex);
                o.color = float4(v.uv, 0,1);// v.color;
                return o;
            }
          
            sampler2D _MainTex;
            //float4 _MainTex_ST;

            fixed4 frag (v2f i) : COLOR
            {
                //set size
                const float size = 4;
                const float2 cuberange = float2(16,16);
                //hash position to read the right cubemap in the atlas
                float4 hashpos = floor(i.wpos/size); //select the proper cell
                float2 hash_id = max(float2(0,0), min(hashpos.xz, cuberange)); // range limit to keep inside
                float3 hash_offset = hashpos*size;// float3(hash_id.xy,hashpos.y) * size; //start position of each cell

                float3 cubecenter =  hash_offset + (size/2) ;
                float3 mincube = float3(0,0,0) + hash_offset;
                float3 maxcube = float3(size,size,size) + hash_offset;//boxproject(wpos,wnormal, cubecenter, cubemin,cubemax)
                float3 projected = BoxProjectVector(i.wpos,i.wnormals, cubecenter, mincube, maxcube);

                //get the oct position on the cubemap
                //-first get the id
                //-reduce the range to the size of the atlas unit (1/range)
                //-offset by id

                float2 octnormal = PackNormalToOct(projected);
                //transform oct to hashed cubemap
                float2 samplepos = (hash_id + octnormal/2)/cuberange;//(size*hash_id)/64 + octnormal;
                float4 cubesample = tex2D(_MainTex, samplepos);//sample the cubemap in the direction (world to oct)
                return cubesample;//return hashpos/16; //float4(hash_offset,1)/64;
            }
            ENDCG
        }
    }
}

damn it
DAMN IT
:rage:

after all these months of debugging
it was wrong all along :frowning:
since the third post :face_with_spiral_eyes:
since the very beginning, I was curing the symptom :sweat_smile:
the repeated motif is supposed to be an unwrapping of a sphere's normals onto a square folded as an octahedron
so the corners should all be the same color :roll_eyes:
they obviously aren't :hushed:
so when they are joined they produce garbage :eyes:

float3 UnpackNormalFromOct(float2 f){
    float3 n = float3(f.x, f.y, 1.0 - abs(f.x) - abs(f.y));
    float t = max(-n.z, 0.0);
    n.xy += n.xy >= 0.0 ? -t.xx : t.xx;
    return normalize(n);
}

This isn't it?
I mean I pass the UV coordinates to translate into normal space, to get a full sphere unwrap (UV 0 to 1 in oct space).
I didn't write that, I picked it up elsewhere.

            fixed4 frag (v2f i) : COLOR
            {
                float3 normal = UnpackNormalFromOct(i.uv);
                return float4 (normal,1);                                   
                //texCUBE(_Cube, normal);
            }

where is the flaw?

First rule of asking for help: if you ask, you solve it the second after …
I had an intuition that the UV range should probably be rescaled to negatives.

correct looking wrapping

correct looking projection

So this is done. There are harsh transitions when going from one cell to another, obviously, since I hash and sample per fragment; I'll blend it in the polish phase though. I finally get to code the light sampling …

Applied to a non-sphere mesh it yields mostly black, which doesn't make sense; upward normals should at least be blue in most cases due to sampling the sky … :face_with_spiral_eyes:

Looking at the box-projected normals shows some artefacts I have no idea where they come from; they are also spatially coherent with the mesh … :hushed: WTF

Adding an epsilon to the world normals solved it …


The box-projected occlusion factor, i.e. basically masking the skybox with the box-projected mask.

I'll try to make a demo with dynamic light; right now I encode the test light as

                float3 fakelight    = normalize(float3(0.5,0.5,0.5));

It's worth noting the image above doesn't have the level properly aligned to the hash grid vertically, which is why the walls have their bottom "lit": they cross two hash cells.