You need to think about the rendering order of each object individually. Putting all of the red boxes in one queue will not work when using stencils. Try using the frame debugger to get a better sense of how stuff is being rendered.
Stencils are a 2D screen-space thing; the stencil buffer stores no depth of its own. Any pixel your mask object renders to (i.e. passes the depth test for) gets the stencil written, even if the object is not “visible” in the final render. In the frame debugger you’ll see that when the mask object is rendered there’s nothing in the depth buffer to obscure it, so all of those pixels get marked.
If you want the mask to not write to the pixels behind the red boxes, you’ll need to render the boxes first so they write to the depth buffer. Then, when the mask renders, any of its pixels that are further away than the red boxes fail the depth test and don’t write to the stencil.
Of course, now the mask doesn’t “do” anything, since the stencil hadn’t been written yet when the boxes rendered, and thus they were not masked.
TL;DR: For the stencil to mask the boxes, you need the mask object to render before the boxes. For the boxes to occlude the mask, you need to render the boxes before the mask. You can’t do both!
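To make the conflict concrete, here is a toy CPU model. It is not Unity code. It uses a one-row screen with a mask quad at depth 0.5, a "front" box at 0.3, and a "back" box at 0.7 that sticks out past the edge of the mask. The scene, the names, and the `draw` helper are all hypothetical. `draw` only mimics a stencil test (like Comp NotEqual 1) followed by a ZTest LEqual. The skybox is approximated as something drawn at the far plane after everything else.

```python
FAR = 1.0  # depth buffer is cleared to the far plane
W = 10     # a one-row "screen" of 10 pixels

# Hypothetical scene: a mask quad and two boxes, each at a single constant depth.
MASK = (range(0, 6), 0.5)
BOXES = {"front": ([0, 1], 0.3), "back": ([4, 5, 6, 7], 0.7)}


def new_frame():
    return {"color": ["clear"] * W, "depth": [FAR] * W, "stencil": [0] * W}


def draw(f, name, pixels, depth, color=True, zwrite=True,
         stencil_write=None, stencil_hide=None):
    """Rasterize one object: stencil test, then depth test (LEqual), then writes."""
    for p in pixels:
        if stencil_hide is not None and f["stencil"][p] == stencil_hide:
            continue  # stencil test failed (like Comp NotEqual)
        if depth > f["depth"][p]:
            continue  # depth test failed (like ZTest LEqual)
        if stencil_write is not None:
            f["stencil"][p] = stencil_write  # written only when both tests pass
        if zwrite:
            f["depth"][p] = depth
        if color:
            f["color"][p] = name


def render(order):
    f = new_frame()
    for obj in order:
        if obj == "mask":
            # Stencil mask: no color, no depth write, stencil 1 wherever it passes ZTest.
            draw(f, "mask", *MASK, color=False, zwrite=False, stencil_write=1)
        else:
            # Box: only drawn where the stencil is not 1.
            draw(f, obj, *BOXES[obj], stencil_hide=1)
    # Skybox stand-in: at the far plane, so it only lands where no depth was written.
    draw(f, "sky", range(W), FAR, zwrite=False)
    return f


mask_first = render(["mask", "front", "back"])
boxes_first = render(["front", "back", "mask"])
print(mask_first["color"])     # the front box is hidden too
print(boxes_first["color"])    # nothing is hidden
print(boxes_first["stencil"])  # but the mask was correctly occluded at pixels 0 and 1
```

With the mask first, every box pixel inside the mask is stenciled out, including the front box. With the boxes first, the mask fails the depth test behind the front box (stencil stays 0 there), but by then the boxes are already drawn, so nothing gets hidden.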
So … options.
Manual sorting:
Sort the boxes and the mask so that the boxes in front of the mask render first, then the mask, then the boxes behind it. If you disable batching on the shaders and set them all to the same queue, Unity may do this for you, but it isn’t guaranteed. You can use the material queue, or a script that sets each renderer component’s sortingOrder. This does take a bit more work, as you have to manage the sorting yourself, either beforehand or at runtime via scripts, depending on how dynamic your camera / scene is. This is also the only real option if you plan on having multiple masks that affect different sets of objects.
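Here is a minimal sketch of the manual sort in the same kind of hypothetical toy model. It assumes each object can be summarized by one camera distance (in practice you might use the bounds center). A box that straddles the mask plane won't sort cleanly this way. The `sorting_order` dict only stands in for the values you would assign to each renderer's sortingOrder from a script.

```python
FAR = 1.0  # depth buffer is cleared to the far plane
W = 10     # a one-row "screen" of 10 pixels

# Hypothetical scene: one constant camera distance per object (an assumption).
MASK = (range(0, 6), 0.5)
BOXES = {"front": ([0, 1], 0.3), "back": ([4, 5, 6, 7], 0.7)}


def new_frame():
    return {"color": ["clear"] * W, "depth": [FAR] * W, "stencil": [0] * W}


def draw(f, name, pixels, depth, color=True, zwrite=True,
         stencil_write=None, stencil_hide=None):
    for p in pixels:
        if stencil_hide is not None and f["stencil"][p] == stencil_hide:
            continue  # stencil test failed
        if depth > f["depth"][p]:
            continue  # ZTest LEqual failed
        if stencil_write is not None:
            f["stencil"][p] = stencil_write
        if zwrite:
            f["depth"][p] = depth
        if color:
            f["color"][p] = name


def manual_order(boxes, mask_depth):
    """Boxes in front of the mask (nearest first), then the mask, then the boxes behind."""
    by_depth = sorted(boxes, key=lambda n: boxes[n][1])
    front = [n for n in by_depth if boxes[n][1] < mask_depth]
    behind = [n for n in by_depth if boxes[n][1] >= mask_depth]
    return front + ["mask"] + behind


def render(order):
    f = new_frame()
    for obj in order:
        if obj == "mask":
            draw(f, "mask", *MASK, color=False, zwrite=False, stencil_write=1)
        else:
            draw(f, obj, *BOXES[obj], stencil_hide=1)
    draw(f, "sky", range(W), FAR, zwrite=False)  # skybox stand-in
    return f


order = manual_order(BOXES, MASK[1])
# Hypothetical stand-in for the per-renderer sortingOrder values a script would set.
sorting_order = {name: i for i, name in enumerate(order)}
print(order)  # ['front', 'mask', 'back']
print(render(order)["color"])
```

The front box now occludes the mask before the stencil is written. The part of the back box behind the mask is stenciled out, and the part outside the mask still renders.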
Use only depth masking:
One option is to not use stencils at all, and instead render your mask with a depth-only shader before the boxes. This doesn’t require any special sorting beyond setting the material queue; it just works! … except the sky won’t render behind where the mask rendered to, nor will any transparent effects, or anything else with a queue after the mask. This is because Unity’s skybox renders after all opaque objects, and only where depth wasn’t written to. Transparent objects also render after the opaque queues, so they get depth tested against the mask too. You can fix the skybox problem by rendering the sky as a giant sphere in the level, or by using an extra camera that only renders the skybox, with your main camera set to clear depth only. You can also put the mask and boxes in the last queues of the opaque range (which ends at 2500) so other opaque geometry renders before them, but the transparent queue problem isn’t really solvable.
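Here is the same hypothetical toy model with the mask as a depth-only object (no color, depth write on), drawn before ordinary boxes. The boxes need no stencil at all. However, the skybox stand-in, drawn at the far plane after the opaques, can no longer reach any pixel the mask covered.

```python
FAR = 1.0  # depth buffer is cleared to the far plane
W = 10     # a one-row "screen" of 10 pixels

# Hypothetical scene: a mask quad and two boxes, each at a single constant depth.
MASK = (range(0, 6), 0.5)
BOXES = {"front": ([0, 1], 0.3), "back": ([4, 5, 6, 7], 0.7)}


def draw(frame, name, pixels, depth, color=True, zwrite=True):
    for p in pixels:
        if depth > frame["depth"][p]:
            continue  # ZTest LEqual failed
        if zwrite:
            frame["depth"][p] = depth
        if color:
            frame["color"][p] = name


frame = {"color": ["clear"] * W, "depth": [FAR] * W}
draw(frame, "mask", *MASK, color=False)          # depth-only mask, earlier queue than the boxes
for name, (pixels, depth) in BOXES.items():
    draw(frame, name, pixels, depth)             # ordinary opaque boxes, no stencil
draw(frame, "sky", range(W), FAR, zwrite=False)  # skybox stand-in, after the opaques
print(frame["color"])
# ['front', 'front', 'clear', 'clear', 'clear', 'clear', 'back', 'back', 'sky', 'sky']
```

The masking itself is correct, but pixels 2 to 5 are left at the clear color because the mask's depth blocks the sky there.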
Use a pre-depth:
The idea here is to achieve that impossible goal of drawing both the boxes and the mask “first” by splitting the boxes into a depth-only pass and a color pass. Render the boxes first using a depth-only pass. This fills in the depth buffer, but doesn’t actually show the boxes. Then the stencil mask renders, filling in the stencil buffer, but it also gets occluded by the depth of the boxes in front. Then draw the boxes again normally. This doesn’t require any special sorting; everything just works! … except, just like with the previous option, the skybox and transparencies won’t render, only now the hole is where the hidden boxes were instead of where the entire mask is. The skybox can be fixed the same way as above, but there’s another option here: after the mask renders, render the boxes again with a depth-only shader that resets their depth to the far plane. This does require that you render your boxes and mask before any other geometry in the scene, but it fixes the issue with the skybox, transparencies, and any later queues.
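Here is a sketch of the depth pre-pass ordering in the same hypothetical toy model. The exact position of the depth reset is an assumption. This version places it between the mask and the color pass, and treats it as ignoring the depth test (like ZTest Always) while writing far-plane depth.

```python
FAR = 1.0  # depth buffer is cleared to the far plane
W = 10     # a one-row "screen" of 10 pixels

# Hypothetical scene: a mask quad and two boxes, each at a single constant depth.
MASK = (range(0, 6), 0.5)
BOXES = {"front": ([0, 1], 0.3), "back": ([4, 5, 6, 7], 0.7)}


def draw(f, name, pixels, depth, color=True, zwrite=True, ztest_always=False,
         stencil_write=None, stencil_hide=None):
    for p in pixels:
        if stencil_hide is not None and f["stencil"][p] == stencil_hide:
            continue  # stencil test failed
        if not ztest_always and depth > f["depth"][p]:
            continue  # ZTest LEqual failed
        if stencil_write is not None:
            f["stencil"][p] = stencil_write
        if zwrite:
            f["depth"][p] = depth
        if color:
            f["color"][p] = name


def render_pre_depth(reset_depth):
    f = {"color": ["clear"] * W, "depth": [FAR] * W, "stencil": [0] * W}
    for pixels, depth in BOXES.values():
        draw(f, "box depth", pixels, depth, color=False)                # 1. depth-only pass
    draw(f, "mask", *MASK, color=False, zwrite=False, stencil_write=1)  # 2. stencil mask
    if reset_depth:
        for pixels, _ in BOXES.values():
            # 3. (assumed) push the boxes' depth back to the far plane
            draw(f, "reset", pixels, FAR, color=False, ztest_always=True)
    for name, (pixels, depth) in BOXES.items():
        draw(f, name, pixels, depth, stencil_hide=1)                    # 4. color pass
    draw(f, "sky", range(W), FAR, zwrite=False)                         # skybox stand-in
    return f["color"]


print(render_pre_depth(reset_depth=False))
print(render_pre_depth(reset_depth=True))
```

Without the reset, pixels 4 and 5 (the hidden part of the back box) still hold box depth, so the sky never fills them. With the reset, only the boxes that actually pass the stencil write their depth back, and the sky fills everything else.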
Use a second camera and render texture:
This is another way of avoiding stencils entirely. The idea here is to render your mask not as a mask at all, but as a surface that displays the scene as you want to see it, without the offending objects present. Have one camera render the scene with the boxes, and a second camera render the scene without the boxes (hidden via layers) into a render texture. The mask object uses screen-space UVs to display that box-free render texture, and is rendered at the very end of the frame using a very late transparent queue, like 3999. This is the most expensive option, but it’s one of the easiest to set up, and there are a ton of tutorials on how to do this technique for portals and the like.
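Here is a toy sketch of the compositing step, again not Unity code. `screen_uv` is the usual perspective divide followed by a remap from [-1, 1] to [0, 1]. It ignores the platform-dependent Y flip that Unity's `ComputeScreenPos` takes care of. The one-row "images" and the nearest-neighbour `sample` helper are hypothetical stand-ins for the two cameras and the render texture lookup.

```python
FAR = 1.0  # depth buffer is cleared to the far plane
W = 10     # a one-row "screen" of 10 pixels


def screen_uv(clip_x, clip_y, clip_w):
    """Clip-space position -> [0, 1] screen UV (Y flip deliberately ignored)."""
    ndc_x, ndc_y = clip_x / clip_w, clip_y / clip_w  # perspective divide -> [-1, 1]
    return ndc_x * 0.5 + 0.5, ndc_y * 0.5 + 0.5      # remap to [0, 1]


def sample(rt, u):
    """Nearest-neighbour lookup along one row of a hypothetical render texture."""
    return rt[min(int(u * len(rt)), len(rt) - 1)]


# Camera 1: the full scene with the boxes (color and depth per pixel).
main_color = ["front"] * 2 + ["sky"] * 2 + ["back"] * 4 + ["sky"] * 2
main_depth = [0.3] * 2 + [FAR] * 2 + [0.7] * 4 + [FAR] * 2
# Camera 2: the same view with the boxes' layer culled, rendered to a texture.
box_free_rt = ["sky"] * W

MASK_PIXELS, MASK_DEPTH = range(0, 6), 0.5
final = list(main_color)
for p in MASK_PIXELS:  # mask drawn last (e.g. queue 3999) but still depth tested
    if MASK_DEPTH > main_depth[p]:
        continue  # the front box occludes the mask here
    ndc_x = (p + 0.5) / W * 2 - 1  # pixel centre in NDC, with clip w = 1
    u, _ = screen_uv(ndc_x, 0.0, 1.0)
    final[p] = sample(box_free_rt, u)  # show what's "behind" from the box-free view
print(final)
```

Because the mask samples the second camera's image at its own screen position, it looks like a window onto the box-free scene. Anything transparent in that scene is already baked into the texture.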