Marching cubes is a very powerful algorithm, but it is overkill for this sort of problem. There are also other, more suitable algorithms, similar to marching cubes, that use less memory or less compute power. A little light reading will give you some ideas in that direction should you want to tackle a completely 3D version of this sort of game.
I cannot speak to a solution that is particular to Unity because I have not done this type of game in Unity before. However, I have done this type of game before, and it is a thoroughly generic problem that can be solved relatively trivially in almost any engine.
This solution presumes that you are doing a game like Lemmings or Every Drop Counts or one of those “fire, water and sand” puzzle games that are presented primarily in a side-view, two-dimensional play style.
Maintain two bitmaps of equal size, or in Unity’s case, a 2D array of values and a separate texture. We will refer to these two bitmaps as the collision bitmap (or a 2D array in Unity) and the display bitmap (or a texture in Unity).
The display bitmap is what the player can see; it is the nice looking pretty version that is displayed on screen.
The companion collision bitmap is something that the player never sees but is actually the bitmap they interact with when they erase part of the screen or tell a Lemming to dig.
When the player wants to dig or erase part of the screen to let the water pass through, simply detect where the player is dragging their finger on the collision bitmap. Check that the area of the collision bitmap under the finger can actually be erased and does not have some special property, such as needing to be dragged across twice, or extra hard, or some other game mechanic. Different values in the collision bitmap indicate different properties such as diggable, impenetrable, extra hard, can only be destroyed with explosives, etc. Let’s assume that your entire collision bitmap is diggable at this point. As the player drags their finger through the collision bitmap, the game erases the parts of the collision bitmap that the finger touches. The game also erases the display bitmap at exactly the same corresponding positions.
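As a rough illustration of the dig step, here is a minimal Python sketch that erases a circular brush from both bitmaps at once. The material codes, the `dig` function, and the use of nested lists for the bitmaps are all assumptions made for the example; in an engine you would use its own array and texture types.

```python
# Hypothetical material codes stored in the collision bitmap.
EMPTY = 0
DIGGABLE = 1
IMPENETRABLE = 2

# Fully transparent RGBA pixel written into the display bitmap when erasing.
CLEAR_PIXEL = (0, 0, 0, 0)


def dig(collision, display, cx, cy, radius):
    """Erase a filled circle centred on (cx, cy) from both bitmaps.

    Only cells marked DIGGABLE are removed; everything else is left alone.
    Returns the number of cells that were actually erased.
    """
    height = len(collision)
    width = len(collision[0])
    erased = 0
    # Clamp the brush's bounding box to the bitmap so we never index outside it.
    for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
        for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
            inside_brush = (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2
            if inside_brush and collision[y][x] == DIGGABLE:
                collision[y][x] = EMPTY        # gameplay data
                display[y][x] = CLEAR_PIXEL    # what the player sees
                erased += 1
    return erased
```

Calling `dig` once per touch sample along the drag keeps the two bitmaps in step, which is the only real requirement here.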
Now, for a game like Lemmings, we’re pretty much done at this point. When the Lemming needs to walk around, the game mechanics can just check the collision bitmap to see where the Lemming can walk, stop them if the ground is too steep, burn them if the ground (collision bitmap) is too hot, and so on.
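To make the walking check concrete, here is a small Python sketch of a single horizontal step against the collision bitmap. The `step` function, the `max_climb` parameter, and the convention that row 0 is the top of the screen are assumptions for illustration, not how any particular game does it.

```python
EMPTY = 0  # hypothetical code for "nothing here" in the collision bitmap


def step(collision, x, y, direction, max_climb=2):
    """Try to move a walker standing in cell (x, y) one column sideways.

    direction is -1 (left) or +1 (right). Row 0 is the top of the bitmap.
    Returns the new (x, y), or None if the way is blocked or too steep.
    """
    height = len(collision)
    width = len(collision[0])
    nx = x + direction
    if not 0 <= nx < width:
        return None  # edge of the level
    # Try the current row first, then up to max_climb rows higher.
    for climb in range(max_climb + 1):
        ny = y - climb
        if ny < 0:
            break
        if collision[ny][nx] == EMPTY:
            # Settle downwards until something solid is underfoot.
            while ny + 1 < height and collision[ny + 1][nx] == EMPTY:
                ny += 1
            return nx, ny
    return None  # the ground rises more than max_climb: too steep
```

Hazards such as hot ground fit the same pattern: read the value in the cell under the walker and react to it.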
In a game such as Every Drop Counts, there is an extra step: we need to calculate a perimeter on the collision bitmap, which makes the physics for the splashing water utterly trivial. First, we run an edge detection algorithm on the collision bitmap, which gives us a nice clean image of where the edges of the collision bitmap lie. Then we convert that edge bitmap into one or more grouped polygons to test our collisions against.
We can also use those polygons to draw lines directly onto the display bitmap to give us the nice hard edges in a different colour that you can clearly see in the gameplay video.
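For a feel of the edge detection step, here is a deliberately simple Python sketch: it marks every solid cell that touches an empty cell on one of its four sides. This neighbour test is a stand-in I chose for brevity, not the algorithm a library would use, and `find_edges` is a hypothetical name. Tracing the marked cells into polygons is a separate step that is not shown.

```python
EMPTY = 0  # hypothetical code for "nothing here" in the collision bitmap


def find_edges(collision):
    """Return a grid of booleans marking the boundary cells of solid regions.

    A cell is an edge if it is solid and at least one of its four direct
    neighbours is empty. Cells outside the bitmap are treated as solid, so
    the outer border of the level is not reported as an edge.
    """
    height = len(collision)
    width = len(collision[0])
    edges = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if collision[y][x] == EMPTY:
                continue
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                in_bounds = 0 <= nx < width and 0 <= ny < height
                if in_bounds and collision[ny][nx] == EMPTY:
                    edges[y][x] = True
                    break
    return edges
```

Since digging only changes a small area, you can rerun this on just the region around the last dig instead of the whole bitmap.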
Okay, so at the end we have a display bitmap: it looks pretty, is shown to the user, and now has some areas cut out. We have a collision bitmap: meaningless numbers to a human but valuable to the game, again with some areas cut out. And we have some polygons to use in collision tests, which let us run some simple physics for the splashing water. Now we run a physics engine such as PhysX or Box2D (recommended), and the little particles of water splosh around inside the polygons, bouncing off walls and staying contained, all very simply.
For detecting edges and converting to polygons and lines, check out AForge.NET, a pure .NET library. For performing tests against polygons, polygon intersections, calculating perimeters and all sorts of nice tricks, check out the Clipper project, which is also available as a C# library.
Another thought is that Every Drop Counts could well be using very simple polygon cutting code to remove the pixels. This is an even more trivial solution than working at the pixel level. They may not have a collision map at all; they may instead be using polygonal exclusion regions to prevent cutting in areas where it is not permitted.