To answer the whole thing…
First, you need to consider several things, such as the maximum number of textures you want (or are able) to have loaded at once across your materials. This mainly depends on the target VRAM you want the game to require.
Here is a list of things to consider:
VRAM usage:
Should your project be playable on a PC with a graphics card that has 512 MB of VRAM, or will you require 1 GB or even 2 GB (which shrinks the market that can play your game)? If you aim at smartphones, you should not aim at anything higher than 512 MB, since even popular models still lack memory.
How many textures will need to be loaded at once? Think of it from the point of view of the PC’s rendering process. Every texture, be it a normal map, a diffuse map or even an alpha channel, affects how much VRAM the engine needs to render the scene. First, images come in 8, 16 and 32 bit formats; this is the number of bits stored for every pixel. For example, an 8-bit 1024x1024 RGB texture takes 8 388 608 bits (1 MB) of memory, while a 32-bit 1024x1024 texture takes 33 554 432 bits (4 MB) without alpha and 41 943 040 bits (5 MB) with an 8-bit alpha (used for transparency/specular).
The calculation to get how much memory a texture (or a set of textures within a material) takes is as follows (for a 2K texture with alpha):
- Get the number of pixels: 2048 * 2048 = 4 194 304 pixels
- Multiply the result by the color bits (1, 8, 16 or 32): 4 194 304 * 32 = 134 217 728 bits
- If there is an alpha, add the same calculation, but count it as 8 bits: 134 217 728 + 33 554 432 = 167 772 160 bits
- Divide it by 8: 167 772 160 bits / 8 = 20 971 520 bytes
- Then you get the actual number of megabytes required by dividing this by 1 048 576: 20 971 520 / 1 048 576 = 20 MB
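The steps above can be sketched as a small Python function. This is only an illustration of the arithmetic for uncompressed textures; the function name `texture_bytes` and its parameters are made up for this example and do not come from any engine.

```python
BYTES_PER_MB = 1_048_576  # 1024 * 1024


def texture_bytes(width, height, color_bits, alpha_bits=0):
    """Uncompressed texture size in bytes, following the steps above."""
    pixels = width * height                 # number of pixels
    color_total = pixels * color_bits       # bits used by the color data
    alpha_total = pixels * alpha_bits       # bits used by the alpha, if any
    return (color_total + alpha_total) // 8  # bits -> bytes


# The 2K texture with alpha from the worked example.
size = texture_bytes(2048, 2048, color_bits=32, alpha_bits=8)
print(size, "bytes =", size / BYTES_PER_MB, "MB")  # 20971520 bytes = 20.0 MB
```

Calling it with `texture_bytes(1024, 1024, 32)` gives the 4 MB figure quoted earlier for a 1024x1024 texture without alpha.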
This might sound dull, but with it you can now see that you can’t have that many textures on screen at the same time. Remember that even if you don’t see an object, as long as its rendering is not turned off (with Occlusion Culling, for example), it still counts toward the VRAM usage.
Let’s say you have a material with a shader that requires a diffuse map with alpha for specular, a normal map and a self-illumination map with alpha. You decide to set all the textures to 1K (1024x1024).
The diffuse takes 5 MB, the normal (no alpha) takes 4 MB and the self-illum takes 5 MB (since it requires an alpha, it is 32 bits most of the time). This ends up costing you 14 MB for a single material. At 2K, each of these figures is four times larger, for 56 MB in total.
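As a quick check of this material, here is a rough Python sketch using the same per-texture arithmetic (32 color bits, plus 8 bits when there is an alpha). The helper names `texture_mb` and `material_mb` are hypothetical. It shows that the 5 + 4 + 5 MB figures correspond to 1024x1024 maps, and that the same material with 2048x2048 maps comes out four times larger.

```python
BYTES_PER_MB = 1_048_576


def texture_mb(size, color_bits=32, alpha_bits=0):
    """Uncompressed size in MB of a square size x size texture."""
    pixels = size * size
    return pixels * (color_bits + alpha_bits) / 8 / BYTES_PER_MB


def material_mb(size):
    """Diffuse with alpha + normal without alpha + self-illum with alpha."""
    diffuse = texture_mb(size, alpha_bits=8)
    normal = texture_mb(size)
    self_illum = texture_mb(size, alpha_bits=8)
    return diffuse + normal + self_illum


for size in (1024, 2048):
    print(size, material_mb(size), "MB")
# 1024 14.0 MB
# 2048 56.0 MB
```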
How big is your asset? For a building that will fill close to the whole screen, or at least 1/3 of it, you could consider spending 1/4 or 1/5 of the available resources on its rendering. For an interior, you could consider even more, depending on how much of the outside (if any at all) you can see. Taking the 512 MB example given previously, you could allow around 100 MB or 128 MB of materials for the building IF, as I wrote, it takes up a lot of the screen. This means you can use something like 7 or 8 materials to cover the interior walls, ceiling, floor, etc.
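The budget reasoning can be sketched the same way. The snippet below is only a rough estimate, assuming the 14 MB per material worked out earlier and the 512 MB target; the function name `materials_in_budget` is invented for this example.

```python
def materials_in_budget(vram_mb, share, material_mb):
    """How many materials of a given size fit in a share of the VRAM."""
    budget_mb = vram_mb * share
    return int(budget_mb // material_mb)


# 1/5 and 1/4 of a 512 MB card, with 14 MB materials.
for share in (1 / 5, 1 / 4):
    count = materials_in_budget(512, share, 14)
    print(round(512 * share, 1), "MB ->", count, "materials")
# 102.4 MB -> 7 materials
# 128.0 MB -> 9 materials
```

These are upper bounds for the building alone. Staying at 7 or 8 materials leaves a little headroom inside that share.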
When to separate or use multi-materials?
Using multi-materials (with material IDs) can be a lifesaver when you want things to always be treated as a single object. On the other hand, if you almost never see two parts of the mesh/prop/object/whatever at the same time and they don’t need something like skinning, you should consider separating them, so that the two materials are not rendered by force but only when they are seen by the scene’s camera (the rendering source). Doing so lets you save on overall VRAM usage. Conversely, if something shows up in the view many times in a row, you might consider using a multi-material or packing multiple objects onto the same texture in order to reduce the VRAM usage peaks. (Think of it as turning a light on and off in real life. Keeping it on only when needed is great, but if you enter and leave the room constantly, you should just keep it lit or you will only break it faster. It is roughly the same with the textures and materials being called, except that instead of breaking things, it slows down the caching process.)
In the end, the best way of dealing with this is by testing.
As an additional note on the 2 pictures given in the original question, one of the reasons why the first picture is highly detailed is that its resolution is 3800x3800. Another thing to consider is how well you can manage the pixel illusion and the amount of detail. If you have small items or details on the house, you should consider actually modeling them in 3ds instead of trying to give the illusion of them. This way the textures don’t lose “quality” over such things, because it looks off otherwise.