How to handle many highly detailed models on a mobile device?

I am building a VR Android application that shows a scene of a huge street in front of a temple. Both sides of this street have stalls lined up all the way to the temple. All of these stalls are 3D meshes made in Maya with quite a lot of detail.

Now, to get this running on a phone, I thought of making a second set of stall models built from images, because the scene was getting very laggy with all the meshes. I plan to swap each fake-image-model for the real 3D model when the player comes within a set distance of it, and to swap back when the player moves away again.
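A minimal, engine-agnostic sketch of the distance-based swap described above, written in Python for illustration. It adds one thing that is an assumption, not part of the plan above: two thresholds instead of one (hysteresis), so a stall does not flicker between models when the player stands right at the boundary. The `Stall` class, `update_stall_models`, and the 20/25 distance values are hypothetical; in a real engine you would toggle the visibility of the two renderers instead of setting a flag.

```python
import math
from dataclasses import dataclass


@dataclass
class Stall:
    """Hypothetical stall that can be drawn as the detailed mesh or the image-based model."""
    x: float
    z: float
    use_detailed: bool = False  # False = fake-image-model, True = real 3D model


def update_stall_models(stalls, player_x, player_z, swap_in=20.0, swap_out=25.0):
    """Switch each stall between its image-based and detailed model by distance.

    A stall becomes detailed when the player is closer than swap_in and only
    reverts to the image model once the player is farther than swap_out.
    The gap between the two values prevents rapid back-and-forth swapping.
    """
    if not swap_in < swap_out:
        raise ValueError("swap_in must be smaller than swap_out")
    for stall in stalls:
        d = math.hypot(stall.x - player_x, stall.z - player_z)
        if not stall.use_detailed and d < swap_in:
            stall.use_detailed = True
        elif stall.use_detailed and d > swap_out:
            stall.use_detailed = False
```

Calling `update_stall_models` once per frame (or a few times per second) is usually enough; the threshold values would need tuning against how similar the two versions of each stall actually look.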

I want to know how useful this method is, or whether there is a better one.
Currently I'm not able to get the two versions (fake-image-model and real 3D model) to look very similar.

All the stalls have a similar, but not identical, structure, like this:

[Image: 74601-img-model.png]

In the image, the left one is the fake-image-model I made using the front, top, and side views of the original mesh (right).

My advice is to stop using such detailed meshes right away. I know you want your scene to look good, but on a phone you are not going to be able to sustain that kind of rendering with so little processing power. There are, however, ways to make less detailed meshes look as good as their more detailed counterparts, and that is by using normal maps (baking the surface detail of the high-poly mesh into a texture applied to a low-poly version).

Second, instead of images, have you considered just using a simple LOD (level of detail) system to cut down on processing power? I know you don't want to hear this, but if you want to work with high-res models and textures, you may want to consider developing for a dedicated VR system like the Oculus or HTC Vive.
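To illustrate how an LOD system cuts the rendering load, here is a small Python sketch under stated assumptions: each stall has several hypothetical versions (full mesh, two reduced normal-mapped meshes, and a flat image-based model), and the version is picked purely by distance. The thresholds and triangle counts are made-up illustrative numbers, not measurements; many game engines ship a built-in LOD component that does this selection for you.

```python
import bisect

# Hypothetical distance thresholds (metres): LOD i is used while distance < LOD_THRESHOLDS[i].
LOD_THRESHOLDS = [10.0, 30.0, 60.0]

# Hypothetical triangle counts per level for one stall:
# LOD 0 = full mesh, LOD 1-2 = reduced normal-mapped meshes, LOD 3 = flat image-based model.
LOD_TRIANGLES = [50_000, 8_000, 1_500, 12]


def select_lod(distance, thresholds=LOD_THRESHOLDS):
    """Return the LOD index for an object at `distance` (0 = most detailed)."""
    # bisect_right counts how many thresholds are <= distance, which is the LOD index.
    return bisect.bisect_right(thresholds, distance)


def frame_triangles(distances):
    """Total triangles drawn for stalls at the given distances from the player."""
    return sum(LOD_TRIANGLES[select_lod(d)] for d in distances)


if __name__ == "__main__":
    # 20 stalls spaced 5 m apart along the street, from 5 m to 100 m away.
    street = [5.0 * i for i in range(1, 21)]
    print("with LOD:", frame_triangles(street))
    print("full detail everywhere:", LOD_TRIANGLES[0] * len(street))
```

With these example numbers, the street costs 91,108 triangles with LOD versus 1,000,000 with every stall at full detail; the real savings depend entirely on your own meshes and thresholds.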