Any iPad performance optimization wisdom beyond combine everything/fillrate sucks?

Hey everyone, I’m working on a space shmup for iPad, and I’m trying to get more out of it. More bullets, enemies etc.
I’m looking for any wisdom on optimization, and I’ll list here what I have set up so far.

I’ve used SpriteManager to combine most objects. I’m down to five draw calls, one for the game objects, one for the planet in the background, one for the enemy mothership and one for all the sprite based text in the game.

The main atlas is PVRTC compressed (rgb 4bits) and the shader is a tweaked additive shader that doesn’t take a tint color.

Each object, bullet etc. has its own GameObject, behaviour script and sphere trigger/kinematic rigid body.
Would managing them all from one gameobject/script offer any speed increase?

Here’s a screencast of the game in the Unity editor, http://screenr.com/Pok
(the pauses are the video lagging not the game)
And a screenshot of the same level on the iPad:

That Venus level runs 28-50 fps on the iPad, depending on what’s going on. I really need the game to be more stable than that.

Anything might help, thanks.

Big textures also slow things down via bandwidth, not just fillrate.

Draw calls are not all that bad on the iPad or iPhone. You can have 10-15 draw calls with no change in fps.

Do you use render to texture? That is bad.

Do you use GUI in any form? That’s bad!

Fillrate: now fillrate is an interesting beast. If you have 100 objects spread around the screen not overlapping, it will be several times faster to render than if they all overlap. It becomes increasingly more expensive the more alpha things share the same pixel, exponentially even.

But I noticed you said you had a gameobject with its own scripts for every single item. I think this is the problem, or part of the problem. If you’re doing bullets you want one gameobject handling them all.
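
To make the "one object handles all bullets" idea concrete, here is a minimal sketch in plain Python (not Unity code, and not anyone's actual implementation from this thread). The class name `BulletManager`, its fields and the playfield bounds are all hypothetical. It keeps bullet data in flat lists and moves everything in one loop, removing off-screen bullets with a swap-remove so nothing is shifted around.

```python
from dataclasses import dataclass, field


@dataclass
class BulletManager:
    """Hypothetical manager: updates every bullet in one loop
    instead of running one script per bullet."""
    width: float    # playfield width (assumed units)
    height: float   # playfield height (assumed units)
    xs: list = field(default_factory=list)
    ys: list = field(default_factory=list)
    vxs: list = field(default_factory=list)
    vys: list = field(default_factory=list)

    def spawn(self, x, y, vx, vy):
        self.xs.append(x)
        self.ys.append(y)
        self.vxs.append(vx)
        self.vys.append(vy)

    def _remove(self, i):
        # Swap-remove: overwrite slot i with the last bullet, then shrink.
        for arr in (self.xs, self.ys, self.vxs, self.vys):
            arr[i] = arr[-1]
            arr.pop()

    def update(self, dt):
        i = 0
        while i < len(self.xs):
            self.xs[i] += self.vxs[i] * dt
            self.ys[i] += self.vys[i] * dt
            on_screen = (0.0 <= self.xs[i] <= self.width
                         and 0.0 <= self.ys[i] <= self.height)
            if on_screen:
                i += 1
            else:
                # Do not advance: slot i now holds a bullet not yet updated.
                self._remove(i)

    def count(self):
        return len(self.xs)
```

In an engine, the same layout would drive the sprite positions from a single update call per frame; whether that beats per-object scripts in a given project is something to measure rather than assume.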

Also, let’s look at the garbage collector. How much work is it doing at the moment? I suspect that’s one of the things that can go bad. Can you describe the nature of your slowdown? Is it chuggy? Does it happen on and off? Or is it all just dog slow? More info can help us diagnose.

Do a quick test: just draw them on screen with no script attached, in a typical manner (i.e. not all overlapping in one place, but spread out as you would expect in the game).

Hey quick reply, thanks!

I’m not using render to texture, I don’t use GUI except for that fps counter.

Okay I’ll try managing the bullets from one GameObject.

The game’s speed seems to be directly related to how many things are on screen, so it just feels slower but not chuggy/choppy. It does however skip every 9-10 seconds now that you mention it. I have a 13 second audio loop playing, maybe it has something to do with that?

I’ll do that test and post the results.

update:
Looks like managing them from one GameObject will help, hopefully that will do the trick. I’ll post an update when I implement that.

Nope, once you have blending on, it is irrelevant how your objects overlap. Blending is applied to the framebuffer and costs the same regardless of your alpha values. Completely transparent parts of your sprites are just as expensive as everything else, so minimize your total sprite pixel coverage (tighter sprite boundaries).
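
As a rough illustration of "tighter sprite boundaries", the sketch below takes a sprite's alpha channel as a plain list of rows and works out how much of the full quad a tightly cropped quad would no longer have to blend. The function names and the list-of-lists layout are assumptions made for this example only, not part of any sprite tool mentioned in the thread.

```python
def tight_bounds(alpha):
    """Return (left, top, right, bottom), inclusive, of the non-zero-alpha pixels.

    `alpha` is a list of rows, each a list of 0..255 alpha values.
    Returns None if the sprite is fully transparent.
    """
    rows = [y for y, row in enumerate(alpha) if any(row)]
    if not rows:
        return None
    cols = [x for x in range(len(alpha[0])) if any(row[x] for row in alpha)]
    return cols[0], rows[0], cols[-1], rows[-1]


def blended_pixel_savings(alpha):
    """Fraction of the full quad's pixels that a tightly bound quad no longer blends."""
    height, width = len(alpha), len(alpha[0])
    bounds = tight_bounds(alpha)
    if bounds is None:
        return 1.0
    left, top, right, bottom = bounds
    tight_area = (right - left + 1) * (bottom - top + 1)
    return 1.0 - tight_area / (width * height)


# Example: an 8x8 sprite whose only visible pixels are a 4x4 block in the middle.
sprite = [[0] * 8 for _ in range(8)]
for y in range(2, 6):
    for x in range(2, 6):
        sprite[y][x] = 255
```

For that example sprite the cropped quad covers 16 of 64 pixels, so three quarters of the blended pixels go away without changing what is visible.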

How do you build?

The fastest setting in Unity 3 should be:

  • Armv6 + Armv7
  • strip bytecode
  • fast but no exceptions

Inside the resulting AppController.mm you should adjust some values:
#define USE_DISPLAY_LINK_IF_AVAILABLE 1
#define kFPS 60.0 (well, for testing I use 60 or higher)
#define kAccelerometerFrequency 1.0 (this value is the frequency at which the accelerometer is used; if you don’t use it at all, set it to 1)
and one more define that toggles OpenGL ES 2.0 or 1.1. Set that to 0.

There are some other values you can tweak; see the documentation on that.

And yes, can you paste some internal profiler readings here? Just to play it safe and optimize where it is needed 8).

Sorry for hijacking the thread, but I have a question about this:
In my new menu-system, I use one 2048x2048 image for everything on iPad and retina devices (1024x1024 version on older devices). I made one big image so it’s easy to skin and so it all batches - and I noticed it is a bit slow on my iPad but not on my iPod Touch 4G which I thought had similar hardware, just a different resolution. Would splitting it up into 4 1024x1024 images help a lot? It’s quite a bit of work (remapping coordinates etc) so I’d like to know if it makes much difference before I do this.
Here’s a web demo (with temporary graphics I found online), in 480x320 resolution : http://mudloop.com/menu_system/

No, no :) It won’t help at all. It doesn’t care whether it’s on another texture or not; the cost is all in having to read that many pixels. I don’t think you can do much more to speed it up except use 16-bit textures if possible in that particular instance. It might help if you are generating mipmaps too.

The problem is that the iPhone 4G has the same fillrate as the 3GS and you’re on retina, so I doubt it’s the actual texture size causing the problem; you’re just drawing more with the same hardware. Although… you could try the 1024 texture on the 4G just to see if it’s a texture fetch issue.
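
For anyone wanting to put numbers on the texture size question, here is a small sketch of the arithmetic. It only counts uncompressed storage for the base level (no mipmaps, no driver overhead), and the helper name `texture_bytes` is made up for this example.

```python
def texture_bytes(width, height, bits_per_pixel):
    """Approximate storage for one texture's base level, in bytes."""
    return width * height * bits_per_pixel // 8


MIB = 1024 * 1024

# One 2048x2048 atlas at different bit depths.
atlas_32bit = texture_bytes(2048, 2048, 32)   # 16 MiB
atlas_16bit = texture_bytes(2048, 2048, 16)   # 8 MiB
atlas_4bit = texture_bytes(2048, 2048, 4)     # 2 MiB (e.g. a 4 bits-per-pixel format)

# Four 1024x1024 textures hold exactly as many texels as one 2048x2048.
four_small = 4 * texture_bytes(1024, 1024, 32)
```

Splitting the atlas leaves the total amount of texture data unchanged, while dropping to 16 bits per pixel halves it, which fits the advice above.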

You misread what I said. That’s not what I meant at all.

If you have 100 sprites with alpha 0 spread around the screen, it will be faster than 100 sprites with alpha 0 in the same place.

That is because the hardware will re-sample every single pixel it overlaps

Thanks for your response. But on my 4G it works smoothly (60fps), just not on my iPad, which I think is weird - the iPad doesn’t have that many pixels more than the iPod Touch 4G. It’s not a huge performance loss though, so I might be able to get it back with some other optimizations.

I read a lot about splitting textures too, I just don’t understand it. The iPad and iPhone 4 are not worlds apart in hardware. If you could find out why, that would be super cool :)

Unfortunately it’s not that weird, smag.

You can redraw the screen 3-4 times per frame, pixel-wise, on the iPhone 4 / iPod touch 4G (4x the pixels of the 3GS), but only 2-3 times on the iPad (about 5.1x the pixels of the 3GS). So if you run fine on the 4th gen, you might still be close to the edge on the iPad and hit the limit from time to time, in which case performance drops significantly. Keep in mind that, unlike the draw call limit, which is a “relative limitation”, the fillrate limit is absolute: hit it and you are “killed”, not “hit it and performance degrades”. You have a fixed number of pixels you can render per second, which gives a fixed number of pixels you can render per frame: pixels per second divided by the desired FPS.
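
The per-frame pixel budget described above can be sketched as follows. The screen resolutions are the real ones for these devices; the fill rate of 120 million pixels per second is purely a placeholder assumption picked for illustration, not a measured or published figure for any of them.

```python
SCREENS = {
    "iPhone 3GS": (480, 320),
    "iPhone 4 / iPod touch 4G": (960, 640),
    "iPad": (1024, 768),
}

# ASSUMPTION: illustrative fill rate only, not a real device spec.
ASSUMED_FILL_RATE = 120_000_000  # pixels per second


def pixel_ratio(device, baseline="iPhone 3GS"):
    """How many times more pixels `device` has than `baseline`."""
    w, h = SCREENS[device]
    bw, bh = SCREENS[baseline]
    return (w * h) / (bw * bh)


def full_screen_passes(fill_rate, fps, device):
    """How many times the whole screen can be redrawn per frame."""
    w, h = SCREENS[device]
    per_frame_budget = fill_rate / fps   # pixels available per frame
    return per_frame_budget / (w * h)


for name in SCREENS:
    passes = full_screen_passes(ASSUMED_FILL_RATE, 60, name)
    print(f"{name}: {pixel_ratio(name):.2f}x pixels, {passes:.1f} passes at 60 fps")
```

Whatever the true fill rate is, the ratio between devices only depends on their pixel counts, so the iPad always gets fewer full-screen passes per frame than the 4th-generation devices at the same fill rate.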

To overcome overdraw kills like the ones seen here, you will not be able to avoid OpenGL ES 2.0 with pixel shaders that combine the layers prior to rendering them on a RenderTexture, and/or the commonly used optimized meshes that represent your object in a tighter form for situations where you use a lot of alpha to “omit parts from rendering”, i.e. cut holes with alpha. In such a case the mesh should cut that part out physically as well. You can afford thousands of wasted polygons on the 4th generation, and especially on the iPad, but you cannot afford wasted blending pixels :)

Also, ensure that you only blend what needs blending. Don’t blend background objects, for example. If something is always the farthest-back thing and the backdrop is black, fill the texture with black instead of alpha and disable blending. That saves you a considerable amount of fillrate.

Fillrate problems are very hard to tackle.
As far as 2D rendering goes, besides using compressed images (which generally is not possible with pixel art), the only thing you can do is move away from rectangular sprites and use non-rectangular, tightly bound sprites.

This works well for sprites with irregular shapes where the majority of the sprite is solid and only the outline is transparent.
Here is an example :

The idea is to split your sprite in two parts : solid and transparent and define your own vertex coordinates for both parts.
http://www.warmi.net/tmp/sprite1.png ( the white part is completely solid while the gray part is transparent)

It complicates rendering a bit and trades fillrate for vertex processing time but if you have fillrate problems - this sort of approach can double your framerate.

Nice tips Warmi. I agree, good tips.

Special attention must be paid to UI elements in 3D games. It becomes really easy to just waste huge amounts of fill rate on the user interface part.

I notice in the above demo movie we see big overlays in the middle - I assume these are large transparent quads being overlaid in the same place? This is a “worst-case” scenario for the iPad. If you could turn any big images which have a lot of alpha into thin rotated quads which don’t burn fill-rate on invisible pixels (alpha 0 on the texture is still drawn), you would probably solve this issue right away.

I see at least 4 large quads in the middle of the screen; these take up a quarter of the screen. Quads drawn on top of each other do not cost just 4x the fillrate; it is more like over 8x, as the topmost quad will sample the pixels of the ones under it, and so on for the one under that (you get the idea). Either split all those images up into polygon shapes or find a way to combine them into one quad and just animate it via a sprite strip.
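
Here is a rough lower bound on the blending work for stacked quads like these. It assumes, hypothetically, four transparent quads that each cover about a quarter of a 1024x768 screen, and it counts every covered pixel once per layer. It does not model the extra per-layer cost estimated above, so treat the result as a floor.

```python
def blended_pixels(screen_w, screen_h, layers):
    """Total pixels blended per frame.

    `layers` is a list of screen-area fractions (0.0 to 1.0), one per
    transparent quad. Overlap does not reduce the work, so areas add up.
    """
    screen = screen_w * screen_h
    return sum(int(fraction * screen) for fraction in layers)


def overdraw_factor(screen_w, screen_h, layers):
    """Blended pixels expressed as a multiple of one full screen."""
    return blended_pixels(screen_w, screen_h, layers) / (screen_w * screen_h)


# ASSUMPTION: four stacked quads, each roughly a quarter of the iPad screen.
stacked = [0.25] * 4
total = blended_pixels(1024, 768, stacked)
factor = overdraw_factor(1024, 768, stacked)
```

Under those assumptions the four quads alone add up to a full extra screen of blended pixels every frame, before any bullets, enemies or text are drawn.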

To sum up: the iPad is excellent at drawing masses of polys with no transparency, but quite bad at drawing any transparent polys which are over other transparent polys.

Please test your project without the 2 purple things and white rune in the middle, to see if that was the case.

Okay wow, very interesting!

@marjan
I am building armv6, with stripping disabled (I don’t have Pro) and slow and safe.
Changing it to armv6+armv7 / fast but no exceptions does help a bit; the same scene runs consistently at 40-45fps.
Thanks! Is there a general rule about using fast but no exceptions? It sounds unsafe, but what does it really mean?

These are my settings:
#define USE_OPENGLES20_IF_AVAILABLE 1
#define USE_DISPLAY_LINK_IF_AVAILABLE 1

//#define FALLBACK_LOOP_TYPE NSTIMER_BASED_LOOP
#define FALLBACK_LOOP_TYPE THREAD_BASED_LOOP
//#define FALLBACK_LOOP_TYPE EVENT_PUMP_BASED_LOOP

#define ENABLE_INTERNAL_PROFILER 1
#define ENABLE_BLOCK_ON_GPU_PROFILER 0
#define BLOCK_ON_GPU_EACH_NTH_FRAME 4
#define INCLUDE_OPENGLES_IN_RENDER_TIME 0

// — CONSTANTS ----------------------------------------------------------------
//

#if FALLBACK_LOOP_TYPE == NSTIMER_BASED_LOOP
#define kThrottleFPS 2.0
#endif

#if FALLBACK_LOOP_TYPE == EVENT_PUMP_BASED_LOOP
#define kMillisecondsPerFrameToProcessEvents 3
#endif

#define kFPS 60.0
#define kAccelerometerFrequency 0.0

@Alexey
Here are some frames from the internal profiler while I play the same Venus level:
iPhone Unity internal profiler stats:
cpu-player> min: 17.4 max: 27.8 avg: 23.0
cpu-ogles-drv> min: 0.4 max: 2.2 avg: 0.5
cpu-waits-gpu> min: 0.1 max: 2.4 avg: 0.3
cpu-present> min: 0.3 max: 0.7 avg: 0.3
frametime> min: 18.8 max: 29.3 avg: 24.5
draw-call #> min: 5 max: 5 avg: 5 | batched: 0
tris #> min: 1514 max: 1514 avg: 1514 | batched: 0
verts #> min: 3028 max: 3028 avg: 3028 | batched: 0
player-detail> physx: 4.0 animation: 0.0 culling 0.0 skinning: 0.0 batching: 0.0 render: 2.2 fixed-update-count: 2 … 3
mono-scripts> update: 14.8 fixedUpdate: 0.0 coroutines: 1.0
mono-memory> used heap: 1159168 allocated heap: 1536000 max number of collections: 0 collection total duration: 0.0

iPhone Unity internal profiler stats:
cpu-player> min: 15.8 max: 33.0 avg: 24.8
cpu-ogles-drv> min: 0.4 max: 2.8 avg: 0.6
cpu-waits-gpu> min: 0.1 max: 0.4 avg: 0.2
cpu-present> min: 0.3 max: 1.4 avg: 0.4
frametime> min: 19.6 max: 35.6 avg: 26.5
draw-call #> min: 5 max: 5 avg: 5 | batched: 0
tris #> min: 1514 max: 1514 avg: 1514 | batched: 0
verts #> min: 3028 max: 3028 avg: 3028 | batched: 0
player-detail> physx: 4.8 animation: 0.0 culling 0.0 skinning: 0.0 batching: 0.0 render: 2.3 fixed-update-count: 2 … 4
mono-scripts> update: 15.4 fixedUpdate: 0.0 coroutines: 1.1
mono-memory> used heap: 1216512 allocated heap: 1536000 max number of collections: 0 collection total duration: 0.0

iPhone Unity internal profiler stats:
cpu-player> min: 20.7 max: 36.1 avg: 26.7
cpu-ogles-drv> min: 0.4 max: 0.5 avg: 0.4
cpu-waits-gpu> min: 0.1 max: 0.3 avg: 0.2
cpu-present> min: 0.3 max: 1.6 avg: 0.4
frametime> min: 22.2 max: 38.1 avg: 28.3
draw-call #> min: 5 max: 5 avg: 5 | batched: 0
tris #> min: 1514 max: 1514 avg: 1514 | batched: 0
verts #> min: 3028 max: 3028 avg: 3028 | batched: 0
player-detail> physx: 4.8 animation: 0.0 culling 0.0 skinning: 0.0 batching: 0.0 render: 2.2 fixed-update-count: 2 … 4
mono-scripts> update: 16.8 fixedUpdate: 0.0 coroutines: 1.2
mono-memory> used heap: 1302528 allocated heap: 1536000 max number of collections: 0 collection total duration: 0.0

Which is faster, a multiply blend mode or alpha-based transparency? Either way I gather that opaque shaders are the way to go for speed on the iPad. So I will convert the background planet and possibly the alien mothership in the center to an opaque shader.

@hippocoder
Unfortunately, the center is already rotated quads:

Running the game without the planet, and without the center art didn’t offer too much benefit. However, after destroying the turrets the fps increased to ~60fps with a moderate number of enemies on the screen. Later, when more enemies were on screen, the fps dropped again.
I am tinting some of the sprites with SpriteManager, which I believe sets the vertex color. Does this slow the rendering much?

I don’t know much about shaders, but I know they can make a hell of a difference, so maybe this will help you :

Check out this thread for a great shader without vertex coloring: http://forum.unity3d.com/threads/40868-Transparent-but-no-alpha?p=261138&viewfull=1#post261138 (thanks Jessy!)
Also look here if you need a shader with vertex colors : http://forum.unity3d.com/threads/59205-Shader-help?highlight=shader (thanks again Jessy!)

I can’t speak for Unity, but vertex colors didn’t affect performance at all in C++ / native iPhone development. Then again, I didn’t use any lights at all and just used them to colour things. Try out smag’s suggestions though and report back, as I would be interested to see how you get on.

Did you try removing all traces of GUI? I hear stories that GUI is a nightmarish hog for iOS.

Cool, thanks Smag.
I took the shader Jessy made and edited it a little so it is additive. It helped: the game rested around 40fps with slow and safe / armv6 and stayed close to 50fps with fast but no exceptions / armv6+armv7. Removing the frame counter GUI element seemed to help too, though only by a few frames. (I logged the fps with Debug.Log instead.)

I also turned off blending in the planet shader, which I think helped a lot.

I’m still going to batch the handling of the enemy bullets, so I’ll post if that helps.

Here’s the modified additive shader:

I borrowed a lot from Unity’s additive shader, and removed alpha.
If anyone knows how to make it faster, please let me know!

Ok, first on stability front:

and your avg frametime is 25-30 ms. Until you do something about it, define kFPS as 30; it will be smoother.
Also,

while

means that you are most likely bound by the GPU. Your Update time alone prevents you from running at 60 fps (16 ms per frame).
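
The frame budget reasoning can be checked against the three profiler readings posted earlier in the thread. This is just arithmetic on those averages; the tuple layout and function names are made up for the example.

```python
# (avg frametime ms, mono-scripts update ms, physx ms), copied from the
# three internal profiler readings posted earlier in the thread.
PROFILER_SAMPLES = [
    (24.5, 14.8, 4.0),
    (26.5, 15.4, 4.8),
    (28.3, 16.8, 4.8),
]


def frame_budget_ms(target_fps):
    """Time available per frame at the target rate, in milliseconds."""
    return 1000.0 / target_fps


def fits(frametime_ms, target_fps):
    """True if a frame of this length fits in the target's budget."""
    return frametime_ms <= frame_budget_ms(target_fps)


def average_fps(frametime_ms):
    return 1000.0 / frametime_ms


for frametime, update, physx in PROFILER_SAMPLES:
    print(
        f"frametime {frametime} ms -> {average_fps(frametime):.1f} fps avg, "
        f"fits 60: {fits(frametime, 60)}, fits 30: {fits(frametime, 30)}, "
        f"update is {update / frametime:.0%} of the frame"
    )
```

None of the three averages fits the 60 fps budget of about 16.7 ms, all of them fit the 30 fps budget of about 33.3 ms, and script Update is the largest single item in each reading.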

Thanks Alexey, I did notice much more stability when I set kFPS to 30. I’ll try to take some load off of my update step and see where that leaves me.

I’m also doing a shmup style game on the iPad.

Currently, most of the art content in the game is meshes + normal map + spec map, and I have a full screen animated image deformation going on in the background. Without doing much optimizing at all, the game runs at a constant 30 fps.

A few notes -

  1. Avoid doing anything with sprites on the iPad, if you can avoid it at all. The iPad only has fill rate issues if you are using transparency. Therefore, if you converted your ships to meshes made out of lines, I can almost guarantee you will be able to fill the screen with ridiculous amounts of enemies without it affecting the framerate. The same thing goes for bullets: try to think of a way to have them made out of fully opaque meshes (like spinning diamonds, octagon-shaped bullets instead of circles, etc).

  2. The built-in shaders are great for prototyping, but are made for general purpose use. Unity can’t possibly know what people are going for, so they can’t cut corners in their shaders. I’ve been able to get large performance increases just by doing a naive rewrite of shaders, and I am no shader optimization wizard.

  3. Fragment processing on iOS devices tends to be slow. For the full screen image deformation effect I have going in the background, when I optimized it down to six instructions instead of 12, it nearly doubled the frame rate. Also, whereas there is little point in using lookup tables on desktop GPUs, since they can generally do the math faster than the lookup, the same is not the case on iOS. If you are doing expensive operations on the iPad’s GPU, like multiple trig functions, you might want to investigate encoding a lookup table to a texture.

  4. It’s been mentioned already, but the iPad locks rendering to the refresh rate, so your game is either going to run at 30fps or 60fps. My game often times runs at 60fps, but the jump from 60 to 30fps is jarring, so I lock it at 30fps.
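
The lookup table idea from note 3 can be sketched on the CPU side like this. It is plain Python rather than shader code, the table size of 256 is an arbitrary assumption, and on a GPU the table would be stored in a texture and sampled instead of indexed.

```python
import math

# ASSUMPTION: 256 entries; on a GPU this would be the lookup texture's width.
TABLE_SIZE = 256

# Precompute one full period of sine once, up front.
SINE_TABLE = [math.sin(2.0 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]


def sin_lookup(angle):
    """Approximate sin(angle), angle in radians, using the nearest table entry."""
    index = int(round(angle / (2.0 * math.pi) * TABLE_SIZE)) % TABLE_SIZE
    return SINE_TABLE[index]


# Worst-case error over a range of angles, compared with math.sin.
worst_error = max(
    abs(sin_lookup(a / 100.0) - math.sin(a / 100.0)) for a in range(-1000, 1000)
)
```

The trade-off is precision for speed: with nearest-entry lookup the error is bounded by half the angular step between entries, and a larger table or interpolation between entries tightens it.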

To conclude, the main thing affecting your frame rate is your art assets. No matter how much optimization you do, you are going to hit a wall with sprites on the iPad. If you really want blistering fast frame rates with room to spare, I would recommend redoing your art assets as fully opaque meshes.