Understanding performance discrepancies across different iOS devices

So I’m making a bullet-hell shmup. Desktop is my primary platform, but I’m also planning to release it on iOS/Android so I’m building to those targets from time to time as I go. Right now, I’m experiencing a significant difference in performance on my iPad Mini (1st gen) vs. my iPhone 4. With tons of active bullets, I can sustain an easy 60 fps on the iPad, no problem. On the iPhone, the best I can manage is a very spiky 20-25.

The fact that the iPhone performs worse isn’t that big a surprise: the 4 is well-known to be an underpowered device (as opposed to the 4S, which is actually competitive), and the mini handily beats the 4’s hardware spec. So I’m not at all opposed to simply requiring a 4S or later; I’ve been doing a lot of profiling and optimizing, and I’m pretty sure the 4 is a lost cause. This isn’t a “help me make this game run on old iPhones” thread. :wink: But I am curious to dig into which hardware differences are most “to blame” for the discrepancy.

The two major pain points on the phone are physics simulation, and scripts’ FixedUpdate calls. Together these average around 2ms/frame on the iPad and even less on desktop, but on the phone they’re anywhere from 7-20ms baseline, and if I get a perf spike the physics sim alone can easily exceed 100ms for a brief period.
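One thing that may be amplifying those numbers: a fixed-step loop runs more steps per rendered frame as the frame rate drops, so slow frames get more physics work, not less. Here is a rough Python model of that catch-up behaviour. It is only a sketch of a generic accumulator loop, not Unity's actual scheduler; the function names are made up for illustration, and the 0.02 s step is the fixed time step I mention below.

```python
def fixed_steps_per_frame(fps, fixed_dt=0.02, frames=100):
    """Model an accumulator-style fixed-step loop.

    Returns how many fixed steps ran during each rendered frame,
    assuming every frame takes exactly 1/fps seconds.
    """
    frame_dt = 1.0 / fps
    accumulator = 0.0
    counts = []
    for _ in range(frames):
        accumulator += frame_dt
        steps = 0
        # Run as many fixed steps as are needed to catch up with real time.
        while accumulator >= fixed_dt:
            accumulator -= fixed_dt
            steps += 1
        counts.append(steps)
    return counts


def average_steps(fps, fixed_dt=0.02, frames=1000):
    """Average number of fixed steps per rendered frame."""
    counts = fixed_steps_per_frame(fps, fixed_dt, frames)
    return sum(counts) / len(counts)


if __name__ == "__main__":
    for fps in (60, 25, 20):
        print(fps, "fps ->", round(average_steps(fps), 2), "fixed steps per frame")
```

In this model a 60 fps frame runs at most one fixed step, while a 20 fps frame runs two or three, so every per-step cost is paid several times over on the slower device.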

So let’s talk about what’s going on in each of those places:

Physics sim: Obviously this is mostly a black box, but as far as what’s in the physics world, we’re looking at, on average, 100 or so rigid bodies (bullets and enemies), each with a single sphere collider marked as trigger (in fact everything is a trigger, because I don’t care about collision resolution in this game, only that a collision occurred at all). My fixed time step is set at 0.02 (50 Hz) and since everything’s a trigger, I turned the solver iteration count all the way down to 1. It’s worth noting that because most of the rigid bodies are bullets, that means I’m adding to and removing from the scene at a pretty good clip. I’ve written a stack-based object recycler so I’m not actually instantiating and destroying everything, but simply enabling and disabling instead. Nevertheless, profiling on the phone pops up “Dynamic Collider.Create” on enable (with a non-trivial amount of CPU time), whereas that doesn’t even show up on the iPad or desktop. :face_with_spiral_eyes:
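For anyone unfamiliar with the recycler idea, this is roughly its shape, written in Python to keep it engine-neutral. It is a minimal sketch, not my actual Unity code: `BulletPool`, `acquire`, `release`, and the dictionary standing in for a bullet object are all hypothetical names.

```python
class BulletPool:
    """Stack-based recycler: reuse released objects instead of creating new ones."""

    def __init__(self, factory):
        self._factory = factory  # callable that builds a brand-new bullet
        self._free = []          # list used as a stack of inactive bullets
        self.created = 0         # how many bullets were ever constructed

    def acquire(self):
        """Return an active bullet, reusing an inactive one when available."""
        if self._free:
            bullet = self._free.pop()
        else:
            bullet = self._factory()
            self.created += 1
        bullet["active"] = True  # stands in for enabling the object
        return bullet

    def release(self, bullet):
        """Deactivate a bullet and push it back on the stack for reuse."""
        bullet["active"] = False  # stands in for disabling the object
        self._free.append(bullet)


if __name__ == "__main__":
    pool = BulletPool(lambda: {"active": False})
    first = pool.acquire()
    pool.release(first)
    second = pool.acquire()
    print(first is second, pool.created)
```

The point of the pattern is that allocation happens only when the stack is empty. It says nothing about what the engine does internally when an object is re-enabled, which is exactly the part I can't see.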

Script FixedUpdates: These are mainly the bullets, and certain weapons (responsible for spawning bullets… obviously) which need to fire rapidly enough that using a regular Update could introduce breaks in their bullet pattern. The bullets’ FixedUpdate is a single rigidbody.MovePosition call, with both the rigidbody and transform components cached. The only reason I even do this, instead of just applying an impulse at spawn-time and letting the physics sim handle the motion, is that I need to be able to scale the bullets’ velocities under certain circumstances depending on their team (if you’re at all familiar with Sine Mora, I’m more-or-less talking about its selective time-slow mechanic here). The weapons do some logic, but it’s a minor portion of the overall FixedUpdate time, and not terribly worrisome.
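To make the per-team scaling concrete, here is the idea in engine-neutral Python. This is a sketch under assumptions, not my real script: the dictionary fields, the team names, and `step_bullets` are all invented for illustration, and in the game the position update goes through rigidbody.MovePosition.

```python
def step_bullets(bullets, time_scale_by_team, fixed_dt=0.02):
    """Advance every bullet by one fixed step, scaled by its team's time scale.

    Teams missing from time_scale_by_team move at normal speed (scale 1.0).
    """
    for bullet in bullets:
        scale = time_scale_by_team.get(bullet["team"], 1.0)
        bullet["x"] += bullet["vx"] * scale * fixed_dt
        bullet["y"] += bullet["vy"] * scale * fixed_dt


if __name__ == "__main__":
    bullets = [
        {"team": "enemy", "x": 0.0, "y": 0.0, "vx": 100.0, "vy": 0.0},
        {"team": "player", "x": 0.0, "y": 0.0, "vx": 100.0, "vy": 0.0},
    ]
    # Slow only the enemy team to a quarter of normal speed.
    step_bullets(bullets, {"enemy": 0.25})
    print([b["x"] for b in bullets])
```

Because the scale is looked up every step, a team can be slowed or restored at any moment without touching each bullet's stored velocity.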

From this page I compared the hardware specs of the iPad Mini and the iPhone 4. I highly doubt RAM or GPU are at issue here (unless Unity is running physics on some/all GPUs? Seems questionable on mobile in any case…) so I’m drawn to these two differences in particular:

  • Processor: 1 GHz dual-core (iPad) vs. 800 MHz single-core (iPhone, underclocked from 1 GHz)
  • Bus frequency: 250 MHz (iPad) vs. 100 MHz (iPhone)

Ultimately, what I’m wondering is whether this is more an issue of bus frequency (possible?), processor speed (questionable, with only a 200 MHz difference I wouldn’t expect the huge performance difference I’m seeing), or number of cores (seems most likely; can Unity run physics on a separate core if one is available?)

If I can get to the bottom of this, it’ll help me determine what other kinds of devices are likely/unlikely targets based on their hardware specs (i.e. if it’s a bus thing, then anything under a 250 MHz bus is probably on the no-go list, etc.)

Any thoughts?

P.S. If you’re a Unity dev and can answer some of these questions definitively, I’d love you forever and ever <3

Although it’s not directly related to the issues you mentioned, the GPU in the 4 is especially bad at alpha. Transparency (especially over large portions of the screen) can and will grind your game to a halt, in my experience at least. I’m not familiar enough with the hardware to give a real justification for this, though.

Mobile GPUs are optimised to draw each pixel once, and use tile rendering instead of z-buffer. Triangles are sorted in depth, and the hardware only draws the triangles at the front of this depth queue. When you throw semi-transparent geometry into the mix, this sorting takes longer, and more triangles have to be drawn.

If each pixel is only drawn once then this must mean that overdraw (in the traditional sense) is not an issue?

Hmm, not sure. The GPU will assign triangles to each of the tiles. Then, for each pixel in each tile, it determines which triangle is at the front of the queue. So this makes me think that the cost of working out which surface is visible is still present, and hence reducing overdraw in the app is still worthwhile.

Tile deferred hardware works with:

Opaque with depth (sorted and submitted)
Opaque with triangle submission order (it’s fine it’ll still cull)

Tile deferred hardware does nothing for:

Anything with a blendmode (any form of transparency)

In the case of a bullet hell shooter, all you need to do is render the bullets (and possibly everything, if it’s a pixel art game) to a texture that’s much lower res. In the case of the iPhone 4, you can get away with using a non-Retina resolution. This will easily get the speed back up in your case, provided script and physics are < 16ms on the iPhone 4.
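To put rough numbers on the resolution point, here is a quick Python calculation. It assumes the iPhone 4's Retina screen is 960×640 and that the non-Retina target is half that in each dimension (480×320); the helper name is just for illustration.

```python
def pixel_count(width, height):
    """Number of pixels in a render target of the given size."""
    return width * height


RETINA = (960, 640)      # iPhone 4 native resolution
NON_RETINA = (480, 320)  # half resolution in each dimension

if __name__ == "__main__":
    full = pixel_count(*RETINA)
    half = pixel_count(*NON_RETINA)
    print(full, half, full / half)
```

Halving both dimensions quarters the number of pixels to fill, so any per-pixel cost such as blending drops by roughly the same factor, before counting the cost of scaling the texture back up.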

Also, fill rate isn’t just pixels drawn. It’s how expensive those pixels are, so check your shaders are uber optimised and opaque where possible.