This update won’t be as visually pleasant as other updates.
I have spent the last week learning about the mobile hardware, and the GPU in particular. The hardware is very interesting. At first it was difficult to find direct answers to simple questions, but eventually I found some good articles from ARM.
OK, what details might be interesting to you? The GPU is a Mali T-760 MP8, part of the Midgard family. It is nominally clocked at around 700 MHz, but that doesn’t matter much because the device overheats quickly at that speed, so you should plan on running it at something closer to 400 MHz or slower. The GPU has 8 cores (that’s right, just 8). By comparison, my 3-year-old desktop GPU has 1300 cores. The mobile GPU has 2 levels of cache. Each core has two 16 kB level 1 caches, and the whole GPU shares a 2 MB level 2 cache that is 4-way associative with 64-byte cache lines.
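To make the L2 numbers a bit more concrete, here is a small sketch that works out how many cache lines and sets those figures imply. The function name `cache_geometry` is just a hypothetical helper, and I am assuming "2 MB" means 2 × 1024 × 1024 bytes and that all sizes are powers of two.

```python
def cache_geometry(size_bytes, ways, line_bytes):
    """Derive line and set counts for a set-associative cache.

    Hypothetical helper; assumes every argument is a power of two.
    """
    lines = size_bytes // line_bytes          # total cache lines
    sets = lines // ways                      # lines are grouped into sets of `ways`
    offset_bits = line_bytes.bit_length() - 1 # address bits selecting a byte in a line
    index_bits = sets.bit_length() - 1        # address bits selecting a set
    return {
        "lines": lines,
        "sets": sets,
        "offset_bits": offset_bits,
        "index_bits": index_bits,
    }


# L2 figures from the post: 2 MB (assumed 2 MiB), 4-way, 64-byte lines.
l2 = cache_geometry(2 * 1024 * 1024, ways=4, line_bytes=64)
print(l2)
```

Under those assumptions the shared L2 holds 32768 lines arranged in 8192 sets of 4.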
Each core contains 2 ALUs (arithmetic), 1 LS unit (loads/stores vertex attributes, writes out varyings, reads in varyings), 1 TX unit (bilinear texture reader), and a 32-bit path to memory (DDR). Theoretically each ALU can do 17 32-bit floating point operations per clock. The breakdown to get that 17 is: dot product (4 mults, 3 adds), vec4 multiply (4 mults), vec4 add (4 adds), scalar add (1 add), and scalar multiply (1 mult), for a total of 7 + 4 + 4 + 1 + 1 = 17 operations. In practice that won’t happen. The LS unit can do one memory access per clock, e.g. read one vertex attribute or read one varying. The TX unit can read one bilinear-filtered texel per clock. For memory access you get 32 bits per core per clock, for a total of 256 bits (32 bytes) per clock across all 8 cores. Multiply each of these per-clock figures by your clock rate and divide by your FPS, and you’ll see how many theoretical operations you get per frame.
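Here is that multiply-and-divide spelled out as a rough sketch. The per-clock figures are the ones quoted above; the 400 MHz clock is the throttled rate mentioned earlier, and the 60 FPS target is my own assumption for illustration. `per_frame_budget` is a hypothetical helper name, and the results are theoretical peaks, not what a real shader will achieve.

```python
# Per-clock figures quoted in the post for the Mali T-760 MP8.
CORES = 8
ALUS_PER_CORE = 2
FLOPS_PER_ALU_PER_CLOCK = 17    # 7 (dot) + 4 (vec4 mul) + 4 (vec4 add) + 1 + 1
LS_OPS_PER_CORE_PER_CLOCK = 1   # one attribute/varying access
TEXELS_PER_CORE_PER_CLOCK = 1   # one bilinear-filtered texel
BYTES_PER_CORE_PER_CLOCK = 4    # 32-bit memory path


def per_frame_budget(clock_hz, fps):
    """Theoretical peak totals per frame (hypothetical helper).

    Each value is (per-clock total across all cores) * clock / fps,
    rounded down to a whole number.
    """
    per_clock = {
        "alu_flops": CORES * ALUS_PER_CORE * FLOPS_PER_ALU_PER_CLOCK,
        "ls_accesses": CORES * LS_OPS_PER_CORE_PER_CLOCK,
        "texels": CORES * TEXELS_PER_CORE_PER_CLOCK,
        "memory_bytes": CORES * BYTES_PER_CORE_PER_CLOCK,
    }
    return {name: value * clock_hz // fps for name, value in per_clock.items()}


# Assumed operating point: 400 MHz clock, 60 frames per second.
budget = per_frame_budget(clock_hz=400_000_000, fps=60)
for name, value in budget.items():
    print(f"{name}: {value:,}")
```

At that operating point the peak works out to roughly 1.8 billion floating point operations, 53 million load/store accesses, 53 million texels, and about 213 MB of memory traffic per frame.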
The GPU is a massively multithreaded processing engine, so a bunch of threads (up to 256) will be running at the same time to chew through your vertex shaders and fragment shaders.
There are many more details I could share but instead I’ll just leave you with good references.
Most of the gory (and awesome) detail I gained was from this article series:
There are some great debugging and profiling tools as well.
Next time hopefully I’ll have some pretty pictures to share. For now I’ll just show you a screenshot of one of ARM’s great profiling tools.