Follow these steps to configure Arm Streamline Performance Analyzer to capture Mali GPU-related data:
Figure 11: Select device
This opens a new window:
Figure 12: Select counters
Once you have selected the device, the application, and the metrics to collect, click the start capture button.
This automatically starts the application and begins collecting the profiling data.
Make sure the application is running correctly on your Android device. After a few seconds, you can stop the capture process.
Wait until Streamline completes processing the data.
Switch to the Mali Timeline view as shown below:
Figure 13: Mali Timeline Streamline
You might need to zoom in to the maximum level (500 µs), because the application renders only a simple 3D object and the GPU work in each frame is brief.
You can analyze two consecutive frames as shown below:
Figure 14: Two Consecutive Frames
Arm has worked with the Dawn team to optimize data uploading to GPU buffers for Mali GPUs.
Arm has implemented a Fast Path mechanism in which the Vertex Queue starts processing while an earlier Fragment Queue is still being processed.
As Figure 14 shows, there is some overlap between the Fragment Queue of the first frame and the Vertex Queue of the following frame.
This demonstrates that the application is hitting the Fast Path that Arm has implemented to optimize performance of Dawn for Mali GPUs.
The overlap is small because the application renders the same simple 3D object in a different orientation. You can extend the application to render more complex objects with multiple Uniform Buffers, which shows the overlap in more detail.
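If you want to put a number on the overlap rather than judge it by eye, you can read the start and end times of the two queue blocks off the Mali Timeline view and compute how long they run concurrently. The following is a minimal sketch under stated assumptions: the `QueueSpan` type, the `overlap_us` helper, and the timestamps are all hypothetical and entered by hand for illustration. They are not part of Streamline or Dawn, and the values are not measured data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueSpan:
    """One block of activity on a GPU queue, in microseconds (hypothetical type)."""
    start_us: float
    end_us: float


def overlap_us(a: QueueSpan, b: QueueSpan) -> float:
    """Return how long two spans are active at the same time (0.0 if never)."""
    # The shared interval runs from the later start to the earlier end.
    return max(0.0, min(a.end_us, b.end_us) - max(a.start_us, b.start_us))


# Hypothetical values read by hand from the timeline, not measured data.
fragment_frame_1 = QueueSpan(start_us=120.0, end_us=310.0)  # Fragment Queue, first frame
vertex_frame_2 = QueueSpan(start_us=290.0, end_us=350.0)    # Vertex Queue, next frame

overlap = overlap_us(fragment_frame_1, vertex_frame_2)
print(f"Overlap: {overlap:.1f} us")
```

A result greater than zero means the Vertex Queue of the second frame started before the Fragment Queue of the first frame finished, which is the behavior the Fast Path is meant to produce. Repeating the calculation after extending the application lets you compare the overlap between the two versions.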
You can experiment with different counters in Streamline and also explore other CPU profiling data.