Unlocking the power of the GPU for embedded browsing
June 22, 2018
Graphics processing units (GPUs) are increasingly included in the SoCs that drive embedded systems and connected consumer electronics. The GPU can be thought of as an extra processor suited to display rendering, and since the user interface is often a primary indicator of overall product quality, it makes sense to make the best possible use of it. Consumer electronics often rely on an HTML browser to provide the presentation layer upon which the UI and other applications are built. Here I’ll look at how some browsers use the GPU and contrast this with a more focused approach.
Browsers need to redraw the screen whenever any part of the page changes, while simultaneously minimizing the amount of work that the CPU has to do. They do this by tracking all areas of the screen that require updating, then repainting just those areas and the parts of any other onscreen elements that overlap them.
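The bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration, not any particular browser’s code: the Rect, Element and DamageTracker names are hypothetical, and a real engine would also merge overlapping dirty areas and clip repainting to them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    w: int
    h: int

    def intersects(self, other: "Rect") -> bool:
        # Two rectangles overlap unless one lies entirely to one side of the other.
        return (self.x < other.x + other.w and other.x < self.x + self.w
                and self.y < other.y + other.h and other.y < self.y + self.h)


@dataclass
class Element:
    name: str
    bounds: Rect


class DamageTracker:
    """Records areas that need updating and reports which elements overlap them."""

    def __init__(self) -> None:
        self.dirty: list[Rect] = []

    def invalidate(self, area: Rect) -> None:
        # Called whenever part of the page changes.
        self.dirty.append(area)

    def elements_to_repaint(self, elements: list[Element]) -> list[Element]:
        # Only elements that intersect a dirty area need to be painted again.
        return [e for e in elements
                if any(e.bounds.intersects(d) for d in self.dirty)]

    def clear(self) -> None:
        # Reset once the next frame has been produced.
        self.dirty.clear()


page = [
    Element("header", Rect(0, 0, 800, 100)),
    Element("button", Rect(10, 120, 100, 40)),
    Element("footer", Rect(0, 500, 800, 100)),
]
tracker = DamageTracker()
tracker.invalidate(Rect(50, 90, 30, 40))  # e.g. a tooltip spanning header and button
print([e.name for e in tracker.elements_to_repaint(page)])  # ['header', 'button']
```

The footer is untouched by the change, so it is neither repainted nor redrawn.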
Most browsers use the CPU to paint these elements into pixmaps. Typically, the next frame to be displayed is created by taking the previous frame and overwriting the changes, as shown in Figure 1. Where a GPU is available, it is used to combine the previous frame and the partial pixmap data into the next frame, freeing the CPU for other tasks.
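A minimal sketch of this partial-update scheme, assuming a frame is simply a list of rows of integer pixel values (a hypothetical stand-in for a real frame buffer) and that the freshly painted pixmap fits entirely on screen. The next_frame function is illustrative only; in a browser the copy and overwrite would be a GPU blit rather than Python loops.

```python
Frame = list[list[int]]  # hypothetical format: rows of integer pixel values


def next_frame(previous: Frame, pixmap: Frame, x: int, y: int) -> Frame:
    """Copy the previous frame, then overwrite only the area covered by a
    freshly painted partial pixmap whose top-left corner is at (x, y)."""
    height, width = len(previous), len(previous[0])
    # Simplifying assumption: the partial pixmap lies entirely on screen.
    if y < 0 or y + len(pixmap) > height or x < 0 or any(
            x + len(row) > width for row in pixmap):
        raise ValueError("partial pixmap must lie within the frame")
    frame = [row[:] for row in previous]  # unchanged pixels come from the previous frame
    for dy, src_row in enumerate(pixmap):
        frame[y + dy][x:x + len(src_row)] = src_row  # overwrite the changed pixels
    return frame


previous = [[0] * 4 for _ in range(3)]
patch = [[7, 7], [7, 7]]
print(next_frame(previous, patch, 1, 1))
# [[0, 0, 0, 0], [0, 7, 7, 0], [0, 7, 7, 0]]
```

Note that the previous frame must be kept and read on every update; accelerated compositing, described next, avoids that.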
To improve performance, many browsers use accelerated compositing, a technique that groups parts of the page that don’t interact into separate layers so that each can be manipulated independently. This technique is essentially a trade-off between memory and CPU use: each layer’s pixmap is kept in memory so that it can be moved or recomposited without being repainted. When anything changes, the GPU composites the pixmaps representing each layer into the frame buffer, but the CPU is still used to paint the contents of those pixmaps. As shown in Figure 2, the composition of layers always covers the entire screen, so there’s no need to access a copy of the previous frame.
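The sketch below illustrates why compositing never needs the previous frame. It assumes a toy pixel format in which None marks a transparent pixel, and the Layer and composite names are hypothetical; real compositors blend with alpha and run on the GPU.

```python
from dataclasses import dataclass
from typing import Optional

Pixel = Optional[int]  # hypothetical format: None marks a transparent pixel


@dataclass
class Layer:
    pixmap: list[list[Pixel]]  # painted by the CPU and cached between frames
    x: int                     # moving a layer only changes its offset
    y: int


def composite(layers: list[Layer], width: int, height: int,
              background: int = 0) -> list[list[int]]:
    # Start from a cleared buffer: composition produces every pixel of the
    # screen, so the previous frame is never read.
    frame = [[background] * width for _ in range(height)]
    for layer in layers:  # back-to-front order
        for dy, row in enumerate(layer.pixmap):
            fy = layer.y + dy
            if not 0 <= fy < height:
                continue
            for dx, px in enumerate(row):
                fx = layer.x + dx
                if px is not None and 0 <= fx < width:
                    frame[fy][fx] = px
    return frame


base = Layer([[1, 1, 1], [1, 1, 1]], 0, 0)
badge = Layer([[2, None]], 1, 1)
print(composite([base, badge], 3, 2))  # [[1, 1, 1], [1, 2, 1]]
badge.x = 0  # move the layer: recomposite, but neither pixmap is repainted
print(composite([base, badge], 3, 2))  # [[1, 1, 1], [2, 1, 1]]
```

Moving the badge costs only a recomposite, which is the CPU saving; holding both pixmaps in memory between frames is the price.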
Using the GPU
Although GPUs offer considerable flexibility, browsers have been relatively slow to embrace their full benefits. Because the parts of a browser are so tightly interrelated, redesigning a core component such as rendering to exploit vastly different hardware capabilities is complex. As a result, most browser designs still treat the GPU as an enhanced blitter and use it mainly for accelerated compositing.
Browser vendors have been looking to move beyond the simple “GPU as a blitter” scenario for some time. Using the GPU, as opposed to the CPU, to handle painting is known as GPU rasterisation. Some browsers offer an option to enable GPU rasterisation through extended graphics libraries that pass the painting tasks to the GPU. These libraries cater for a wide range of usage scenarios and hence offer flexible, generic APIs. Unfortunately, these APIs are far more complex than the use cases required by HTML, which leads to sub-optimal use of the GPU and reduced rendering performance.
GPU performance comes from efficiently executing a large volume of similar operations batched together, which keeps its pipelines optimally filled. The flexibility provided by these extended graphics libraries often prevents the GPU’s pipelines from being kept full, which significantly reduces its effectiveness.
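One common way to keep submissions large is to merge consecutive draw commands that need the same pipeline state. The sketch below assumes a hypothetical DrawCmd record whose state string stands for the shader, texture and blend settings. It merges only neighbouring commands, so painting order, and therefore the appearance of overlapping elements, is unchanged.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class DrawCmd:
    state: str                       # hypothetical label for shader/texture/blend state
    rect: tuple[int, int, int, int]  # x, y, width, height


def batch(cmds: list[DrawCmd]) -> list[tuple[str, list[tuple[int, int, int, int]]]]:
    """Merge runs of consecutive commands that share pipeline state.

    Only neighbours are merged, so painting order is preserved; each
    resulting batch would become one GPU submission.
    """
    return [(state, [c.rect for c in run])
            for state, run in groupby(cmds, key=lambda c: c.state)]


cmds = [
    DrawCmd("solid", (0, 0, 800, 100)),
    DrawCmd("solid", (0, 100, 200, 500)),
    DrawCmd("solid", (200, 100, 600, 500)),
    DrawCmd("text", (10, 10, 300, 20)),
    DrawCmd("text", (10, 40, 300, 20)),
    DrawCmd("solid", (700, 10, 80, 30)),
]
batches = batch(cmds)
print(len(cmds), "commands ->", len(batches), "submissions")  # 6 commands -> 3 submissions
```

A generic library that changes state between almost every command would produce close to one submission per command, which is the pipeline-starving pattern described above.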
To get the most out of the GPU, its pipelines must be kept full; the key is to focus solely on the requirements of HTML and match them to the GPU’s capabilities. The characteristics of HTML lend themselves well to GPU acceleration. Elements are essentially rectangular, naturally grid-aligned, and rarely use anti-aliasing. Since GPUs process triangles, each rectangular HTML element can be drawn as two triangles. The set of graphics primitives required for HTML is also relatively small, which means that an HTML-specific, GPU-accelerated graphics API is a realistic goal.
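Converting rectangles into triangles is straightforward. This sketch, using the hypothetical function rects_to_triangles, emits four corner vertices and six indices (two triangles) per rectangle into shared lists, so a whole page of rectangles could be uploaded as one vertex buffer and drawn with a single indexed call such as OpenGL ES glDrawElements. Texture coordinates and colours are omitted for brevity.

```python
Rect = tuple[int, int, int, int]  # x, y, width, height


def rects_to_triangles(rects: list[Rect]) -> tuple[list[tuple[int, int]], list[int]]:
    """Build one shared vertex list plus an index list holding two
    triangles per rectangle, ready to upload as a single batch."""
    vertices: list[tuple[int, int]] = []
    indices: list[int] = []
    for x, y, w, h in rects:
        base = len(vertices)
        # Corners: top-left, top-right, bottom-right, bottom-left.
        vertices += [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]
        # Two triangles sharing the top-left/bottom-right diagonal.
        indices += [base, base + 1, base + 2, base, base + 2, base + 3]
    return vertices, indices


verts, idx = rects_to_triangles([(0, 0, 10, 5), (20, 0, 4, 4)])
print(len(verts), len(idx))  # 8 12
print(idx[6:])               # [4, 5, 6, 4, 6, 7]
```

Because every element produces the same fixed pattern of vertices and indices, the work arrives at the GPU in uniform, easily batched form.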
The GPU rasterisation process runs completely asynchronously to the CPU. This means that the CPU can start processing subsequent tasks such as scripting and layout before the painting task is complete. GPU rasterisation also changes which optimizations pay off. On the CPU, performance is gained by updating only those areas of the screen that have changed, whereas on the GPU it’s optimal to follow the lead of the gaming industry and update the entire screen every frame. While it may seem counter-intuitive to repaint every pixel on each frame, in most cases the processing cost of redrawing the whole screen is lower than the cost of the CPU calculations that would be needed to manage partial updates.
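The sketch below models the asynchronous, full-redraw arrangement, with a Python thread standing in for the GPU and a queue standing in for its command buffer. It is an analogy under those assumptions, not how a GPU driver works, and the gpu and run functions are hypothetical. Each frame submits a display list covering the whole page, so no damage tracking is needed, and the submitting side carries on without waiting for painting to finish.

```python
import queue
import threading


def gpu(commands: queue.Queue, completed: list[int]) -> None:
    """Stand-in for the GPU: rasterises submitted frames on its own schedule."""
    while True:
        display_list = commands.get()
        if display_list is None:  # sentinel: no more frames
            break
        # A real renderer would draw every item here, i.e. the whole screen.
        completed.append(len(display_list))


def run(page: list[str], frames: int) -> list[int]:
    commands: queue.Queue = queue.Queue()
    completed: list[int] = []
    worker = threading.Thread(target=gpu, args=(commands, completed))
    worker.start()
    for _ in range(frames):
        # Full redraw: the display list always describes the entire page,
        # not just the parts that changed.
        commands.put(list(page))  # returns immediately; no waiting for painting
        # ...the CPU is now free to run scripting and layout for the next frame...
    commands.put(None)
    worker.join()  # only so this demo can report results; a browser keeps going
    return completed


print(run(["background", "header", "button", "text"], 3))  # [4, 4, 4]
```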
Ekioh has used the techniques described above in its new multithreaded HTML browser with successful results. In graphics-heavy applications, focused use of the GPU was found to be more than twice as fast as the more generic library-based approach. Furthermore, because GPU rasterisation doesn’t require the large quantities of cached pixmaps used for accelerated compositing, large memory savings (over 45 Mbytes) were observed in some cases. As screen resolutions increase, these memory savings will become even more significant.
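Why resolution matters is simple arithmetic. Assuming uncompressed 32-bit RGBA pixmaps (4 bytes per pixel), a single full-screen layer costs the amounts printed below. The pixmap_bytes helper is illustrative; these are per-layer figures rather than a reproduction of the measurement above, and actual savings depend on how many layers a page creates.

```python
def pixmap_bytes(width: int, height: int, bytes_per_pixel: int = 4) -> int:
    """Memory for one uncompressed pixmap, assuming 32-bit RGBA by default."""
    return width * height * bytes_per_pixel


resolutions = {"720p": (1280, 720), "1080p": (1920, 1080), "4K UHD": (3840, 2160)}
for name, (w, h) in resolutions.items():
    mib = pixmap_bytes(w, h) / (1024 * 1024)
    print(f"{name}: {mib:.1f} MiB per full-screen layer")
# 720p: 3.5 MiB per full-screen layer
# 1080p: 7.9 MiB per full-screen layer
# 4K UHD: 31.6 MiB per full-screen layer
```

A 4K layer needs four times the memory of a 1080p one, so every cached pixmap that GPU rasterisation avoids is worth correspondingly more.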
The focused use of the GPU is therefore key to driving down costs without sacrificing performance. Using this approach, next-generation embedded silicon will be capable of delivering browser-based 4K applications and UIs on mass-market consumer electronics products without any drop in performance.