

# Powering HEVC Video Experience with MediaTek Hardware Solutions

MediaTek White Paper

November 2014

This white paper summarizes MediaTek's best-in-class HEVC hardware solutions, our technical innovations, and the benefits users gain from these solutions.

© 2014 MediaTek Inc.



## 1 Introduction

The introduction of the next-generation video compression standard, H.265/HEVC [2], has opened up a multitude of opportunities for end-to-end video delivery on mobile and home entertainment devices. In a nutshell, the advantages of HEVC over its predecessor, H.264/AVC, are two-fold: equivalent quality at half the data rate, and the ability to deliver 4K/Ultra-HD resolution video (Figure 1). While providing good video quality and coding efficiency, HEVC increases the computational complexity of the encoder by 5.2x and of the decoder by 2.1x compared to H.264, which presents challenges for mobile devices that operate within a very limited power envelope.

In addition to more pixels (high-resolution 4K video), better pixels (higher color bit depth) provide a better perceptual experience. As Figure 2 illustrates, conventional video uses 8 bits per color channel (24 bits per pixel) for video delivery applications. Although this amounts to more than 16 million colors, it still represents only a fraction of the colors we perceive in the real world. Moreover, according to the ITU-R BT.1361 and BT.2020 recommendations [1], HEVC compression at 4K resolution requires 10-bit color, which further increases the data bandwidth by 25%.

Many technical papers describe HEVC software solutions. An HEVC hardware solution capable of 10-bit color and 4K resolution, however, remains a lesser-known feature in mobile and home entertainment devices.
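The color-count and bandwidth figures quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the 3840x2160 frame size, 60 fps rate and 4:2:0 sampling (1.5 samples per pixel) are assumptions chosen for the example rather than values stated in this section.

```python
def color_count(bits_per_channel, channels=3):
    """Number of distinct colors representable with the given bit depth."""
    return 2 ** (bits_per_channel * channels)


def raw_bitrate(width, height, fps, bits_per_sample, samples_per_pixel=1.5):
    """Uncompressed video bit rate in bits per second.

    samples_per_pixel=1.5 corresponds to 4:2:0 chroma sampling
    (an assumption made for this example).
    """
    return width * height * fps * bits_per_sample * samples_per_pixel


# Assumed 4K frame size and frame rate, for illustration only.
rate_8 = raw_bitrate(3840, 2160, 60, 8)
rate_10 = raw_bitrate(3840, 2160, 60, 10)
increase = rate_10 / rate_8 - 1.0

print(f"8-bit colors:  {color_count(8):,}")
print(f"10-bit colors: {color_count(10):,}")
print(f"Raw bandwidth increase, 8-bit to 10-bit: {increase:.0%}")
```

The 25% increase depends only on the ratio of bit depths (10/8), so it holds regardless of the resolution and frame rate assumed here.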



Figure 1. Full-HD vs. 4K Video Resolution





Figure 2. 8-bit vs. 10-bit Video Quality

MediaTek-inside products that take full advantage of our HEVC hardware processor deliver a high-resolution, high-color-depth, low-power video experience with convincing video quality. Our comprehensive HEVC solution, which supports more than 13 video standards, provides complete interoperability across different systems and countries when streaming video over an open platform. The features and wide acceptance of MediaTek's HEVC solution make it suitable for a wide variety of applications and markets, including mobile phones, tablets, TVs, Blu-ray players, and media boxes.

## 2 Solutions

MediaTek's video co-processor solutions [3–12] include the world's first HEVC hardware solution [3], which delivers breakthrough power and cost efficiency. This solution enables real-time 4K video playback while preserving long battery life.

MediaTek's HEVC solution delivers:

- Revolutionary video architecture
- Exceptional video performance
- Comprehensive video standards

### 2.1 Revolutionary Video Architecture

A generalized video pipeline has come a long way, but it reaches its maximal operating speed only under a specific manufacturing process. To further improve processing speed for high-resolution video, we propose a multi-core design that encodes or decodes at higher resolutions or frame rates by instantiating multiple video cores and reconfiguring the shared memory management unit (MMU) connections. Figure 3 depicts a video system block diagram with a 3-stage pipeline in a 1-core configuration. First, the video stream is fed into the variable length decoder (VLD) via a 128-bit system bus, and the headers are decoded for use in subsequent decoding. Then, the residual streams are decoded by the VLD/IS/IQ/IT stages (inverse scan, inverse quantization and inverse transform) and added to the predicted data from intra prediction (IP) or motion compensation (MC). Finally, the de-blocking filter (DF) and the Sample Adaptive Offset (SAO) filter receive the reconstruction of the residual and predicted data and filter it to further improve the picture quality.



Moreover, a weighted MMU is designed to facilitate data access between the video hardware accelerator and external memory within a limited bandwidth budget.
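As a rough software analogy of the reconstruction step in the pipeline described above, the sketch below adds a decoded residual block to a predicted block and clips the result to the valid sample range before it would be handed to the in-loop filters. The function name and the list-of-lists block representation are hypothetical; the hardware datapath itself is not described at this level of detail in this paper.

```python
def reconstruct_block(predicted, residual, bit_depth=8):
    """Add residual to prediction and clip each sample to [0, 2^bit_depth - 1].

    Both inputs are 2-D lists of integers with identical dimensions.
    This models only the adder between the IS/IQ/IT output and the
    IP/MC output; de-blocking and SAO filtering are not modeled.
    """
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(predicted, residual)
    ]


# Example: a 2x2 block with one sample that overflows and one that underflows.
pred = [[250, 10], [128, 128]]
resid = [[10, -20], [5, -5]]
recon = reconstruct_block(pred, resid)
print(recon)
```

Clipping is needed because the sum of prediction and residual can fall outside the range representable at the chosen bit depth.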

MediaTek's HEVC hardware is the world's first multi-core video architecture designed to improve processing speed and lower external bandwidth. This revolutionary architecture includes two techniques: Adaptive Coding Unit Balance (ACUB) and Data-Sharing Wave-front Dual-core (DSWD).

The ACUB architecture optimizes the pipeline buffer cost and the processing cycles of the video decoding system. The processing cycles are optimized by analyzing and balancing different numbers of N×N pipeline units, as shown in the gray area of Figure 3. ACUB thereby reduces both pipeline buffering and processing cycles, which a conventional architecture does not do.

The DSWD technique is used to lower the working frequency efficiently. Instead of using a single-core video decoder, DSWD decodes alternate rows on two decoding cores, as shown in Figure 4. A data-sharing memory between the two cores holds the information from the upper row that is needed to decode the next row. Each core has its own interface to the weighted MMU and DRAM controller, and both cores share the same memory. DSWD thus exploits parallelism at the level of rows of pixel blocks (Figure 4). In Figure 5(b), the proposed DSWD incorporates two cores, each able to run at a lower frequency, for an overall 65% reduction in required frequency. That is, a working frequency of 350 MHz, rather than 1 GHz, is sufficient for 4K×2K 60fps HEVC decoding, which relaxes the speed requirements and lowers power consumption during video playback.
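The row-alternating behavior described above can be illustrated with a small scheduling model. This is a minimal sketch under stated assumptions, not the DSWD hardware: every block takes one time slot, even rows run on core 0 and odd rows on core 1, and a block may start only once the upper row is `lag` blocks ahead. The default `lag=2` is borrowed from the usual wavefront rule (the upper-right neighbor must be finished) and is an assumption, as are all function names.

```python
def dswd_finish_times(rows, cols, lag=2):
    """Finish time (in block slots) of every block with two alternating cores."""
    finish = [[0] * cols for _ in range(rows)]
    core_free = [0, 0]  # time at which each core finishes its previous row
    for r in range(rows):
        core = r % 2
        t = core_free[core]
        for c in range(cols):
            if r > 0:
                # Wait until the upper row has completed block c + lag - 1
                # (clamped at the last block of the row).
                dep_col = min(c + lag - 1, cols - 1)
                t = max(t, finish[r - 1][dep_col])
            t += 1
            finish[r][c] = t
        core_free[core] = t
    return finish


def makespan(rows, cols, lag=2):
    """Total slots needed to decode the whole picture on two cores."""
    return dswd_finish_times(rows, cols, lag)[-1][-1]


def frequency_reduction(single_core_mhz, dual_core_mhz):
    """Fractional reduction in required clock frequency."""
    return 1.0 - dual_core_mhz / single_core_mhz


single = 4 * 8            # one core: one slot per block, 32 slots
dual = makespan(4, 8)     # two cores with a two-block wavefront lag
print(f"single-core slots: {single}, dual-core slots: {dual}")
print(f"frequency reduction: {frequency_reduction(1000, 350):.0%}")
```

In this toy model the second core trails the first by a fixed number of blocks, so the speed-up approaches 2x as rows get wider. The 65% figure is computed directly from the 1 GHz and 350 MHz values quoted in the text; the model does not attempt to derive it.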

The system block diagram below shows the CU (Coding Unit), PU (Prediction Unit), TU (Transform Unit) and CTU (Coding Tree Unit) defined by the H.265/HEVC standard [2].




Figure 3. System Block Diagram





Figure 4. Dual-Core Parallel Operation



Figure 5. Dual-Core Block Diagram and Architecture

### 2.2 Exceptional Video Performance


First and foremost, MediaTek's HEVC hardware is designed for video experience leadership. Based on our HEVC architecture, it achieves up to 4Kx2K 60fps 10-bit video playback on mobile devices while keeping power consumption low.

Video with 10 bits per sample allows for wider color spaces and is required for the BT.2020 [1] color space that will be used by Ultra-HD applications. The HEVC/H.265 Main 10 profile defines the video compression process for the 10-bit-per-sample case. However, storing 10-bit pixels in an 8-bit, byte-aligned DRAM space is challenging. Figure 6(a) shows a straightforward 10-bit footprint in the 8-bit DRAM space, where the white squares indicate unused bit space in the memory. To reduce the space required by the 10-bit footprint, a smart pixel storage (SPS) scheme is proposed that stores each 10-bit sample in a compact manner, as shown in Figure 6(b). Four 10-bit pixels are stored in 5 address words. Compared to the straightforward design, SPS reduces the number of word addresses from 8 to 5. The SPS method uses a simple address calculation and smart packing, and can easily be extended to 12 or 14 bits per sample. For an HEVC-compressed stream with 4Kx2K resolution, 4:2:0 chroma sampling and 10-bit YUV samples, the 10-bit SPS method reduces the DRAM space from 300MB to 187MB, a 37.5% saving in DRAM space.
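The packing idea can be sketched in software as follows. The exact bit layout used by SPS is not specified in this paper, so the little-endian layout below (sample 0 in the lowest 10 bits of a 40-bit group) is purely an assumption for illustration, as are the function names. What the sketch does show is the ratio stated above: four 10-bit samples occupy 5 bytes instead of 8.

```python
def pack4(samples):
    """Pack four 10-bit samples into 5 bytes (assumed little-endian layout)."""
    if len(samples) != 4 or any(not 0 <= s < 1024 for s in samples):
        raise ValueError("expected four samples in the range 0..1023")
    group = 0
    for i, s in enumerate(samples):
        group |= s << (10 * i)
    return group.to_bytes(5, "little")


def unpack4(data):
    """Recover the four 10-bit samples from a 5-byte group."""
    group = int.from_bytes(data, "little")
    return [(group >> (10 * i)) & 0x3FF for i in range(4)]


def byte_address(sample_index):
    """Byte offset of the 5-byte group holding a given sample."""
    return 5 * (sample_index // 4)


def space_saving(packed_bytes=5, naive_bytes=8):
    """Fractional saving versus storing each 10-bit sample in 2 bytes."""
    return 1.0 - packed_bytes / naive_bytes


packed = pack4([0, 1023, 512, 3])
print(len(packed), unpack4(packed))
print(f"saving: {space_saving():.1%}")
print(f"300 MB footprint becomes {300 * 5 / 8} MB")
```

The address calculation reduces to one integer division and one multiplication, which is consistent with the "simple address calculation" the text describes; a 12-bit variant would pack two samples into 3 bytes in the same way.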



### 2.3 Comprehensive Video Standards

In the TV industry, different countries adopt different international video standards as well as regional ones, such as AVS in the Chinese market. There is therefore increasing demand for a wide spectrum of video content, which requires support for a comprehensive set of video standards. To improve area efficiency, we present a multi-standard architecture covering MPEG-2/4, VC-1, WMV-7/8/9, RM 8/9/10, AVS, VP6/8, H.264 and HEVC, which exploits resource sharing across the different video standards. We re-optimize the de-blocking filter and motion compensation and thereby achieve a remarkable reduction in hardware area.

An in-loop de-blocking filter is standardized in both H.264/AVC and H.265/HEVC decoding to smooth the edges caused by blocking artifacts. We combine the filtering processes for H.264, HEVC and other formats, and modify the post-loop filter. Moreover, we develop a configurable interpolator that receives weighting parameters for the different standard specifications without changing the hardware prototype.

The interpolator performs the major task of integrating motion compensation across the different standards. MPEG and VC-1 adopt bilinear and bi-cubic interpolation, while H.264 and HEVC use 6-tap and 8-tap filters, respectively, for half-pel interpolation. The interpolation processes are composed of arithmetic operations and shift registers, which can be reused for the different standards. For simplicity, Figure 7 breaks down the area by four categories of video standard (MPEG, WMV/VC-1, H.264 and HEVC). Exploiting the multi-format de-blocking filter and motion compensation yields a 28% area reduction.
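The idea of one interpolator datapath driven by per-standard coefficients can be sketched as a table-driven FIR filter. The coefficient sets below are the well-known luma half-pel taps of H.264 (6-tap) and HEVC (8-tap) plus a plain bilinear average. Everything else is an assumption for illustration: the function and table names, the one-dimensional treatment, the edge clamping, and the single rounding step. The sketch does not model the shared hardware itself.

```python
# (taps, right-shift) per filter family; the taps sum to 2**shift.
HALF_PEL_TAPS = {
    "bilinear": ([1, 1], 1),
    "h264": ([1, -5, 20, 20, -5, 1], 5),
    "hevc": ([-1, 4, -11, 40, 40, -11, 4, -1], 6),
}


def half_pel(row, pos, standard, bit_depth=8):
    """Interpolate the half-sample position between row[pos] and row[pos + 1]."""
    taps, shift = HALF_PEL_TAPS[standard]
    start = pos - (len(taps) // 2 - 1)  # leftmost integer sample used
    acc = 0
    for k, coeff in enumerate(taps):
        idx = min(max(start + k, 0), len(row) - 1)  # clamp at row edges
        acc += coeff * row[idx]
    value = (acc + (1 << (shift - 1))) >> shift  # round, then normalize
    return min(max(value, 0), (1 << bit_depth) - 1)


edge = [0, 0, 0, 0, 100, 100, 100, 100]
for name in HALF_PEL_TAPS:
    print(name, half_pel(edge, 3, name))
```

Only the coefficient table and the shift amount change between standards; the multiply-accumulate loop, rounding and clipping are shared, which is the kind of reuse the text attributes to the configurable interpolator.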



Figure 7. Area Reduction for Multi-Standard Video Decoder

## 3 Benefits

To summarize, here are four key benefits that MediaTek's HEVC hardware solutions provide:

- **Improves Video Experience Significantly:** MediaTek's HEVC-enabled products allow 4K video playback, offering a huge leap forward in quality, and deliver incredible detail when paired with 10-bit display panels.
- **Ease of Integration Enables Deeper Penetration into Different Markets:** Because users can enjoy video while extending battery life, MediaTek's HEVC-enabled devices can be used for a greater number of video applications (e.g. video telephony, recording, streaming, playback, broadcasting) and markets (e.g. DVD/Blu-ray players, digital TVs, smartphones, media boxes, automotive entertainment).
- **Opens Up New Markets and Applications:** With MediaTek's HEVC-enabled devices, users get the best of real-world visualization. These devices are also easy to integrate into any panel-equipped component, for example to realize wearable devices in Internet-of-Things (IoT) applications.
- **Makes 4K/Ultra-HD Services More Feasible for Both Recording and Playback:** MediaTek's HEVC hardware solution is easily integrated into different devices for end-to-end delivery. This benefits video contribution and distribution in social networking and video conferencing applications. MediaTek will continue to shape the future of 4K/Ultra-HD services for years to come.

## 4 Conclusion

MediaTek's experience in video hardware implementations across multiple standards, its scalable architecture, and its best-in-class hardware processor demonstrate our unique position among hardware vendors offering a market-ready HEVC processor.

## 5 References

[1] "Parameter values for ultra-high definition television systems for production and international programme exchange," *ITU-R BT.2020*, August 2012.

[2] B. Bross, W. J. Han, J. R. Ohm, G. J. Sullivan, and T. Wiegand, "High Efficiency Video Coding (HEVC) text specification draft 10 (for FDIS & Consent)," Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, Jan. 2013.

[3] Chi-Cheng Ju, Tsu-Ming Liu, *et al.*, "A 0.2nJ/pixel 4K 60fps Main-10 HEVC decoder with multiformat capabilities for UHD-TV applications," *IEEE European Solid-State Circuits Conference (ESSCIRC'14)*, pp. 195-198, Sept. 2014.

[4] Chi-Cheng Ju, Tsu-Ming Liu, *et al.*, "A 4Kx2K@60fps Multi-Standard TV SoC Processor with Integrated HDMI/MHL Receiver," in *Symp. VLSI Circuits Dig. Tech. Papers (VLSI'14)*, pp. 1-2, June 2014.

[5] Chi-Cheng Ju, Tsu-Ming Liu, *et al.*, "A 1.94mm<sup>2</sup>, 38.17mW dual VP8/H.264 Full-HD encoder/decoder LSI for Social Network Services (SNS) over smart-phones," *IEEE Asian Solid-State Circuits Conference (ASSCC'12)*, pp. 13-16, Nov. 2012.

[6] Chi-Cheng Ju, Tsu-Ming Liu, *et al.*, "Area and Memory Efficient Architectures for 3D Blu-ray-compliant Multimedia Processors," *IEEE International Conference on Multimedia and Expo (ICME'12)*, pp. 776-781, July 2012.

[7] Chi-Cheng Ju, Tsu-Ming Liu, *et al.*, "A 775uW/fps/View H.264/MVC Decoder Chip Compliant with 3D Blu-Ray Specifications," *IEEE International Symposium on Circuits and Systems (ISCAS'12)*, pp. 1440-1443, May 2012.

[8] Chi-Cheng Ju, *et al.*, "A 363-μW/fps Power-Aware Green Multimedia Processor for Mobile Applications," *IEEE International VLSI Symposium on Design, Automation and Test (VLSI-DAT'12)*, pp. 1-4, Apr. 2012.

[9] Chi-Cheng Ju, *et al.*, "A full-HD 60fps AVS/H.264/VC-1/MPEG-2 video decoder for digital home applications," *IEEE International VLSI Symposium on Design, Automation and Test (VLSI-DAT'11)*, pp. 1-4, Apr. 2011.

[10] Chi-Cheng Ju, *et al.*, "A 658KGates e-Streaming Video Decoder for Digital Home Applications," *IEEE Asian Solid-State Circuits Conference (ASSCC'09)*, pp. 33-36, Nov. 2009.

[11] Chi-Cheng Ju, Tsu-Ming Liu, *et al.*, "A Multi-Format Blu-ray Player SoC in 90nm CMOS," *ISSCC Dig. of Tech. Papers (ISSCC'09)*, pp. 152-153, Feb. 2009.

[12] Chi-Cheng Ju, Tsu-Ming Liu, *et al.*, "A 125Mpixels/sec Full-HD MPEG-2/H.264/VC-1 Video Decoder for Blu-ray Applications," *IEEE Asian Solid-State Circuits Conference (ASSCC'08)*, pp. 9-12, Nov. 2008.