This document has been reviewed as part of the transport area review team's ongoing effort to review key IETF documents. These comments were written primarily for the transport area directors, but are copied to the document's authors and WG to allow them to address any issues raised and also to the IETF discussion list for information.

When done at the time of IETF Last Call, the authors should consider this review as part of the last-call comments they receive. Please always CC tsv-art@ietf.org if you reply to or forward this review.

Summary
-------

Overall, this document seems more focused on the requirements for development of codecs such as H.264 than on the requirements that would enable wide-scale adoption of a next generation codec. In practice, requirements reducing the fragmentation of implementations (such as a requirement that a compliant decoder be able to decode anything that an encoder can send) have proved critical to success, yet this document omits them.

Also, the document appears focused on video technology as of 4-5 years ago, rather than the technology used in today's streaming and video conferencing services, where support for scalable video coding (and advanced modes such as K-SVC) has become critically important.

Issues
------

Section 2.1

   Video material is encoded at different quality levels and different resolutions, which are then chosen by a client depending on its capabilities and current network bandwidth....

   o Scalability or other forms of supporting multiple quality representations are beneficial if they do not incur significant bitrate overhead and if mandated in the first version.

[BA] The words "are beneficial" suggest that support for scalability is optional.
In practice, support for both temporal and spatial scalability has proved to be important, since it has been widely adopted in dynamic streaming applications. In these applications the video material is encoded once and played back at frame rates, resolutions and quality levels that depend on network conditions and on the characteristics of the endpoint devices.

Section 2.5

[BA] This section does not mention support for screen content coding tools. These tools are very effective in reducing the bandwidth required for application sharing (bitrate reductions of 75 percent are common), so it is hard to imagine a next generation codec that would not support screen content coding.

Section 2.6

[BA] Support for K-SVC modes has turned out to be important for game streaming, because these modes reduce the delay spikes that would otherwise result from generating a key frame. K-SVC modes have unusual characteristics (e.g. frames within a single temporal unit may not share the same temporal ID), so they impose unique requirements on a video codec design.

3.2.3. Complexity:

   o Feasible real-time implementation of both an encoder and a decoder supporting a chosen subset of tools for hardware and software implementation on a wide range of state-of-the-art platforms.

[BA] This sentence seems to imply that the tools supported in hardware and software might be different. In practice, this is problematic, particularly if support for some tools can be omitted at lower profile levels, because application developers then need to handle the disparities between tool support in different implementations.

3.2.4. Scalability:

   o Temporal (frame-rate) scalability should be supported.

[BA] In practice, a next generation video codec needs to support spatial scalability as well as temporal scalability.

3.2.5. Error resilience:

   o Error resilience tools that are complementary to the error protection mechanisms implemented on transport level should be supported.
   o The codec should support mechanisms that facilitate packetization of a bitstream for common network protocols.

[BA] Both of these points require more elaboration. What error resilience tools are being referred to, and what mechanisms are perceived to facilitate packetization? Is the latter referring to video codec syntax (e.g. NAL unit structure)?

   o The codec should support effective mechanisms for allowing decoding and reconstruction of significant parts of pictures in the event that parts of the picture data are lost in transmission.

[BA] It is not clear what this is referring to either.

3.3.2. Scalability:

   o Resolution and quality (SNR) scalability that provide low compression efficiency penalty (up to 5% of BD-rate [12] increase per layer with reasonable increase of both computational and hardware complexity) can be supported in the main profile of the codec being developed by the NETVC WG. Otherwise, a separate profile is needed to support these types of scalability.

[BA] Mixing support for scalability with profile negotiation leads to implementation balkanization, which dramatically increases the complexity of application development. A better principle is that a compliant decoder should be able to decode any bitstream that an encoder can send.
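The sketch below illustrates the temporal scalability comments under Sections 2.1 and 3.2.4. It shows how a single encoding can serve several frame rates, assuming a dyadic three-layer temporal structure (the common L1T3 pattern, with temporal IDs repeating 0, 2, 1, 2). The function names are hypothetical and do not correspond to any codec API. A receiver or middlebox keeps only the frames whose temporal ID is at or below a chosen layer and forwards them without re-encoding. This works only if each frame references frames of equal or lower temporal ID, which is assumed here and not modeled by the code.

```python
# Minimal sketch (hypothetical names): dyadic three-layer temporal
# scalability, assuming the L1T3 temporal ID pattern 0, 2, 1, 2.
# Assumes each frame only references frames with an equal or lower
# temporal ID, so higher layers can be discarded without re-encoding.
PATTERN = [0, 2, 1, 2]


def temporal_ids(num_frames):
    """Temporal ID assigned to each frame, in decode order."""
    return [PATTERN[i % len(PATTERN)] for i in range(num_frames)]


def drop_layers(frames, max_tid):
    """Keep only the (index, tid) pairs whose tid is <= max_tid."""
    return [(i, tid) for i, tid in frames if tid <= max_tid]


def frame_rate(full_fps, max_tid):
    """Frame rate remaining after discarding layers above max_tid."""
    kept = sum(1 for tid in PATTERN if tid <= max_tid)
    return full_fps * kept / len(PATTERN)


if __name__ == "__main__":
    frames = list(enumerate(temporal_ids(8)))
    for max_tid in (2, 1, 0):
        kept = [i for i, _ in drop_layers(frames, max_tid)]
        print(f"T0..T{max_tid}: {frame_rate(30, max_tid)} fps, frames {kept}")
```

With a 30 fps source, keeping layers T0 and T1 halves the frame rate, and keeping only T0 quarters it. Spatial scalability, which the review argues is equally necessary, adds a second (spatial) dimension that this sketch does not model.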