Minutes of the Audio/Video Transport Working Group
Reported by Colin Perkins

The Audio/Video Transport working group met twice at the 42nd IETF in Chicago. The major items of discussion were the revision of RTP and the Audio/Video profile for advancement to draft standard, multiplexing of RTP streams and transport of MPEG4 streams using RTP.

The meeting opened with a review of the status of the working group documents. The RTP header compression scheme and the payload format documents for JPEG, BT656 and H.263+ video are now either awaiting IESG approval or are in last call. The payload format for bundled MPEG streams has been published as RFC2343 (experimental). The Options for Repair of Streaming Media document has been published as RFC2354 (informational).

The draft giving guidelines for writers of RTP payload format documents (draft-ietf-avt-rtp-format-guidelines-00.txt) is still awaiting revision. Mark Handley (ISI) committed to update this document in time for the next meeting, for publication as a BCP or Informational RFC. Input is solicited from people who know about the quirks of particular codecs, to effectively capture the knowledge of the group as an aid to the designers of future payload formats.

The RTP payload format for PureVoice audio (draft-mckay-qcelp-01.txt) was discussed briefly with a view to conducting last call for proposed standard soon. Some concerns were raised regarding the means for signalling encryption of the media data only (not the RTP headers) by means of a bit in the payload sub-header, since out-of-band signalling, for example using an SDP attribute, may be more appropriate. Clarification of this issue is needed before the draft can go forward, but the document is otherwise complete.

The generic RTP payload format was not discussed this time. The authors of the proposals presented at the last meeting were to produce a combined proposal, but this is not yet complete.
The authors committed to completing this merger by the next meeting, and it is expected that further discussion of the subject will occur then.

The RTP MIB (draft-ietf-avt-rtp-mib-02.txt) was presented by Mark Baugher (Intel). The changes since the previous draft are the elimination of separate tables for hosts and monitors, with an rtpSessionMonitor variable being added to distinguish the two modes. The rtpFracLost variable was removed since it was found not to be useful. The definition of a session was clarified.

The RTP MIB will be referenced by the H-series MIBs which are being defined, but the RTP MIB itself will remain an AVT work item (since RTP experience is in this group). The H-series MIB developers will review the RTP MIB to ensure that it meets their needs.

The main area of concern with the current version of the RTP MIB is the complexity of the compliance definitions resulting from the merger of the tables. It is expected that the MIB will be ready for last call after a further round of revisions, probably by the time of the next meeting. A reference implementation of the MIB is now available.

The major topic of discussion on the first day was multiplexing multiple streams into a single RTP stream. This discussion started with an introduction by Stephen Casner (Cisco), who raised a number of issues for consideration in the following discussion:

- Why are we multiplexing: because common handling is needed or to reduce overhead?
- When should we multiplex: what should be separate and what should be bundled?
- Where should we multiplex? At which protocol level? Can we keep all multiplexing at one protocol level?
- How should we multiplex: an application specific solution or a general purpose one?

A pointer was also given to Jonathan Rosenberg's internet-draft on this subject from December 1996, which discusses these issues.
Copies of this may be obtained from http://www.cs.columbia.edu/~jdrosen/aggregate/rtpdoc.htm and a revised and updated version of this will be re-posted.

Following the introduction, a number of proposals were presented: Jonathan Rosenberg (Lucent) and Barani Subbiah (Nokia) presented somewhat similar proposals applicable for the efficient connection of PSTN gateways using RTP. Tohru Hoshi (Hitachi) and Mark Handley (ISI) presented more general proposals.

The first presentation was of draft-ietf-avt-aggregation-00.txt by Jonathan Rosenberg. This proposal deals with the interconnection of telephony gateways only - it is not a general purpose multiplexing protocol. Some of the main features of this proposal are:

- Users are identified by a single 7 bit identifier. The mapping from SSRC to this identifier is done by non-RTP means (eg: SIP/H.323 signalling). If more than 127 streams are being transported between two gateways multiple multiplexed RTP sessions must be used.
- All multiplexed streams must share a common clock and generate packets at integer multiples of a common frame duration. They do not need to use a common codec.
- Payload type information is transported for each multiplexed user. In many cases the payload type specifies the length of the packet so there is no need for a separate length indication field (although one can optionally be provided if necessary).
- User payloads are not word aligned. Aligning them would reduce the bandwidth efficiency significantly.

It was noted that statistical multiplexing of multiple streams using silence suppression can cause problems: there is a limit to how many streams can be packed into a single multiplexed packet before exceeding the network MTU. If the limit is exceeded, multiple packets must be generated: this will cause the receiver to see non-contiguous sequence numbers per user giving the appearance of loss.
The solution is to limit the statistical multiplexing of streams so that the MTU is not exceeded even when no streams are silent, but this reduces the efficiency somewhat.

It was also noted that the loss of a single packet from the multiplexed stream will affect all users multiplexed into the stream. It is not expected that this will be a problem (in fact the use of multiplexing may reduce loss rates, since it reduces both the data and packet rates compared to non-multiplexed streams).

Barani Subbiah (Nokia) presented draft-ietf-avt-mux-rtp-00.txt. This proposal is designed to solve a similar problem to that of Rosenberg and is unsurprisingly somewhat similar to that proposal. The main difference seems to be that this protocol uses an explicit length indication for each multiplexed packet (6 bits rather than 16 bits in Rosenberg's proposal) and that the payload type for each user is signalled out-of-band rather than carried in the payload (disallowing changing payload types on the fly).

A more generic proposal (draft-tanigawa-rtp-multiplex-00.txt) was presented by Tohru Hoshi (Hitachi). In this proposal RTP streams are multiplexed, rather than voice streams, by concatenating multiple RTP packets into one. This allows for the multiplexing of any sort of data, rather than just voice data, at the expense of additional overheads. Once again, out-of-band signalling is required to indicate that this is a multiplexed stream. It was noted that it may be possible to generalise this as a generic UDP multiplexing protocol, rather than an RTP multiplexer.

These proposals discuss out-of-band signalling which is required for correct operation of these protocols. It is noted that whilst signalling is required in these cases, and should be simple to implement using either SIP or H.323, it is outside the scope of a payload format document (the payload format should be independent of the signalling protocol).
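The MTU limit on statistical multiplexing that was raised during the discussion of the Rosenberg proposal can be made concrete with a little arithmetic. The following is only an illustrative sketch, not taken from any of the drafts: the function names, the 2 byte per-user header and the 20 byte voice frame are assumptions chosen for the example. The IPv4 (no options), UDP and fixed RTP header sizes are the standard ones.

```python
# Illustrative sketch only: names and example sizes are assumptions,
# not values taken from the multiplexing drafts.

IPV4_HEADER = 20  # bytes, IPv4 header without options
UDP_HEADER = 8    # bytes
RTP_HEADER = 12   # bytes, fixed RTP header without CSRC list or extension


def max_streams_per_packet(mtu: int, per_user_header: int, payload: int) -> int:
    """Number of user frames that fit in one packet when no stream is silent."""
    room = mtu - (IPV4_HEADER + UDP_HEADER + RTP_HEADER)
    if room <= 0:
        return 0
    return room // (per_user_header + payload)


def packets_needed(active_streams: int, limit: int) -> int:
    """Packets required to carry one frame from each active stream."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return -(-active_streams // limit)  # ceiling division


# Hypothetical example: 1500 byte MTU, 2 byte per-user header, 20 byte frames.
limit = max_streams_per_packet(1500, 2, 20)
```

With these assumed sizes, a gateway that admits no more than `limit` streams per multiplexed session never has to split a frame interval across two packets, which is the conservative solution described above. Admitting more streams and relying on silence suppression saves bandwidth on average, but `packets_needed` then exceeds one whenever too many users talk at once.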
It was noted that the presence of multiple multiplexing solutions is not necessarily desirable, since this hinders interoperability. It would be desirable to combine these proposals into one if at all possible. However, it was further noted that the drafts from Rosenberg and Subbiah are essentially solving the same problem, whilst the proposal presented by Hoshi is doing something different. The tradeoffs are different and we may need two protocols: the issues are sufficiently different for the two scenarios.

None of the three proposals presented so far have solved the generic multiplexing problem: the first two are clearly very application specific, the third requires out-of-band signalling to operate.

The second session started with a brief presentation by Mark Handley (ISI) describing an idea which resulted from the earlier discussion of multiplexing. This proposal, MuRGE, uses the techniques of RTP header compression within a single packet as a generic multiplexing method (all state is reinitialised within each packet). That is, each packet contains a standard RTP header followed by a number of payloads each with their own payload header. The payload headers are coded as differences from the previous header.

Clearly the bandwidth efficiency of this proposal depends on the similarity between the headers of the multiplexed payloads. If used between cooperating gateways where SSRC values can be allocated consecutively and the codecs, timestamps and sequence numbers are synchronised, this proposal can produce a single byte header for each multiplexed packet. If there is no cooperation between multiplexing points the full RTP header has to be sent for each multiplexed stream. If signalling is employed between multiplexing points (eg: for SSRC mapping) then some gain can be made even in the most generic case.

This proposal is at a very early stage of development, but introduces some interesting ideas. Further work is clearly needed.
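The idea of coding each payload header as a difference from the previous one can be sketched as follows. This is a hypothetical encoding written for illustration only; it is not the MuRGE wire format, which was not specified at the meeting. The flag byte layout, the SubHeader type and the prediction rule (next SSRC is one higher, all other fields unchanged) are assumptions, and `base` stands in for the fields of the standard RTP header at the start of the packet.

```python
import struct
from typing import List, NamedTuple


class SubHeader(NamedTuple):
    ssrc: int  # 32-bit source identifier
    pt: int    # payload type
    seq: int   # 16-bit sequence number
    ts: int    # 32-bit timestamp


# Hypothetical flag bits: a set bit means the field is carried explicitly.
F_SSRC, F_PT, F_SEQ, F_TS = 0x01, 0x02, 0x04, 0x08


def predict(prev: SubHeader) -> SubHeader:
    """Assumed prediction: consecutive SSRC, everything else unchanged."""
    return SubHeader((prev.ssrc + 1) & 0xFFFFFFFF, prev.pt, prev.seq, prev.ts)


def encode(base: SubHeader, headers: List[SubHeader]) -> bytes:
    """Encode each header as a flag byte plus only the mispredicted fields."""
    out = bytearray()
    prev = base
    for h in headers:
        guess = predict(prev)
        flags, body = 0, b""
        if h.ssrc != guess.ssrc:
            flags |= F_SSRC
            body += struct.pack("!I", h.ssrc)
        if h.pt != guess.pt:
            flags |= F_PT
            body += struct.pack("!B", h.pt)
        if h.seq != guess.seq:
            flags |= F_SEQ
            body += struct.pack("!H", h.seq)
        if h.ts != guess.ts:
            flags |= F_TS
            body += struct.pack("!I", h.ts)
        out.append(flags)
        out += body
        prev = h
    return bytes(out)


def decode(base: SubHeader, data: bytes, count: int) -> List[SubHeader]:
    """Rebuild `count` headers; state starts afresh from `base` in each packet."""
    headers = []
    prev, pos = base, 0
    for _ in range(count):
        flags = data[pos]
        pos += 1
        ssrc, pt, seq, ts = predict(prev)
        if flags & F_SSRC:
            (ssrc,) = struct.unpack_from("!I", data, pos)
            pos += 4
        if flags & F_PT:
            pt = data[pos]
            pos += 1
        if flags & F_SEQ:
            (seq,) = struct.unpack_from("!H", data, pos)
            pos += 2
        if flags & F_TS:
            (ts,) = struct.unpack_from("!I", data, pos)
            pos += 4
        prev = SubHeader(ssrc, pt, seq, ts)
        headers.append(prev)
    return headers
```

Under these assumptions, headers from cooperating gateways (consecutive SSRCs, identical payload type, sequence number and timestamp) cost one byte each, while unrelated headers cost the flag byte plus every field, which mirrors the two extremes described in the presentation.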
The discussion of multiplexing concluded with agreement that the development of a multiplexing proposal is of interest and should become a work item of the group.

The next item for discussion was transport of MPEG4 streams using RTP (draft-ietf-avt-rtp-mpeg4-00.txt) and the role of DMIF signalling (draft-ietf-avt-mpeg4-dmif-01.txt), which was presented by Vahe Balabanian (Nortel). After an outline of the proposals, discussion focused on a number of open issues:

- Should MPEG4 elementary streams be transported directly over RTP or should they be encapsulated using FlexMux first? There is some concern that the use of FlexMux does not cleanly fit into the RTP model; in particular, the interaction with RTP mixers is unclear.
- The mapping of MPEG4 scene and object descriptor streams to RTP is unclear. It may be that these need special transport and protocols other than RTP may be better suited to their needs: in particular the initial session description should not be carried in RTP. The transport of dynamic updates to this is an area which needs further study: an RTP stream may be appropriate or alternatively a separate signalling stream (eg: RTSP using ANNOUNCE) may work better.
- The mapping of MPEG4 decoder timestamps to RTP is unclear, since RTP includes only a send timestamp and applications are expected to derive their own decode time based on the observed network timing jitter.

The next IETF meeting coincides with the MPEG4 decision meeting, so it is urgent that these issues are resolved. The last chance to make changes to MPEG4 version 1 will be at the MPEG meeting on 12-16 October. Since it is clear that only limited progress can be made in the short time period available, it was decided to continue with the development of the payload format for MPEG4 elementary streams and to resolve the issues discussed. Once this is done and further experience has been gained with actual implementations the group will revisit these issues.
The major discussion item for the second day was the advancement of RTP and the Audio/Video profile from proposed- to draft-standard status. A summary of the changes in draft-ietf-avt-rtp-new-01.txt was made by Stephen Casner (Cisco). These include:

- Added fudge factor in timer reconsideration
- Added fix for underestimate when using SSRC sampling if the group size decreases rapidly
- RTCP sender and receiver bandwidth may be specified as a parameter (rather than the default 5%)
- RTCP minimum interval may scale smaller for high bandwidth sessions and zero initial delay for unicast sessions
- Specified padding for RTCP only on the last packet
- Specified relative NTP uses the "best" platform clock
- Formal reference to IPsec for security (this concerns some people since RTP may be used in scenarios where the presence of IPsec cannot be guaranteed...)
- Partial conversion to SHOULD, MUST, MAY, etc

In addition it has been decided not to make a number of the changes which have previously been suggested:

- Ignore group size dropping to zero with reverse reconsideration.
- No scaling of the RTCP interval larger since this could cause time-outs
- No changes to the jitter algorithm for multi-packet video frames
- No additional SDES items were defined (these can be registered with the IANA separately)
- No change to the definition of the RTCP RR loss fraction
- Nothing was added about translators adding random timestamp offsets

The issue of conditional vs unconditional reconsideration was discussed and it was noted that there is little to choose between these two algorithms in practice, and that unconditional reconsideration is simplest to implement. The next revision will therefore only include unconditional reconsideration.

A number of problems which have been discovered in the SSRC sampling algorithm were presented by Jonathan Rosenberg (Lucent). The problems with reconsideration and over-weighting of senders have been corrected in the current RTP draft.
A problem remains when the group size decreases rapidly, which results in members using SSRC sampling producing very inaccurate estimates of the group size. A solution using a "binning" algorithm is proposed in draft-ietf-avt-rtpsample-00.txt but this algorithm may be patented by Lucent (who are willing to license on "fair, reasonable and non-discriminatory terms"). Much discussion occurred on this topic since it is considered undesirable to have patented technology as part of the main specification.

The cleanest solution appears to be to move the SSRC sampling algorithms out of the main specification into a separate document. The main RTP specification would note that SSRC sampling may be desirable in certain cases and point to this new document for implementation advice and sample algorithms. This separation will occur with the next version of the RTP specification.

The changes to the Audio/Video profile (draft-ietf-avt-profile-new-03.txt) are less extensive: the PureVoice codec has been added and assigned payload type 12. As discussed at the previous meeting this is the last static assignment which will be made - this policy is now stated explicitly in the draft. The static assignment of payload type 77 to redundant audio has been removed since all known implementations use a dynamic payload type. References to MPEG1 system streams and MPEG2 program streams (RFC2250) using dynamic payload types have been added and a number of other RFC references have been updated.

The new policy regarding static payloads needs to be better described in the next revision of the draft. The SDP modifiers to explicitly denote the RTCP fraction (if the default of 5% is not being used) have yet to be written, but it is felt that these should not be added to the A/V profile (since that should not be tied to SDP); rather, a new document should define them. The MIME registration of payload formats has yet to be done.
There are many open issues here regarding how this should occur, what information is bound to the names, how MIME is extended for types which cannot be represented in email, etc.

An update on the RTP payload for redundant audio was presented by Colin Perkins (UCL). This document (draft-ietf-avt-rtp-redundancy-revised-00.txt) updates RFC2198 in the light of additional usage experience. The change is to specify that all packets in a redundant stream should be sent using the redundancy format, rather than sending the first packet(s) in a talkspurt using the payload format of the primary codec. This allows for explicit advertisement of the buffering requirements of a stream, which simplifies implementations and removes the need for an out-of-band parameter to convey this information.

An update on the RTP payload for generic forward error correction (draft-ietf-avt-fec-03.txt) was presented by Jonathan Rosenberg (Lucent). Changes since the previous draft include the addition of code fragments illustrating the decoding stage, support for FEC using Reed-Solomon codes, extension of the timestamp recovery to 56 bits and removal of the reference to the expired draft by Budge.

A number of issues with the current draft were discussed including the required mask size: 24 bits is believed sufficient so the optional extension to 56 bits will be removed from the draft. It was also decided that parity FEC and Reed-Solomon codes are sufficiently different that this draft should be split into two. The resulting parity FEC payload format document is expected to be ready for last call after one further revision; the Reed-Solomon payload format document will need further work over the coming months.

The meeting concluded with a reminder that a revised working group charter has been posted. Comments and discussion of the proposed new charter and milestones should be directed to the mailing list.