- Is RTP a transport
protocol or a kind of application protocol?
- RTP has important properties of a transport protocol: it runs on
end systems and provides demultiplexing. It differs from transport
protocols like TCP in that it (currently) does not offer any form of
reliability or a protocol-defined flow/congestion control. However, it
provides the necessary hooks for adding reliability, where appropriate,
and flow/congestion control. Some like to refer to this property as
application-level framing (see D. Clark and D. Tennenhouse,
"Architectural considerations for a new generation of protocols",
SIGCOMM'90, Philadelphia). RTP so far has been mostly implemented
within applications, but that has no bearing on its role. TCP is still
a transport protocol even if it is implemented as part of an application
rather than the operating system kernel.
- RTP does not ensure
real-time delivery. So how come it is called a real-time
protocol?
- No end-to-end protocol, including RTP, can ensure in-time delivery.
This always requires the support of lower layers that actually have
control over resources in switches and routers. RTP provides
functionality suited for carrying real-time content, e.g., a timestamp
and control mechanisms for synchronizing different
streams with timing properties.
- Is RTP an unreliable
protocol? Are there any mechanisms provided for error recovery in
RTP?
- As currently defined, RTP does not define any mechanisms for
recovering from packet loss. Such mechanisms are likely to be highly
dependent on the packet content. For audio, for example, it has been
suggested to add low-bit-rate redundant versions of the signal, offset
in time. For other applications,
retransmission of lost packets may be appropriate. (The H.261 RTP
payload definition offers such a mechanism.) This requires no additions
to RTP. RTP probably has the necessary header information (like
sequence numbers) for some forms of error recovery by retransmission.
- Can RTP run over IPv6?
ATM?
- Yes. RTP contains no specific assumptions about the capabilities of
the lower layers, except that they provide framing. It contains no
network-layer addresses, so that RTP is not affected by addressing
changes. Any additional lower-layer capabilities such as security or
quality-of-service guarantees can obviously be used by applications
employing RTP. There are several implementations of video tools that
run RTP directly over AAL5 (T. Braun) and recent efforts to define the
carriage of RTP over AAL2 and AAL5. It should be
noted that the RTCP CNAME field is currently based on the assumption
that hosts have Internet-style domain names.
- Can RTP be used in
asymmetric networks?
- In asymmetric networks, the bandwidth in one direction, typically
from the user to the Internet, is significantly lower than in the other.
These networks include ADSL, cable modems and
satellite distribution. RTP can be used readily, but it may be
necessary to have only data senders send RTCP messages. These RTCP
messages are useful for inter-media synchronization and for identifying
the content of the media stream.
- Why doesn't RTP have a
length field?
- RTP does not contain a length field; that is, it assumes that framing
is performed by the underlying protocol and that only one RTP packet
is to be carried in one PDU of the underlying protocol. This is the
typical application with UDP (or AAL5) as the underlying protocol.
Since most applications currently envisioned do not need framing, it
would be a waste of processing and bandwidth to add a length field. This is
covered in detail in the section RTP over Network and Transport
Protocols of the spec.
- If RTP is used with a protocol that is not message-based (e.g., TCP)
or if it is desirable to carry several RTP packets in one lower-layer
PDU (e.g., for aggregation of streams), it is trivial to define a
profile that prefixes the RTP header by a 16 or 32-bit length field,
depending on the desired tradeoff between overhead and maintaining word
alignment.
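As an illustration, here is a minimal sketch in C of such length-prefixed framing for carrying RTP over a byte-stream protocol. The function name and error convention are invented for this example; RFC 4571 later standardized a framing of exactly this shape.

```c
#include <arpa/inet.h>   /* htons */
#include <stdint.h>
#include <string.h>

/*
 * Sketch: prepend a 16-bit network-byte-order length to one RTP
 * packet so it can be carried over a byte-stream protocol such as
 * TCP.  Returns bytes written to 'out', or 0 if the packet does not
 * fit in a 16-bit length or the output buffer is too small.
 */
size_t frame_rtp_packet(const uint8_t *pkt, size_t pkt_len,
                        uint8_t *out, size_t out_len)
{
    uint16_t len16;

    if (pkt_len > UINT16_MAX || out_len < pkt_len + 2)
        return 0;
    len16 = htons((uint16_t)pkt_len);
    memcpy(out, &len16, 2);          /* 16-bit length prefix */
    memcpy(out + 2, pkt, pkt_len);   /* followed by the RTP packet */
    return pkt_len + 2;
}
```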
- Does RTP
have a fixed packetization interval?
- Some implementations assume that packet audio is sent with a
particular packetization interval, e.g., 20 ms. This is wrong.
While RFC 1890 recommends certain values and SDP allows a preference to
be expressed, implementations need to be able to handle all reasonable
values. There is no constraint that G.711 or other sample-based formats
be conveyed in multiples of a certain unit. Thus, an RTP packet with 123
samples of G.711 is perfectly legitimate and needs to be handled
appropriately.
- Are all these fields
really needed?
- Periodically, it is suggested to create an "RTP lite" version with a
header shorter than 12 bytes. It is argued that, in particular, packet
voice does not require all the RTP header fields and is particularly
sensitive to packet header overhead due to the short payloads.
In general, the best compression is accomplished using RTP header
compression, as it can compress the IP/UDP/RTP headers from 40 to one or
two bytes. However, it only works for short-delay unicast connections on
a single link.
For wide-area links that see a lot of voice traffic, e.g., for PBX
interconnect, RTP muxing is far more efficient, since it avoids the
overhead of IP and UDP packet headers, as well as featuring shorter RTP
headers. Using RTP muxing, the overhead can be reduced to about two
bytes per "channel", with one UDP/IP header for up to several dozen
channels.
A minimal version of RTP would likely contain a sequence number (SN)
and a payload type (PT), with a minimum combined size of two bytes.
Unfortunately, such a choice would have a number of disadvantages:
- Not suitable for mixers and translators, due to the absence of SSRC.
- The total reduction in overhead is modest: A G.723.1 packet with an
audio payload of 20 bytes would shrink from a total of 60 bytes to 50
bytes, or about 17%. This covers Internet traffic growth of at most two
months...
- Since SSRC is needed for multicast, this header would break
compatibility even within the H.323 suite (namely, H.332).
- H.245 and SDP would have to be extended to handle the negotiation of
the additional RTP header format.
- Unless every device supports both the full-length and short version,
gateways are needed to translate between the two.
- Without a timestamp, cross-media synchronization becomes very
difficult unless audio without silence suppression is used. (Silence
suppression is a far more effective mechanism of saving bandwidth than
header compression, with a typical bandwidth reduction of close to 50%.)
- Much of the RTCP functionality would have to be revisited, since it
relies on the presence of timestamps and longer sequence numbers for
jitter computation, loss statistics and synchronization.
- How does padding
work?
- Since the underlying transport unit defines the end of the packet,
the application can always locate the last byte of the (say, UDP) packet
and look there for the number of padding bytes.
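A minimal sketch in C of this procedure, assuming the packet is one complete UDP payload and ignoring CSRC entries and header extensions for simplicity:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Sketch: return the packet length with padding removed, or -1 if the
 * packet is malformed.  If the P bit (mask 0x20 in the first octet)
 * is set, the last octet of the packet holds the number of padding
 * octets, including itself.
 */
long rtp_unpadded_length(const uint8_t *pkt, size_t len)
{
    if (len < 12)                /* shorter than the fixed RTP header */
        return -1;
    if (pkt[0] & 0x20) {         /* P (padding) bit */
        uint8_t pad = pkt[len - 1];
        if (pad == 0 || pad > len - 12)
            return -1;           /* implausible padding count */
        return (long)(len - pad);
    }
    return (long)len;
}
```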
- Practically speaking, how is the timestamp
computed?
- For audio, the timestamp is incremented by the packetization interval
times the sampling rate. For example, for audio packets containing 20
ms of audio sampled at 8,000 Hz, the timestamp for each block of audio
increases by 160, even if the block is not sent due to silence
suppression. Also, note that the actual sampling rate will differ
slightly from this nominal rate, but the sender typically has no
reliable way to measure this divergence.
- For video, the timestamp clock rate is fixed at 90 kHz. The timestamps
generated depend on whether the application can determine the frame
number or not. If it can or it can be sure that it is transmitting
every frame with a fixed frame rate, the timestamp is governed by the
nominal frame rate. Thus, for a 30 f/s video, timestamps would increase
by 3,000 for each frame, for a 25 f/s video by 3,600 for each frame. If
a frame is transmitted as several RTP packets, these packets would all
bear the same timestamp. If the frame number cannot be determined or if
frames are sampled aperiodically, as is typically the case for software
codecs, the timestamp has to be computed from the system clock (e.g.,
gettimeofday()).
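A small sketch in C of the arithmetic just described; the helper names are invented for this example.

```c
#include <stdint.h>

/* For sample-based audio codecs, the timestamp advances by the number
 * of sampling periods covered by the packet: packetization interval
 * times sampling rate, e.g., 0.020 s * 8000 Hz = 160 ticks, advanced
 * even for blocks suppressed as silence. */
uint32_t audio_ts_increment(double interval_s, unsigned rate_hz)
{
    return (uint32_t)(interval_s * rate_hz + 0.5);
}

/* For video, the RTP clock is fixed at 90,000 Hz, so a fixed nominal
 * frame rate advances the timestamp by 90000 / rate per frame:
 * 3,000 ticks at 30 f/s, 3,600 ticks at 25 f/s.  All packets of one
 * frame carry the same timestamp. */
uint32_t video_ts_increment(unsigned frames_per_second)
{
    return 90000 / frames_per_second;
}
```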
- In a
multimedia conference, are the initial timestamp values
related?
- No, initial timestamp values are picked randomly and independently
for each RTP stream. (This is more or less unavoidable if different
media types are generated by independent applications, whether these
applications reside on the same host or not.) Synchronization (such as lip sync) between different
media is performed by receivers through the NTP timestamps in the RTCP
sender reports. This timestamp provides a common time reference that
associates a media-specific RTP timestamp with the common "wallclock"
time shared across media. The mechanism by which end systems synchronize
different media is not prescribed by RTP; however, a workable approach
is to periodically exchange messages between applications to indicate
what delay each application would impose on the stream (including any
media decoding delays) if it were not to synchronize and then have all
applications choose the maximum of these delays.
- What are the
roles of the RTP timestamp and sequence numbers?
- The timestamp is used to place the incoming audio and video packets
in the correct timing order (playout delay compensation). The sequence
number is mainly used to detect losses. Sequence numbers increase by
one for each RTP packet transmitted, timestamps increase by the time
"covered" by a packet. For video formats where a video frame is split
across several RTP packets, several packets may have the same timestamp.
In some cases such as carrying DTMF (touch tone) data (RFC 2833), RTP
timestamps may not be monotonic.
- What are the
different clocks and how are they synchronized?
- RFC 3550 specifies one media timestamp in the RTP data header and a
mapping between this timestamp and a globally synchronized clock,
carried in the RTCP sender-report timestamp mappings.
The NTP timestamps in the SR are assumed to be synchronized between
all media senders within a single session. If the media sources come
from the same network source, this is obviously not an issue.
Receiver(s) synchronize to the sender, the only solution feasible for
multicast.
Experience has shown that all other cross-media, cross-host schemes
end up doing clock synchronization, usually inferior to NTP and
application-specific.
- What's the marker bit
good for?
- For voice packets, the marker bit indicates the beginning of a
talkspurt. Beginnings of talkspurts are good opportunities to adjust the
playout delay at the receiver to compensate for differences between the
sender and receiver clock rates as well as changes in the network delay
jitter. Packets during a talkspurt need to be played out continuously,
while listeners generally are not sensitive to slight variations in the
durations of a pause.
The marker bit is a hint; the beginning of a talkspurt can also be
computed by comparing the difference in timestamps and sequence numbers
between two packets, assuming the timestamp clock rate is known.
Packets may arrive out of order, so that the packet with the marker
bit is received after the second packet in the talkspurt. As long as the
playout delay is longer than this reordering, the receiver can still
perform delay adaptation. If not, it simply has to wait for the next
talkspurt.
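A sketch in C of that fallback heuristic, assuming a constant number of timestamp ticks per packet (e.g., 160 for 20 ms of 8 kHz audio); the function name is invented for this example.

```c
#include <stdint.h>

/*
 * Sketch: detect the beginning of a talkspurt without the marker bit.
 * If more media time elapsed between two packets than the sequence-
 * number gap can account for, the sender paused (silence suppression)
 * rather than the network losing packets.
 */
int is_talkspurt_start(uint16_t prev_seq, uint32_t prev_ts,
                       uint16_t seq, uint32_t ts,
                       uint32_t ticks_per_packet)
{
    uint16_t seq_diff = (uint16_t)(seq - prev_seq);  /* mod 2^16 */
    uint32_t ts_diff  = ts - prev_ts;                /* mod 2^32 */

    return ts_diff > (uint32_t)seq_diff * ticks_per_packet;
}
```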
- What is the
sender packet count and byte count used for?
- They are not needed for loss computation; the sequence number fields
are used for that to avoid round-off errors. They may be used to
compute the sender packet and byte rate.
- What is the RTP
timestamp in the RTCP sender report used for?
- The RTP timestamp and NTP timestamp form a pair that identifies the
absolute time of a particular sample in the stream. For example, if the
RTCP sender report contains an RTP timestamp of 1234 and an NTP
timestamp indicating February 3, 10:14:15, it means that sample 1234 in
the media stream occurred exactly on February 3, 10:14:15.
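A sketch in C of how a receiver might use that pair, assuming the SR's NTP timestamp has already been converted to seconds; the function name is invented for this example.

```c
#include <stdint.h>

/*
 * Sketch: map a packet's RTP timestamp to wallclock time using the
 * (NTP, RTP) timestamp pair from the most recent RTCP sender report.
 * 'clock_rate' is the media clock, e.g., 8000 for PCM audio or 90000
 * for video.  The signed modular difference also handles packets
 * sampled shortly before the SR's reference instant.
 */
double rtp_to_wallclock(uint32_t ts, uint32_t sr_rtp_ts,
                        double sr_ntp_seconds, double clock_rate)
{
    int32_t dt = (int32_t)(ts - sr_rtp_ts);
    return sr_ntp_seconds + (double)dt / clock_rate;
}
```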
- How is the jitter
computed?
- If several packets, say, within a video frame, bear the same timestamp,
it is advisable to only use the first packet in a frame to compute the
jitter. (This issue may be addressed in a future version of the
specification.)
Jitter is computed in timestamp units. For example, for an audio
stream sampled at 8,000 Hz, the arrival time measured with the local
clock is converted by multiplying the seconds by 8,000.
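The estimator itself, adapted from the code fragment in the RTP specification; the struct here is reduced to the two fields this calculation needs.

```c
#include <stdint.h>

struct source {
    long   transit;  /* relative transit time of the previous packet */
    double jitter;   /* running interarrival jitter estimate */
};

/*
 * 'arrival' and 'ts' must be in the same timestamp units, e.g., local
 * arrival time in seconds multiplied by 8,000 for 8 kHz audio.
 */
void update_jitter(struct source *s, uint32_t arrival, uint32_t ts)
{
    long transit = (int32_t)(arrival - ts);  /* modular difference */
    long d = transit - s->transit;
    s->transit = transit;
    if (d < 0)
        d = -d;
    /* first-order filter with gain 1/16, as in the spec */
    s->jitter += ((double)d - s->jitter) / 16.0;
}
```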
Steve Casner wrote:
For encodings such as MPEG that transmit data in a different order than
it was sampled, this adds noise into the jitter calculation. I have
heard handwavy arguments that this factor can be calculated out given
that you know the shape of the noise, but my math isn't strong enough
for that.
In many of the cases that we care about, the jitter introduced by
MPEG will be small enough that when the network jitter is of the same
order we don't have a problem anyway.
There is another problem for video in that all of the packets of a
frame have the same timestamp because the whole frame is sampled at
once. However, the dispersion in time of those packets really is all
part of the network transfer process that the receiver must accommodate
with its buffer.
It has been suggested that jitter be calculated only on the first
packet of a video frame, or only on "I" frames for MPEG. However, that
may color the results also because those packets may see transit delays
different than the following packets see.
The main point to remember is that the primary function of the RTP
timestamp is to represent the inherent notion of real time associated
with the media. It also turns out to be useful for the jitter measure,
but that is a secondary function.
The jitter value is not expected to be useful as an absolute value.
It is more useful as a means of comparing the reception quality at two
receivers or comparing the reception quality 5 minutes ago to now.
- What is the
session bandwidth?
- First, it is most certainly not the link bandwidth. This
would not scale, as then a large number of sessions could saturate the
link with RTCP traffic, even if each used just 5% of the link bandwidth
for RTCP. Secondly, the concept of link bandwidth is ill-defined in a
heterogeneous network.
The session bandwidth is the nominal data bandwidth plus the IP, UDP
and RTP headers (40 bytes). For example, for 64 kb/s PCM audio
packetized in 20 ms increments, the session bandwidth would be (160 +
40) / 0.02 bytes/second or 80 kb/s. If there are multiple senders,
the sum of their individual bandwidths is used.
The session bandwidth is typically defined out-of-band, e.g., in a
session announcement protocol, based on reasonable estimates of the
number of concurrent senders and their average bandwidth. Distributed
and consistent on-line estimation of the session bandwidth may be hard
as the number of senders and their bandwidth changes. The absolute
value is less important than that all participants agree on a common
value. (After all, there is nothing special about choosing the RTCP
bandwidth to be 5% of the session bandwidth, it just has to be agreed
upon by all participants to avoid timing out members prematurely.)
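A one-function sketch in C of the arithmetic in the example above, assuming IPv4 without options so that IP/UDP/RTP headers total 40 bytes:

```c
/*
 * Sketch: nominal session bandwidth in bits per second for a single
 * sender.  160-byte PCM payload every 20 ms gives
 * (160 + 40) * 8 / 0.020 = 80,000 b/s; 5% of that, 4 kb/s, would then
 * be shared by all members for RTCP.
 */
double session_bw_bps(unsigned payload_bytes, double interval_s)
{
    return (payload_bytes + 40) * 8 / interval_s;
}
```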
- What is the use of
RTCP for two-party calls?
- Since the cost of sending RTCP is minimal (about one packet every 5
seconds), it makes sense to send RTCP even for point-to-point
connections:
- With RTCP, both sides know how well the other side is receiving
audio and video; this is useful, since degraded quality can have any
number of reasons beyond network loss, delay and jitter. A particular
use is when calling technical support: the tech support person can
observe the network performance at the remote end.
- RTCP is necessary for synchronizing audio and video streams.
- For audio with silence suppression, RTCP is useful as a liveness
indication.
- The SDES information is useful for user interfaces.
- Many applications will (want to) support both unicast and multicast,
so that the additional implementation complexity is zero.
- How do I register
an RTP payload type?
- See the description, drawn from RFC
1890 (with some practical comments).
- What is the
current list of RTP payload types?
- See the current version of the RTP profile or the list maintained by
IANA at http://www.iana.org/assignments/rtp-parameters.
- What are dynamic
payload types?
- Dynamic payload types are described in the RTP A/V Profile. Unlike
static payload types, dynamic payload types are not assigned in the RTP
A/V Profile or by IANA. They map an RTP payload type to an audio and
video encoding for the duration of a session. Different members of a
session could, but typically do not, use different mappings. Dynamic
payload types use the range 96 to 127. They are assigned by means
outside of the RTP profile or protocol specification, e.g., through
signaling protocols such as SDP or H.245.
Note that a number of encodings are described in the RTP A/V profile
which do not have a static (permanent) payload type. The RTP A/V
Profile defines names for encodings which may be used by SDP or
other mechanisms to specify the mapping. Encodings may also be
identified by object identifiers or other names.
Since the space for payload types is limited, only very common
encodings should be assigned static types. These are typically audio
and video encodings "blessed" by international standardization bodies,
such as the G. series of ITU-T audio encodings. The RTP A/V Profile
defines a set of criteria for making static assignments.
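For illustration, here is a dynamic payload type binding as it might appear in an SDP description; the port and payload type values are examples only. PT 96 is mapped to 16-bit linear audio ("L16") at 16,000 Hz, two channels, for the duration of the session.

```
m=audio 49170 RTP/AVP 96
a=rtpmap:96 L16/16000/2
```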
- If I'm using H.323 or
other set-up protocol, can I ignore the RTP payload type (PT)
field?
- An application must never just play a packet without inspecting its
payload type, even if a single payload type has been negotiated via
H.245 or similar protocols. New mechanisms, including
- transmission of DTMF digits (RFC 2833),
- comfort noise indication,
- forward error correction using redundant data,
- switching of encodings to take into account network conditions
may conveniently use the PT to indicate special packets, which an end
application can ignore if desired; backward compatibility then depends
on receivers actually checking the PT. This assumption is violated if an
application blindly plays back all packets regardless of PT.
Also, in multicast environments, it is unlikely that every sender will
use the same payload type.
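A sketch in C of a receiver that dispatches on the PT instead of playing blindly; the payload type values and decoder functions are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

void play_pcma(const uint8_t *pkt, size_t len);    /* hypothetical decoder */
void handle_dtmf(const uint8_t *pkt, size_t len);  /* hypothetical handler */

void handle_rtp_packet(const uint8_t *pkt, size_t len)
{
    if (len < 12)
        return;
    uint8_t pt = pkt[1] & 0x7f;   /* low 7 bits of the second octet */

    switch (pt) {
    case 8:                        /* PCMA, the negotiated codec */
        play_pcma(pkt, len);
        break;
    case 96:                       /* dynamic PT, e.g., RFC 2833 digits */
        handle_dtmf(pkt, len);
        break;
    default:                       /* unknown PT: ignore, never play */
        break;
    }
}
```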
- Should the RTP payload
type (PT) field be used for multiplexing different
streams?
- It has been suggested that in some environments (such as RTP over
AAL5) lacking lower-layer multiplexing abilities, the RTP payload type
(PT) field be used to differentiate streams originating from different
sources. This is a fundamentally bad idea and violates the letter and
intent of the specification. It makes use of multiple PTs in a single
stream difficult (see previous question). It is also unnecessary, as
the SSRC was designed for distinguishing several sources.
- Should the RTP SSRC
be used for demultiplexing different streams for the same RTP session?
- The RTP SSRC is meant to label streams from different
sources, that is, each sender in a conference has its own SSRC. It has
been suggested to have a single source, using the same RTP session
(identified by source and destination addresses and ports), send
different media, such as an audio and video stream, using different
SSRCs. This is generally a bad idea for the following reasons:
- An RTP mixer normally combines all the SSRCs it receives on an RTP
session according to the composition method that is appropriate for that
session (e.g., mixing for audio). If multiple media are sent on one
session, then the SSRCs must be segregated per medium based on external
information. That gets complicated with sources coming from multiple
places. It is similarly more complicated for an end-node receiver to
handle streams coming from multiple sources to the same RTP session if
some of those sources don't all get fed to the same compositor (mixer,
selector, whatever).
- Carrying multiple media in one RTP session precludes the use of
different network paths or network resource allocations if appropriate.
For the typical synchronized audio/video stream one may not want
different paths, but it is not hard to imagine situations where one
medium should go via a low-bandwidth, low-delay terrestrial path while
another can tolerate the longer delay of a satellite path in order to
get higher bandwidth.
- Carrying multiple media in one RTP session precludes reception of a
subset of the media if desired, for example just audio if video would
exceed the available bandwidth. This is not an issue for unicast since
that choice of media would be controlled by the exchange with the
sender, but it is valuable for multicast with heterogeneous receivers.
- Carrying multiple media in one RTP session precludes receiver
implementations that use separate processes for the different media,
whereas using separate RTP sessions permits either single- or
multiple-process implementations. Consider the development of "desk
area networks" at MIT, ISI and other places in which the display and the
speaker may have different IP addresses. This is an instance of the
general philosophy of demultiplexing at the lowest level possible.
- Also, making the SSRC fixed is a problem in the multicast case
because collision resolution might require changing the SSRC id.
(contributed by Steve
Casner)
- Do receivers
need their own SSRC identifiers?
- Yes, all participants in an RTP session have SSRC values, since they
are needed in receiver reports.
- Why can't we just use TCP
for audio and video?
- For delivering audio and video for playback, TCP may be appropriate.
Also, with sufficiently long buffering and adequate average throughput,
near-real-time delivery using TCP can be successful, as practiced by the
Netscape WWW browser. TCP may often run over highly lossy networks
(e.g., the German X.25 network) with acceptable throughput, even though
the uncompensated losses would make audio or video communication
impossible.
However, for real-time delivery of audio and video, TCP
and other reliable transport protocols such as XTP are inappropriate.
The three main reasons are:
- Reliable transmission is inappropriate for delay-sensitive data
such as real-time audio and video. By the time the sender has
discovered the missing packet and retransmitted it, at least one
round-trip time, likely more, has elapsed. The receiver either has to
wait for the retransmission, increasing delay and incurring an audible
gap in playout, or discard the retransmitted packet, defeating the TCP
mechanism. Standard TCP implementations force the receiver
application to wait, so that packet losses would always yield
increased delay. Note that a single packet lost repeatedly could
drastically increase delay, which would persist at least until the end
of the talkspurt.
- TCP cannot support multicast.
- The TCP congestion control mechanisms decrease the congestion
window when packet losses are detected ("slow start"). Audio and
video, on the other hand, have "natural" rates that cannot be suddenly
decreased without starving the receiver. For example, standard PCM
audio requires 64 kb/s, plus any header overhead, and cannot be
delivered in less than that. Video could be more easily throttled
simply by slowing the acquisition of frames at the sender when the
transmitter's send buffer is full, with the corresponding delay. The
correct congestion response for these media is to change the
audio/video encoding, video frame rate, or video image size at the
transmitter, based, for example, on feedback received through RTCP
receiver report packets.
An additional small disadvantage is that the TCP and XTP headers are
larger than a UDP header (40 bytes for TCP and XTP 3.6, 32 bytes for XTP
4.0, compared to 8 bytes). Also, these reliable transport protocols do
not contain the necessary timestamp and encoding information needed by
the receiving application, so that they cannot replace RTP. (They would
not need the sequence number as these protocols assure that no losses or
reordering takes place.)
While LANs often have sufficient bandwidth
and low enough losses not to trigger these problems, TCP does not offer
any advantages in that scenario either, except for the recovery from
rare packet losses. Even in a LAN with no losses, the TCP slow start
mechanism would limit the initial rate of the source for the first few
round-trip times.
- Can't we just use
XTP?
- Many of the arguments parallel those in the previous section. The
question of the relationship of RTP and XTP appears to arise
frequently. (This may simply be due to the word 'transport' in both
protocol names.) However, XTP and RTP are not replacements for each
other. XTP is designed as a general, configurable network and transport
protocol for both reliable and unreliable data communications. RTP has
no reliability mechanisms (although these could be added if desired for
specific applications) and no flow control like the rate control in XTP.
RTP is not intended for regular, reliable data transfer (where TCP or
XTP might be used instead). For real-time data, where retransmission is
usually not possible due to timing constraints, XTP would have to
disable retransmission. Flow/congestion control for real-time data is
most likely inappropriate as the rate of such sources is inherently
given and not modifiable on the time-scale of transport-protocol flow
control, as explained in the previous section. It should be noted that
RTP supports mechanisms that allow a form of congestion control on
longer time scales, e.g., by modifying the source encoder if network
congestion is detected.
RTP has no protocol state by itself and can
thus be used over either connectionless networks, such as IP/UDP, or
connection-oriented networks, such as XTP, ST-II or ATM (AAL3/4 or
AAL5). Many real-time multimedia applications use multicast with a
large fan-out, e.g., several hundred to thousands for a lecture or
concert. Connection-oriented protocols like XTP have difficulty scaling
to such a large number of receivers.
XTP does not offer timing or
content type (media) information, and thus would need these services, as
offered by RTP. XTP provides no RTP-like direct feedback of the
received quality-of-service, and thus, again, would have to "import"
these from another protocol. Looking at existing applications using XTP
for real-time services confirms that they need to add a layer similar in
content to the RTP data part "between" XTP and the actual media.
- How should RTP
sessions be played back?
- Since RTCP packets contain absolute time information, a recorded
session cannot simply be played back by time-shifting the whole recorded
session. One approach plays back the data packets with their original
timestamps, with re-normalized timing. SDES information other than
NOTE items can be gathered for each source and regenerated as in a
"live" session. NOTE SDES items need to be inserted at the appropriate
instant in the playback as they are allowed to change.
- Is there an RTP
library or kernel implementation?
- RTP (in particular, the data part) is tightly coupled to the
application, so that a kernel implementation makes little sense. A
number of people have developed libraries that implement RTP and RTCP
(see listing). The sources to NeVoT, rtpdump,
vat, rat and vic also contain RTP and RTCP processing modules which
should be usable in other applications with minor modifications. Note
also that the specification itself contains numerous code fragments.
(Most of the other applications are using older versions of RTP and thus
should not be relied upon for development.)
The Java
Media Framework (JMF), a Java API, also supports RTP and RTCP.
There is no standard API for RTP.
- What are some of the
differences between the VAT protocol and RTP?
- The VAT protocol was originally implemented in the VAT audio tool
and subsequently also in other audio tools such as NeVoT. The VAT
protocol is now obsolete and should not be used or implemented.
The VAT header format is only described in header files. (See the
VAT and NeVoT sources for details.) Many aspects of RTP and the VAT
protocol are similar, but RTP improves upon the VAT protocol in a number
of ways:
- The VAT protocol was designed for audio only, while RTP is
specified for audio and video and may be suitable for other real-time
applications.
- RTP is designed to be protocol-independent and can be used with
non-IP protocols (ATM AAL5, for example) as well as, say, IPv6.
- RTP source identification simplifies the use of mixers and
translators.
- RTP has a number of features that simplify use of application-level
encryption (padding, etc.).
- The RTP header is extensible, should the need arise in the
future.
- The RTP header has a sequence number which simplifies accurate
loss detection and measurement and the handling of images transmitted
in several packets.
- The RTCP SDES packets contain additional information that
simplifies tracing of misbehaving sources, e.g., their email address or
telephone number.
- The RTCP SDES CNAME items simplify the construction of
multimedia applications from independent media agents.
- RTCP sender and receiver reports allow the implementation of
adaptive applications, that is, applications where senders scale their
bandwidth consumption based on network load.
- RTCP sender and receiver reports allow monitoring of the quality
of service within, say, a multimedia conference.
- What are the
differences between RTP version 1 and 2?
- Version 1 is of historical interest only. Applications should not
be written for it. RTP version 2 is not backwards compatible with
version 1. If you care, you can find a definition of version 1 in an
old Internet draft.
- Are there specific
ports assigned to RTP?
- No, as explained in the section Port Assignment of the RTP
profile:
As specified in the RTP protocol definition, RTP data is to be
carried on an even UDP port number and the corresponding RTCP packets
are to be carried on the next higher (odd) port number.
Applications operating under this profile may use any such UDP port
pair. For example, the port pair may be allocated randomly by a
session management program. A single fixed port number pair cannot be
required because multiple applications using this profile are likely
to run on the same host, and there are some operating systems that do
not allow multiple processes to use the same UDP port with different
multicast addresses.
However, port numbers 5004 and 5005 have been registered for use with
this profile for those applications that choose to use them as the
default pair. Applications that operate under multiple profiles may use
this port pair as an indication to select this profile if they are not
subject to the constraint of the previous paragraph. Applications need
not have a default and may require that the port pair be explicitly
specified. The particular port numbers were chosen to lie in the range
above 5000 to accommodate port number allocation practice within the Unix
operating system, where port numbers below 1024 can only be used by
privileged processes and port numbers between 1024 and 5000 are
automatically assigned by the operating system.
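A sketch in C of one way an application might obtain such a pair, binding an even port for RTP and the next higher odd port for RTCP; the function name and retry policy are invented for this example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 0 on success, with RTP bound to an even port and RTCP to
 * the following odd port; -1 if no pair could be found. */
int alloc_rtp_port_pair(uint16_t base, int *rtp_fd, int *rtcp_fd)
{
    for (uint16_t port = base & ~1u; port < 65000; port += 2) {
        struct sockaddr_in a;
        int fd1 = socket(AF_INET, SOCK_DGRAM, 0);
        int fd2 = socket(AF_INET, SOCK_DGRAM, 0);

        if (fd1 < 0 || fd2 < 0) {
            if (fd1 >= 0) close(fd1);
            if (fd2 >= 0) close(fd2);
            return -1;
        }
        memset(&a, 0, sizeof a);
        a.sin_family = AF_INET;
        a.sin_addr.s_addr = htonl(INADDR_ANY);
        a.sin_port = htons(port);            /* even port: RTP */
        if (bind(fd1, (struct sockaddr *)&a, sizeof a) == 0) {
            a.sin_port = htons(port + 1);    /* odd port: RTCP */
            if (bind(fd2, (struct sockaddr *)&a, sizeof a) == 0) {
                *rtp_fd = fd1;
                *rtcp_fd = fd2;
                return 0;
            }
        }
        close(fd1);                          /* pair unusable; retry */
        close(fd2);
    }
    return -1;
}
```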
Also, the multicast (version 3.5 and later) kernel sources use the
following port ranges:
| from | to | application | priority |
|---|---|---|---|
| 0 | 16383 | unclassified | lowest |
| 16384 | 32767 | audio | highest |
| 32768 | 49151 | whiteboard | medium |
| 49152 | 65535 | video | low |
Note: The port ranges in question do not make any difference unless
the traffic traverses an interface or tunnel where the multicast traffic
rate exceeds the configured mrouted rate-limiter.
If RTP is used within the H.323 framework, port assignment is done by
the H.225.0 signaling messages. In SDP and SIP, the conference
controller or inviting party picks the port numbers.
- How are
ports assigned for bidirectional unicast RTP sessions?
- Each side in a bidirectional RTP session assigns its
source ports independently, i.e., there is no assumption that if
Alice sends audio to Bob on port 5000 (and RTCP on 5001), Alice also has
to receive audio on port 5000. (Imposing such a restriction on ports
would make it difficult for a host to participate in several independent
RTP sessions using different tools.) Each side in a unicast session
simply indicates to the other side where it wants to receive RTP
packets, e.g., using SDP.
Note that the SSRC values used for each source are always
different.
- What about
firewalls?
- Ports used:

| Protocol | Transport | Port |
|---|---|---|
| H.323 | TCP | 1720 |
| H.245 | TCP | ephemeral, > 1024 |
- What is the quality
of audio codec X?
- See separate summary with audio samples.
- Are all audio codecs
patented?
- Most older, higher-bitrate codecs are not subject to patent
protection. However, G.723.1, G.729 and GSM are covered by various
patents. For example, U.S. patent 4,752,956, "Digital speech coder
with baseband residual coding", applies to GSM and is assigned to
Philips.
- Is there a way for a
router to tell apart RTP packets?
- No, if the router doesn't have access to the protocol that
established a session, there is no way by looking at a single packet to
tell that it is an RTP packet. However, if the router maintains state,
it can inspect the sequence number and, with some probability, determine that
a particular UDP port pair carries RTP if the sequence number increases
by one (or a small number) for each packet. In addition, the first two
bits of every packet will be the same, namely the RTP version
identifier. It is also likely that the payload type will stay constant
from packet to packet.
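A sketch in C of such a stateful classifier; the score threshold and the tolerated sequence-number gap are arbitrary illustrative choices.

```c
#include <stddef.h>
#include <stdint.h>

struct flow_state {
    int      packets;   /* packets observed on this UDP port pair */
    int      score;     /* consistency score */
    uint8_t  last_pt;
    uint16_t last_seq;
};

/* Returns nonzero once the flow has behaved like RTP long enough. */
int looks_like_rtp(struct flow_state *f, const uint8_t *pkt, size_t len)
{
    if (len < 12 || (pkt[0] >> 6) != 2)   /* version bits must be 2 */
        return 0;

    uint8_t  pt  = pkt[1] & 0x7f;
    uint16_t seq = (uint16_t)((pkt[2] << 8) | pkt[3]);

    if (f->packets++ > 0) {
        uint16_t delta = (uint16_t)(seq - f->last_seq);
        if (delta >= 1 && delta <= 16 && pt == f->last_pt)
            f->score++;                   /* plausible RTP behavior */
        else
            f->score--;
    }
    f->last_pt  = pt;
    f->last_seq = seq;
    return f->score >= 10;                /* ~10 consistent packets */
}
```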
- Are there related ITU
efforts?
- Audio and video media formats and encodings:

| Name | Type | Algorithm | Sampling frequency (kHz) | Bit rate |
|---|---|---|---|---|
| MPEG L3 | audio | | 22.05, 44.1 | 48..128 kb/s |
| G.711 | audio | mu-law, A-law | 8.0 | 64 kb/s |
| G.721 (subsumed by G.726) | audio | ADPCM | 8.0 | 32 kb/s |
| G.722 | audio | | 16.0 (7 kHz spectrum) | 64 kb/s |
| G.723 (recommendation no longer in force!) | audio | | 8.0 | 24 kb/s |
| G.723.1 | audio | ACELP and MP-MLQ | 8.0 | 5.3 and 6.3 kb/s |
| G.726 | audio | ADPCM | 8.0 | 16, 24, 32, 40 kb/s |
| G.728 | audio | low-delay CELP | 8.0 | 16 kb/s |
| G.729 | audio | CS-ACELP | 8.0 | 8 kb/s |
| H.261 | video | DCT | | |
| H.263 | video | DCT (improved version of H.261) | | |
- H.324:
- Audio and video over POTS at less than 20 kb/s.
For conferencing over ISDN:
- H.221:
- Frame structure for a 64 to 1920 kbit/s channel in
audiovisual teleservices.
- H.320:
- Framework for transmitting audio and video over
circuit-switched digital networks (primarily ISDN).
- H.323:
- H.320 over LAN.
For conference control, application and data sharing, there are a
number of recommendations:
- T.120:
- Introduction to the audiographics and audiovisual
conferencing recommendations.
- T.121:
- Generic application template.
- T.122:
- Multipoint communication service for audiographics and
audiovisual conferencing service definition
- T.123:
- Protocol stack for audiographics and audiovisual
teleconference applications.
- T.124:
- Generic conference control.
- T.125:
- Multipoint communication service protocol specification.
- T.126:
- Still image protocol specification.
- T.127:
- Multipoint binary file transfer protocol.
- mbus
- A protocol for coordinating multimedia applications.
The comp.speech
FAQ contains many additional references, including a good summary
of how different algorithms work.
- Are there other efforts in using the Internet for real-time audio
and video?
- Too many, some may say. vat versions 3.4 and
earlier, one of the early Internet audio applications, uses
mostly the same audio encodings as specified in the RTP profile, but a
different protocol. There are also a number of Internet telephony applications that usually only
operate on PCs and in unicast mode. There are initial efforts to interconnect the public switched telephone network
and the Internet.
CuSeeMe (for Windows
PC and the Macintosh) is a combined audio and video tool using
reflectors rather than IP-level multicast.
The Internet Telephony
Consortium maintains a listing of
standards and related efforts.
Last updated by Henning Schulzrinne