H.323

Comments about H.323 and SIP

Jon Crowcroft wrote on Wed, 22 Jan 1998 in the confctrl mailing list:

Having read over the H.323 family of standards documents (thanks to Picturetel for online copies) at some length again, I have to agree.

It seems that H.323 is aimed (along with all the other H series) at extending the ISDN videoconferencing standards on to unreliable packet LANs, but it:

is not aimed at solving the scaling problems of the WAN Internet;
has a large amount of baggage associated with interworking with many other standards - particularly selection and control of multiplexing;
is for conferencing not telephony;
doesn't appear to have a clean call control model.

It seems that session advertisement, invitation, address allocation, multiplexing (such as it is in RTP), RTT estimation, and many other functions are already addressed, both for telephony and for conferencing in the "pure" Internet 'standards' work.

A core piece we might learn from is the conference control functionality in the H.245 part of the standards, IFF it were expressed in a way that would tell us how to actually implement it in a scalable way, but it isn't.
[The main value we gain from ITU work at the moment I see as coding - basically link level framing and application level media coding and compression standards.]
So I am left with the feeling that we should proceed with "our own" and then specify interworking - why start with something that is primarily targeted at talking to gateways to ISDN? Why not start with a core WAN Internet conference/call control protocol, and then see what a parsimonious design for a gateway would really look like?
See the SIP page, where there's a link to the H.323 vs. SIP discussion, for some excellent dissection of the packet exchanges. It seems clear to me that the Q.931/Q.932 stubs, the RAS and other phases are completely unnecessary in most conferences, and that we can specify some equivalent functions: the Q.931/Q.932 functions are partly implicit in any reliable conference control transport protocol (and, for media streams, in RSVP or other signaling), many of the RAS functions could be done by simple use of SAP/SIP, and the rest is dealt with mostly by RTP/RTCP. If this works, then the gateway can map these (largely implicit) functions to H.323 (or H.320/H.321/H.322) as needed.

Additional notes from Jon Crowcroft:

H.323, H.225, H.245

H.323: Visual Telephone Systems and Equipment for Local Area Networks which provide a Non-Guaranteed Quality of Service

H.323 is the overall structure of the ITU system for conferencing terminals. It includes Terminals, Gateways (to non-packet nets or to QoS-guaranteeing packet nets), Multipoint Controllers, Multipoint Processors and Multipoint Control Units.
It rests on a family of other protocols which do the actual work - i.e. it's a framework. Interoperability with H.324, H.322 (QoS LAN), H.320 (ISDN) and H.321 (B-ISDN) is via the gateways.
Picture 1 from H.323
Multipoint Controllers serve functions that are based on IGMP and other group management functions up to the H.323 application level. Multipoint Processors serve functions of mixing, multiplexing and basically getting between unicast sources and multicast delivery. Multipoint Control Units may incorporate some or all of these functions, as well as some conference control functions which are also present in all H.323 terminals and gateways.
Picture 2 from H.323
H.323 terminals would typically be TCP/IP hosts (PCs) with RTP/UDP stacks to carry H.261 (or H.263 or other coded video) and G.711 (or G.722, G.723, G.728, G.729) coded audio - in the ITU view, the T.120 stack is used for conferencing "data" applications (see later).
Picture 4 from H.323
To carry out tasks of assigning control and data flows to the right port/address (TSAP in ISO/ITU parlance), the H.225 protocol is used.
H.225: Media Stream Packetization and Synchronization on Non-guaranteed Quality of Service LANs

On ISDN, H.221 (or similar) is used to multiplex audio and video (and data) onto a virtual circuit. In a packet LAN, we may want separate recovery mechanisms and different levels of reliability for data and video/audio streams (and conference control), so H.221, with its rigid, almost TDM-like bit-level multiplexing, is inappropriate. Instead, H.225 is provided. It makes use of the underlying transport as much as possible - i.e. again, like H.323, of which it is part, it's mainly a framework. It makes use of RTP/UDP (and IGMP/IP multicast) as well as TCP. However, an important part of H.225 is RAS (Registration, Admission and Status) - this serves some of the functions of DNS/NIS/Portmapper [Expand] and some of the functions of SAP/SDP/SIP.
RAS messages are used to tell gatekeepers about H.323 terminals. RAS interacts, if needed, with Q.931 signaling protocols to set up calls. Once call setup is done, a terminal will have a TCP connection on which it then proceeds with H.245 messages to carry out next-level-up functions. For media, H.225 selects appropriate TSAP IDs (i.e., UDP ports and mcast addresses) to use.
So H.225 uses Q.931 first, i.e., call establishment and clearing via {Alerting, Call Proceeding, Connect, Connect Acknowledge, Progress, Setup, Setup Acknowledge} and {Disconnect, Release, Release Complete} messages. Q.932 can be used to get more IN-like facilities - e.g. {Hold, Hold Acknowledge, Hold Reject, Retrieve, Retrieve Acknowledge, Retrieve Reject}. On a packet LAN, clearly Q.931 and so on are not carried on a separate signaling channel (e.g., on N-ISDN or B-ISDN, there is a pre-agreed circuit for signaling messages - the D channel in narrowband ISDN gives a free 14 kbps or so for this). On a packet LAN, messages must go on the LAN, and must be reliable, so TCP is used to a well-known port. H.225 defines what the fields in the Q.931 messages (in the TCP data packets) carry - e.g. called and calling party addresses - obviously, again, on the LAN, if an H.323 terminal is being called (and not an H.321 ISDN terminal the other side of a gateway, for example) then there is no called address in the sense of "p
User-to-user data in the Q.931 messages (in the TCP data messages on IP on the LAN) can carry lots of information - e.g. arbitrary "key pad" data (can use a phone style interface).
More importantly, the user-to-user field is used to carry complex messages encoded in ASN.1 [expand on ASN.1] to carry higher level (H.323, H.245) information - e.g. ...
Setup carries:

protocol id
H.245 transport address
source address and information
activeMC (is end point under control from an MC - see above and later)
conference ID
conference goal (create, join, invite, etc)
call type (point-to-point, mcast, bandwidth!)

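As a rough illustration of the Setup contents listed above, the fields can be modelled as a plain record. This is only a sketch: the class and field names below are hypothetical Python names chosen for readability, not the identifiers used in the H.225 ASN.1, and the example values are made up.

```python
from dataclasses import dataclass
from enum import Enum


class ConferenceGoal(Enum):
    # The text lists "create, join, invite, etc"; only those three are modelled.
    CREATE = "create"
    JOIN = "join"
    INVITE = "invite"


class CallType(Enum):
    # Hypothetical names for the call types mentioned in the text.
    POINT_TO_POINT = "point-to-point"
    MULTICAST = "mcast"


@dataclass(frozen=True)
class SetupInfo:
    """Illustrative stand-in for the user-to-user information in a Setup message."""
    protocol_id: str
    h245_transport_address: tuple  # (host, port) where H.245 will run
    source_address: str
    source_info: str
    active_mc: bool                # is the end point under control of an MC?
    conference_id: str
    conference_goal: ConferenceGoal
    call_type: CallType


# Example values are invented for illustration only.
example_setup = SetupInfo(
    protocol_id="h225-example",
    h245_transport_address=("192.0.2.10", 5000),
    source_address="alice@example.org",
    source_info="desktop terminal",
    active_mc=False,
    conference_id="conf-0001",
    conference_goal=ConferenceGoal.CREATE,
    call_type=CallType.POINT_TO_POINT,
)
```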
Further messages that are conveyed according to the H.225 specification include RAS messages, again specified in ASN.1 (encoded in BER), and carried in the user-to-user part of Q.931 messages over the TCP connection that was set up for this.
Picture - Henning's overall packet exchange picture for H.323
RAS message types include:

gatekeeper request, confirm, reject
registration request, confirm, reject
unregistration request, confirm, reject
admission request, confirm, reject
bandwidth
disengage
location info
...

As can be seen, then, these are mainly concerned with support for connecting with gateways that provide interworking, between packet LAN and conferencing systems the other side of the gateway.
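To make the request/confirm/reject pattern of the RAS messages a little more concrete, here is a toy gatekeeper. It is a minimal sketch under loud assumptions: `ToyGatekeeper`, its methods, the bandwidth budget and the string replies are all invented for illustration, and nothing here reflects the actual H.225 message encoding or gatekeeper policy.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGatekeeper:
    """Hypothetical gatekeeper: a registration table plus a bandwidth budget."""
    bandwidth_budget_kbps: int
    registered: dict = field(default_factory=dict)  # alias -> transport address
    admitted: dict = field(default_factory=dict)    # call id -> kbps in use

    def registration_request(self, alias, address):
        # Reject if the alias is already bound to a different address.
        if alias in self.registered and self.registered[alias] != address:
            return "registration reject"
        self.registered[alias] = address
        return "registration confirm"

    def unregistration_request(self, alias):
        if alias not in self.registered:
            return "unregistration reject"
        del self.registered[alias]
        return "unregistration confirm"

    def admission_request(self, call_id, alias, kbps):
        # Admit only registered terminals, and only within the budget.
        in_use = sum(self.admitted.values())
        if alias not in self.registered or in_use + kbps > self.bandwidth_budget_kbps:
            return "admission reject"
        self.admitted[call_id] = kbps
        return "admission confirm"

    def disengage(self, call_id):
        # Release the bandwidth held by a finished call.
        self.admitted.pop(call_id, None)
```

A terminal would register first, then ask for admission per call, and disengage when the call ends so that the bandwidth becomes available again.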
Once all this is done we can carry out some conferencing - this requires video and audio, and conference control. The latter in H.323 is the job of H.245.
H.245

H.245 uses other protocols too (e.g., H.235 for security specifications). It is used to select between multiplexing layers (H.220, 222, 223 and 225), and to provide transport procedures - it provides analogous (but for a/v, not data) services to T.120. [more on T.120?]
So, because of the possible misunderstandings amongst an arbitrary set of peers on a packet net, H.245 provides master/slave determination. It then provides capability exchange - i.e. what each system is able to send/receive, not just in qualitative terms but also in absolute and relative quantitative ones (e.g. "if I can receive 3 video streams of such and such a resolution, I can only mix 2 audio streams of such a coding").
[in the mbone toolset, this could be done on the LAN with the confbus, or it could be done with client SAP advertisements, or capabilities in svrloc, or even partly DHCP.]
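The "relative quantitative" side of capability exchange can be pictured as a terminal advertising several alternative combinations of simultaneous streams, with a peer checking whether what it wants to send fits any one of them. The sketch below assumes a made-up representation (dicts of stream counts per medium) and a hypothetical helper `fits_capabilities`; it does not follow the H.245 capability structures.

```python
def fits_capabilities(proposed, advertised):
    """Return True if the proposed stream counts fit within at least one
    advertised combination of simultaneous capabilities.

    proposed:   dict mapping medium name -> number of streams wanted
    advertised: list of dicts, each an alternative simultaneous combination
    """
    return any(
        all(count <= combo.get(medium, 0) for medium, count in proposed.items())
        for combo in advertised
    )


# Illustrative trade-off: more video streams means fewer audio streams.
receiver_capabilities = [
    {"video": 3, "audio": 2},
    {"video": 1, "audio": 4},
]
```

With these invented numbers, three video plus two audio streams fit, one video plus four audio streams fit, but three video plus three audio streams do not.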
H.245 provides logical channel selection (i.e. pick a port).
Then it provides RTT estimation, channel maintenance (why? doesn't TCP or RM do that?), and then a set of commands and indications - these are really (yes, really) core conference control facilities (media control facilities and so on, as well as):
audio/video modes (activity on/off messages, silence suppression on/off commands etc) - all better done via RTP :-)
video combine/mix modes
H.243 password and other access control i/o
Chair token control
terminal control messages
conference id info
certificates
make terminal broadcaster, send-this-source, request all IDs, Remote MC control

Tim Dorcey wrote on Wed, 21 Jan 1998 on the confctrl mailing list:

I agree with the criticism of H.323. There is little reason to expect a direct derivative of a standard that was introduced in 1990 to standardize the design of special purpose hardware for videoconferencing over circuit-switched telephone networks to be a good guide to the 1998 design of software that runs on general purpose desktop computers connected by a best-effort packet-switched data network. The contexts are fundamentally different. With H.323 you get all of the disadvantages of a lossy packet switched network, with none of the advantages (except low cost).

Christian Huitema wrote on Jan 22, 1998 on the confctrl mailing list:

Jon is basically on target there. The main problems with H.323 are:

That it is designed for interworking with H.320, i.e. videoconference with ISDN, which mandates a two phase connection set-up: first build the equivalent of an ISDN circuit, then negotiate how to mux channels on that circuit.
That it is top heavy, in fact a monument of complexity.

The two phase set-up has very weird consequences when you try to interconnect an H.323 terminal with a plain telephone through an Internet telephony gateway. Basically, at the end of the first phase (the H.225/Q.931 layer of H.323), the phone call is set and the phone user starts speaking and listening. Yet, the H.323 user will not listen and cannot speak until the H.245 negotiation has been completed -- on the Internet, this may take a second or two.
Implementors of H.323 are trying to solve that problem by proposing kludges. One can for example start the negotiation of H.245 as soon as the phone "starts ringing", without actually waiting for completion of the Q.931 procedure. This works if one solves the resulting synchronisation problem, and cuts delays significantly. But there is no guarantee that all terminals support this non-standard behavior.
Another kludge is proposed in the revision of H.225, basically allowing Q.931 to carry a session description. But then one will have to solve the interaction between version 2 and version 1, which is made hard by the next problem, the sheer complexity of the spec.
Connection set-up in H.323 requires three successive exchanges, for H.225/RAS, H.225/Q.931 and H.245. None of these protocols is simple. H.225 includes 15 pages of ASN.1, plus inclusion by reference of the Q.931 specifications. H.245 includes no less than 60 pages of ASN.1. Even if you use an ASN.1 compiler (and you really have to), you are left with hundreds of parameters to set right if you want to interoperate. The chances that a programmer goofs up here or there are enormous; the problem space is so large as to make it out of the reach of most testing techniques.
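The consequence of the three successive exchanges for a call through a telephony gateway can be shown with a very small state model. This is a sketch only: `Phase` and `ToyCall` are hypothetical names, each exchange is collapsed into a single step, and the model captures nothing beyond the ordering described in the text (the phone side has audio once Q.931 completes, the H.323 side only once H.245 completes).

```python
from enum import IntEnum


class Phase(IntEnum):
    IDLE = 0
    RAS_DONE = 1        # H.225/RAS exchange finished
    Q931_CONNECTED = 2  # H.225/Q.931 exchange finished, the "circuit" is up
    H245_DONE = 3       # H.245 negotiation finished


class ToyCall:
    """Hypothetical call that walks through the three exchanges in order."""

    def __init__(self):
        self.phase = Phase.IDLE

    def advance(self):
        # Move to the next exchange; stay put once H.245 is complete.
        if self.phase < Phase.H245_DONE:
            self.phase = Phase(self.phase + 1)
        return self.phase

    def phone_side_has_audio(self):
        # The plain telephone user talks as soon as the call is connected.
        return self.phase >= Phase.Q931_CONNECTED

    def h323_side_has_audio(self):
        # The H.323 terminal cannot send or receive media before H.245 ends.
        return self.phase >= Phase.H245_DONE
```

Between `Q931_CONNECTED` and `H245_DONE` the phone user is already speaking while the H.323 user can neither hear nor answer, which is the gap the kludges try to close.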
Version 2 may include the faster call set-up procedure described above, but it also includes tons of extensions, in fact a whole new additional recommendation (H.450). As with many ITU standards, the behemoth merely gets fatter as it ages.
So, to conclude, I heartily support Jon's recommendation. We should proceed with "our own" and then specify interworking.

Modularity

H.323 is less modular than SIP. It defines a vertically integrated protocol suite for a single application. The following functions are intertwined across the various subprotocols within H.323:

call initiation and termination;
media capabilities exchange;
conference control and floor control;
bandwidth and admission control;
supplementary services;
user location.

The problem is that each of these is different, and over time the mechanisms for doing them will evolve as well, especially for QoS. However, as they are all integrated into a single protocol, removing any one of these and using a new or separate protocol for this functionality is very difficult.

Loop Detection

H.323 provides no easy way to perform loop detection in complex multi-domain searches. Example: When you want to call me, you have to resolve some name you have for me (jdrosen@bell-labs.com) to the final IP address of the machine where I currently reside. Throw in personal mobility, and the process of locating a user becomes complex. Consider a simple example. You call jdrosen@bell-labs.com. This is first forwarded to your local gatekeeper, which does some checks and then forwards it to the gatekeeper for bell-labs. This gatekeeper looks me up in its database, and forwards the setup message to a gatekeeper run by my department. However, I'm at Columbia today, and have registered such with my department's gatekeeper. So, the request is then forwarded to the Columbia gatekeeper, which in turn forwards it to the computer science department's gatekeeper, which finally forwards it to me. So, there is actually a search going on here, with call setup messages hopping from gatekeeper to gatekeeper. Each of these gatekeepers is in a different domain (or zone in H.323 parlance), and that's the domain above. Now, if I have accidentally told the local Columbia gatekeeper to forward all calls back to bell-labs, we have created a traditional routing loop. SIP can detect such loops. H.323 cannot.
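The forwarding chain in this example, and the kind of check that catches the loop, can be sketched as follows. The idea is that the request carries a record of every hop it has visited, in the spirit of the Via header in SIP, so a hop that finds itself already on the record knows the request has looped. The function `route_call`, the forwarding tables and the gatekeeper names are all hypothetical, and this is not a description of how any real gatekeeper or SIP proxy is implemented.

```python
def route_call(first_hop, forwarding):
    """Follow forwarding entries from first_hop and report any loop.

    forwarding maps a gatekeeper name to the next hop it forwards to;
    a name with no entry is treated as the final destination.
    Returns (path, loop_detected).
    """
    path = []  # hops visited so far, carried along with the request
    current = first_hop
    while current in forwarding:
        if current in path:
            # This gatekeeper already handled the request: routing loop.
            return path + [current], True
        path.append(current)
        current = forwarding[current]
    path.append(current)
    return path, False


# The working chain from the example (names are invented).
good_forwarding = {
    "local-gk": "bell-labs-gk",
    "bell-labs-gk": "dept-gk",
    "dept-gk": "columbia-gk",
    "columbia-gk": "cs-gk",
    "cs-gk": "jdrosen-host",
}

# The misconfiguration: Columbia forwards everything back to bell-labs.
looping_forwarding = dict(good_forwarding, **{"columbia-gk": "bell-labs-gk"})
```

Without such a record of visited hops, each gatekeeper only sees a fresh setup message from its neighbour and has no way to tell that it has already forwarded the same call.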

Last modified: 1998-01-24 by Henning Schulzrinne