Session announcements are transmitted as SDP packets. SDP is a simple protocol, defined as ASCII text lines describing the announcement, each terminated by CR LF. These lines all have the same format:

    x=value

where x is a single, case-sensitive letter that names the attribute, and value is the value given to that attribute. SDP defines several attributes so far. The following is an SDP packet arbitrarily captured from the Internet:

    v=0
    o=james 3054298051 3054298223 IN IP4 129.89.142.50
    s=FreeBSD Lounge
    i=Channel to discuss FreeBSD related issues. Please keep video bandwidth below 64 kbps.
    e=Jim Lowe <james@cs.uwm.edu>
    p=Jim Lowe (414) 229-6634
    c=IN IP4 224.2.100.100/127
    t=0 0
    a=tool:sdr v2.2a23
    a=type:test
    m=audio 16400 RTP/AVP 0
    c=IN IP4 224.2.100.100/127
    a=ptime:40
    m=video 49200 RTP/AVP 31
    c=IN IP4 224.2.100.102/127
    m=whiteboard 32800 udp wb
    c=IN IP4 224.2.100.101/127
    a=orient:portrait

This encoding is easy to implement, because a parser that recognizes an SDP packet is very simple. But the format is inefficient and wastes a lot of bandwidth. This is a problem on slow links such as PPP links over ordinary telephone lines, so the SDP packet should be compressed. The SAP protocol, which is normally used on the Internet to carry SDP packets, defines a way to compress the SDP packet with the gzip algorithm. This document describes a technique that achieves a better compression ratio than simply applying gzip. The resulting protocol is called compressed SDP (CSDP).
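The x=value line format is simple enough that a parser fits in a few lines. The following is only a minimal sketch in Python; the function name parse_sdp is hypothetical and not part of SDP or CSDP, and no checking of attribute letters or values is attempted.

```python
def parse_sdp(packet):
    """Split an SDP packet into (letter, value) pairs, preserving line order."""
    fields = []
    for line in packet.splitlines():  # splitlines() accepts CR LF line endings
        if not line:
            continue  # tolerate empty lines
        if len(line) < 2 or line[1] != "=":
            raise ValueError("not an x=value line: %r" % line)
        fields.append((line[0], line[2:]))
    return fields


sample = "v=0\r\ns=FreeBSD Lounge\r\nt=0 0\r\nm=audio 16400 RTP/AVP 0\r\n"
fields = parse_sdp(sample)
```

The order of the pairs is kept, because the compression described below depends on the order of the SDP lines.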
There are two groups of attributes which can be repeated several times: the Time Description and the Media Description. The first attribute of each group is mandatory if the group is present, so these groups are also repeated in the compressed SDP packet. The last (or even a missing) group is indicated by the first attribute of the group: if that attribute is not present in the compressed packet (a violation of the SDP specification), the group is no longer repeated.
Every occurrence of an attribute is preceded by a <presence-bit> of 1, and a <presence-bit> of 0 indicates that no further occurrence follows. If the attribute occurs once, the following sequence will be generated:

    +-+-+-+-+-+ ... +-+-+-+
    |1| the attribute   |0|
    +-+-+-+-+-+ ... +-+-+-+

If the attribute occurs two times, then the following sequence will be generated:

    +-+-+-+-+-+ ... +-+-+-+-+-+-+-+ ... +-+-+-+
    |1| the attribute   |1| the attribute   |0|
    +-+-+-+-+-+ ... +-+-+-+-+-+-+-+ ... +-+-+-+

According to the SDP specification, some fields are mandatory and have to be specified exactly once, so the <presence-bit> is not strictly needed for such attributes. But to simplify compression and decompression, and to make the algorithm clearer, every attribute is encapsulated by <presence-bit>s. This also makes it possible to extend the SDP specification so that such an attribute can be repeated or skipped. The overhead of two bits (1 and 0) is acceptable.
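The presence-bit encapsulation can be sketched as follows. This is an illustration only: bits are kept as strings of '0' and '1' for readability, the function names are hypothetical, and the decoder assumes a fixed-width attribute, which is a simplification of the real attributes.

```python
def encode_repeated(occurrences):
    """Prefix each already-encoded occurrence with a 1 bit and close with a 0 bit."""
    bits = ""
    for occurrence in occurrences:
        bits += "1" + occurrence
    return bits + "0"


def decode_repeated(bits, pos, width):
    """Read fixed-width occurrences starting at pos; return (values, next position)."""
    values = []
    while bits[pos] == "1":
        values.append(bits[pos + 1:pos + 1 + width])
        pos += 1 + width
    return values, pos + 1  # skip the closing 0 bit


once = encode_repeated(["1010"])
twice = encode_repeated(["1010", "0011"])
```

An attribute that is absent is encoded as the single bit 0, which is why the scheme also covers skipped attributes.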
There is one exception regarding the usage of the <presence-bit>. The very first attribute, the version of the SDP packet, will NOT be encapsulated. This lets us define new versions of SDP or of this compression protocol which use a completely different encapsulation. Therefore, the version information is needed in plain form.
Tests have shown that the gzip algorithm performs very poorly when the compressed header information for each attribute (e.g. the <presence-bit> for the i= attribute) is directly followed by the text value (the textual description). Because of this, the compressed SDP packet is split into four sections:

    +----------------------+
    | version              |
    +----------------------+
    | header-length        |
    +----------------------+
    | header               |
    +----------------------+
    | compressed text data |
    +----------------------+

The version section simply defines the SDP and CSDP version. The header-length section defines the number of bytes occupied by the version, the header-length and the header. The header section contains all the compressed SDP data, excluding the compressed text data. The last section contains the sequentially appended text, which is compressed by gzip (the whole collected text buffer is compressed, not each attribute separately!).
The last section begins on a byte boundary. Because the compression is bit oriented, the last octet of the header section may be padded to the byte boundary. This can be done by appending 0 to 7 <final-bits>.
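The four sections can be assembled as sketched below. This is a sketch under stated assumptions, not a definitive implementation: bits are written most significant bit first, the <final-bits> are set to 0, zlib.compress stands in for "the gzip algorithm", and the function names assemble_csdp and split_csdp are hypothetical. The field widths (2 bit version, 8 bit header-length) are the ones defined further below.

```python
import zlib


def assemble_csdp(header_bits, text):
    """Build version + header-length + header (padded) + compressed text."""
    version_bits = "00"  # version 0
    total_bits = 2 + 8 + len(header_bits)
    header_len = (total_bits + 7) // 8  # octets, including version and length
    if header_len >= 0xFF:
        raise ValueError("header too long: all ones is reserved for the extension")
    bits = version_bits + format(header_len, "08b") + header_bits
    bits += "0" * (header_len * 8 - len(bits))  # 0 to 7 <final-bits> (assumed 0)
    head = int(bits, 2).to_bytes(header_len, "big")
    return head + zlib.compress(text, 9)


def split_csdp(packet):
    """Return (version, header octets, decompressed text buffer)."""
    version = packet[0] >> 6
    header_len = ((packet[0] & 0x3F) << 2) | (packet[1] >> 6)
    return version, packet[:header_len], zlib.decompress(packet[header_len:])


packet = assemble_csdp("1000101", b"james")
version, head, text = split_csdp(packet)
```

Because the header-length is an offset to the text section, the decoder can decompress the text buffer first and then hand out text values while it parses the header bits in a single pass.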
Theoretically, the length field can be omitted, because the end of the header can be detected automatically: the last Media Description group defines the end of the SDP packet, so the last byte is the one holding the 0 presence-bit for that group. But while implementing the CSDP decompression algorithm, we found that the text section is difficult to handle this way. Without the length indication, the parser has to parse the packet twice, resulting in poor performance. So we decided to include the length indication.
For every length field defined below, a value with all bits set to 1 is reserved to indicate a length extension. This allows us to efficiently compress attributes of usual length, while still making it possible to encode larger attributes. The format and usage of the extension mechanism have not been defined yet.
Each attribute except the version is preceded by a <presence-bit> as defined above. In the diagrams below it is shown as the P bit.
Ver is the binary encoded version field. Currently, only the value 00 is defined, for the current version 0, and the description given here applies only to this version!

     0 1
    +-+-+
    |ver|
    +-+-+
<header-length> is the binary encoded length of the header, given in octets. The value covers the whole header, including the <version> and <header-length> fields, so it is an offset to the point where the compressed text section begins.

     0 1 2 3 4 5 6 7
    +-+-+-+-+-+-+-+-+
    |<header-length>|
    +-+-+-+-+-+-+-+-+
u-len specifies the length of the username in octets and is 6 bits long. Because the <username> is raw ASCII, it is appended to the text section to be compressed. Because in practice the <session-id> and <version> fields seem to be derived from a 32 bit integer, they are represented in binary. The address-triple is encoded as defined in [comprNetAddrAddr].
                                     3 3                   3 3
     0   0 1 2 3   0 1 2 3 4         0 1   0 1 2 3         0 1
    +-+ +-+-+-+-+ +-+-+-+-+-+  ...  +-+-+ +-+-+-+-+  ...  +-+-+
    |P| | u-len | | 32 bit <session-id> | | 32 bit <version>  |
    +-+ +-+-+-+-+ +-+-+-+-+-+  ...  +-+-+ +-+-+-+-+  ...  +-+-+

     0 1 2 3 4 5
    +-+-+-+-+-+-+ ... +-+-+-+-+-+-+-+
    | <net-type> <addr-type> <addr> |
    +-+-+-+-+-+-+ ... +-+-+-+-+-+-+-+
The 6 bit len field specifies the length of the session name in octets. The <session-name> will be appended to the text section to be compressed.

     0   0 1 2 3 4 5
    +-+ +-+-+-+-+-+-+
    |P| | 6 bit len |
    +-+ +-+-+-+-+-+-+
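The way a textual attribute is divided between the header and the text section can be sketched as follows. This is a minimal illustration: the helper name encode_text_attribute is hypothetical, header bits are collected as strings, and the closing 0 presence-bit that follows the last occurrence is not emitted here.

```python
def encode_text_attribute(value, len_bits, header, text):
    """Append P bit + binary length to the header and the raw text to the text buffer."""
    data = value.encode("ascii")
    reserved = (1 << len_bits) - 1  # all ones is reserved for the length extension
    if len(data) >= reserved:
        raise ValueError("value too long for a %d bit length field" % len_bits)
    header.append("1" + format(len(data), "0%db" % len_bits))
    text.extend(data)


header = []
text = bytearray()
encode_text_attribute("FreeBSD Lounge", 6, header, text)
```

The same helper would serve the other textual attributes by passing their length field width, for example 10 for the session description or 8 for the e-mail address.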
The 10 bit length field specifies the length of the description in octets. It is possible to specify a length of up to 1023 octets, which is enough because an SDP packet is restricted to 1024 octets (strictly, the packet only "should" not be larger than 1024 octets). The <session-description> will be appended to the text section to be compressed.

     0   0 1 2 3 4 5 6 7 8 9
    +-+ +-+-+-+-+-+-+-+-+-+-+
    |P| |   10 bit length   |
    +-+ +-+-+-+-+-+-+-+-+-+-+
The 8 bit length field specifies the length of the <uri> in octets. The <uri> field will be appended to the text section to be compressed.

     0   0 1 2 3 4 5 6 7
    +-+ +-+-+-+-+-+-+-+-+
    |P| | 8 bit length  |
    +-+ +-+-+-+-+-+-+-+-+
The 8 bit length field specifies the length of the <e-mail> address in octets. The <e-mail> address will be appended to the text section to be compressed.

     0   0 1 2 3 4 5 6 7
    +-+ +-+-+-+-+-+-+-+-+
    |P| | 8 bit length  |
    +-+ +-+-+-+-+-+-+-+-+
The 8 bit length field specifies the length of the <phone-number> in octets. The <phone-number> will be appended to the text section to be compressed.

     0   0 1 2 3 4 5 6 7
    +-+ +-+-+-+-+-+-+-+-+
    |P| |    length     |
    +-+ +-+-+-+-+-+-+-+-+
The address-triple (<net-type>, <addr-type> and <addr>) will be encoded as defined in [comprNetAddrAddr]. The 8 bit <ttl> field is the binary encoded <ttl>.

     0   0 1 2 3 4 5 6 7 8                 0 1 2 3 4 5 6 7
    +-+ +-+-+-+-+-+-+-+-+-+ ... +-+-+-+-+ +-+-+-+-+-+-+-+-+
    |P| | <net-type> <addr-type> <addr> | |  8 bit <ttl>  |
    +-+ +-+-+-+-+-+-+-+-+-+ ... +-+-+-+-+ +-+-+-+-+-+-+-+-+
     0   0 1 2   0 1 2 3 4 5 6 7 8 9
    +-+ +-+-+-+ +-+-+-+-+-+-+-+-+-+-+
    |P| |<mod>| | 10 bit <bandwidth>|
    +-+ +-+-+-+ +-+-+-+-+-+-+-+-+-+-+

The 3 bit <mod> is encoded as follows:

    000  CT
    001  AS
    111  (reserved for extension)

The <bandwidth> will be encoded as a 10 bit binary value.
The 27 bit start-time is the binary encoded value of the <start-time>, given in minutes. The 27 bit stop-time is the binary encoded value of the <stop-time>, given in minutes.

                           2                     2
     0   0 1 2 3 4 5       6   0 1 2 3 4 5       6
    +-+ +-+-+-+-+-+-+ ... +-+ +-+-+-+-+-+-+ ... +-+
    |P| | 27 bit start-time | | 27 bit stop-time  |
    +-+ +-+-+-+-+-+-+ ... +-+ +-+-+-+-+-+-+ ... +-+
Because these attributes are NTP time values given in seconds, the 27 bit binary encoding in minutes imposes no practical limitation.
The values may have to be rounded, which does not affect their practical meaning. We define that the <start-time> has to be rounded down, and the <stop-time> has to be rounded up.
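The rounding rule and the range argument can be checked with a short sketch. It assumes that <start-time> and <stop-time> are NTP values in seconds that fit into 32 bits; the function name encode_times and the sample values are hypothetical, chosen only for illustration.

```python
def encode_times(start_seconds, stop_seconds):
    """Encode start/stop as 27 bit minute counts: start rounded down, stop rounded up."""
    start_minutes = start_seconds // 60
    stop_minutes = -(-stop_seconds // 60)  # ceiling division
    for minutes in (start_minutes, stop_minutes):
        if minutes >= 1 << 27:
            raise ValueError("value does not fit into 27 bits")
    return format(start_minutes, "027b") + format(stop_minutes, "027b")


# Largest 32 bit value in seconds, rounded up to minutes, still fits into 27 bits.
largest_minutes = -(-(2**32 - 1) // 60)
bits = encode_times(3054298051, 3054298223)
```

Rounding the start down and the stop up means the encoded interval always contains the original one, so a session is never announced as shorter than it really is.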
The 20 bit interval is the binary encoded value of <repeat-interval>, given in minutes. The largest value is more than one year, so 20 bits should be sufficient.
The 11 bit duration is the binary encoded value of <active-duration>, given in minutes. The largest value is about 34 hours (longer than one day), which should be sufficient.

                           1                     1
     0   0 1 2 3 4 5       9   0 1 2 3 4 5       0
    +-+ +-+-+-+-+-+-+ ... +-+ +-+-+-+-+-+-+ ... +-+
    |P| |  20 bit interval  | |  11 bit duration  |
    +-+ +-+-+-+-+-+-+ ... +-+ +-+-+-+-+-+-+ ... +-+
The variable length list-of-offsets is encoded like an attribute, i.e. it is preceded by the presence-bit. This field is repeated until a presence-bit of 0 is received.
The 20 bit offset is the binary encoded value of one <list-of-offsets> element, given in minutes. The largest value is more than a year, which should be sufficient.

                           1
     0   0 1 2 3 4 5       9
    +-+ +-+-+-+-+-+-+ ... +-+
    |P| |   20 bit offset   |
    +-+ +-+-+-+-+-+-+ ... +-+
The values may be truncated to whole minutes, which has no practical effect. The <repeat-interval> should be rounded down, the <active-duration> up and the <offset> down.
The 20 bit adjust time is the binary encoded value of the <adjust-time> field, given in minutes. The 8 bit offset is the binary encoded value of the <offset> field, also given in minutes. So an adjustment of up to about 4 hours is possible, which should be more than enough.

                             1
     0   0 1 2 3 4 5         9   0 1 2 3 4 5 6 7
    +-+ +-+-+-+-+-+-+  ...  +-+ +-+-+-+-+-+-+-+-+
    |P| | 20 bit adjust time  | | 8 bit offset  |
    +-+ +-+-+-+-+-+-+  ...  +-+ +-+-+-+-+-+-+-+-+
The values may be truncated to whole minutes, which has no practical effect. The <adjust-time> should be rounded down and the <offset> up.
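The ranges quoted for the minute-based fields follow directly from the field widths. The following sketch only redoes that arithmetic; the dictionary keys are descriptive labels chosen here, not names from the specification.

```python
def max_minutes(width_bits):
    """Largest value of an unsigned binary field of the given width."""
    return (1 << width_bits) - 1


ranges = {
    "20 bit interval/offset, in days": max_minutes(20) / (60 * 24),
    "11 bit duration, in hours": max_minutes(11) / 60,
    "8 bit zone offset, in hours": max_minutes(8) / 60,
}
```

With these widths the interval and offset fields reach roughly two years, the duration about 34 hours, and the zone offset a little over 4 hours.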
B.4.14. Encryption Key (k=)
(currently not defined)
The 6 bit length field specifies the length of the remaining attribute definition, either the <flag> or the <attrib>:<value>. These fields will be appended to the text section to be compressed.

     0   0 1 2 3 4 5
    +-+ +-+-+-+-+-+-+
    |P| |  length   |
    +-+ +-+-+-+-+-+-+
                             1
     0   0 1 2   0 1 2       5   0 1 2
    +-+ +-+-+-+ +-+-+-+ ... +-+ +-+-+-+
    |P| |media| | 16-bit port | |trans|
    +-+ +-+-+-+ +-+-+-+ ... +-+ +-+-+-+

The <media> field will be encoded as a 3 bit binary value. The following values have been defined:

    000  audio
    001  video
    010  whiteboard
    011  html
    100  text
    111  (reserved for extensions)

The 16 bit port field is the binary encoded value of <port>.

NR IS MISSING!!!

The next field is the transfer format, which will be encoded as a 3 bit value. The following values have been defined:

    000  RTP/AVP
    001  VAT
    010  UDP
    111  (reserved for extensions)

Because the <fmt-list> is a variable length list, each entry will be encoded as a presence-bit followed by a value. If the presence-bit is 0, there are no more fmt entries. Otherwise, a media specific 5 bit encoded fmt description will follow. For the media type audio, the following values are defined:

    00001  0 (pcm)
    ...
    00010  pcm
    00011  gsm
    00100  dvi4
    0 0001 pcm 0010 gsm
    11111  (reserved for extension)

For the media type video, the following values are defined:

    00000  h261, 31
    00001  ???, 96
    11111  (reserved for extension)

For the media type whiteboard, the following values are defined:

    00000  none
    00001  wb
    11111  (reserved for extension)

For the media type html, the following values are defined:

    00000  Mosaic
    00001  Netscape
    11111  (reserved for extension)

For the media type text, the following values are defined:

    00000  nt
    11111  (reserved for extension)
The compressed version of the address-triple consists of a fixed 4 bit header which defines the combined type of the address format. The following formats have been defined so far:
                         1             3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3       5
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ... +-+
    |0 0 0 0| 32 bit binary IP4 address |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ... +-+
                         1             5
     0 1 2 3 4 5 6 7 8 9 0 1 2 3       1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ... +-+
    |0 0 0 1| 48 bit binary IP6 address |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ... +-+
Length (6 bits) specifies the length of the DNS-address in octets (i.e. 8 bit units). Addresses longer than 63 octets cannot be represented by this format.

                         1
     0 1 2 3 4 5 6 7 8 9 0 1 2 3
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ... +-+-+-+-+-+-+-+-+-+
    |0 0 1 0|  length   | length octets of DNS-address  |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ... +-+-+-+-+-+-+-+-+-+
     0 1 2 3
    +-+-+-+-+
    |1 1 1 1|
    +-+-+-+-+
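The IP4 format above can be sketched as follows, using the ipaddress module of the Python standard library. The function name encode_address_triple is hypothetical, only the IP4 case is covered, and how <net-type> and <addr-type> map onto the other 4 bit headers is left open because the text does not spell it out.

```python
import ipaddress


def encode_address_triple(net_type, addr_type, addr):
    """Return the address-triple as a bit string: 4 bit format header + address."""
    if net_type == "IN" and addr_type == "IP4":
        return "0000" + format(int(ipaddress.IPv4Address(addr)), "032b")
    raise ValueError("only the IN IP4 format is sketched here")


bits = encode_address_triple("IN", "IP4", "224.2.100.100")
```

The result is 36 bits long, which matches the last bit position 35 shown in the IP4 diagram, compared to 20 octets for the textual form "IN IP4 224.2.100.100".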
The following table lists the data behind the graph, which compares the size of the SDP packet once compressed by gzip and once with the method described here:

    Nr  SDP/bytes  gzip'd  ratio/%  CSDP  ratio/%
    19        178     157     11.7    86     51.6
    04        274     210     23.3   158     42.3
    20        336     245     27.0   194     42.2
    16        356     260     26.9   195     45.2
    09        371     247     33.4   201     45.8
    18        378     290     23.2   217     42.5
    01        381     286     24.9   228     40.1
    14        381     298     21.7   230     39.6
    05        406     293     27.8   213     47.5
    03        408     294     27.9   231     43.3
    24        416     307     26.2   249     40.1
    07        458     304     33.6   248     45.8
    06        459     309     32.6   255     44.4
    21        466     309     33.6   238     48.9
    23        505     340     32.6   245     51.4
    15        505     341     32.4   245     51.4
    17        505     341     32.4   245     51.4
    11        518     354     31.6   291     43.8
    22        531     342     35.5   250     52.9
    10        565     380     32.7   316     44.0
    13        644     429     33.3   351     45.4
    08        647     394     39.1   305     52.8
    12        665     447     32.7   356     46.4
    02        744     435     41.5   372     50.0
    25        834     524     38.3   399     52.1
    00        837     517     38.2   360     56.9
It is clearly visible that the CSDP compression is much more effective than simply applying gzip to the SDP packet, as expected by the author. Incidentally, the two plots have nearly the same shape, but the CSDP plot is lowered by an offset compared to gzip.
Another interesting comparison is the compression ratio achieved by the two compression techniques. Once again the shape is nearly the same. But the major result is that the CSDP method has a compression ratio of approximately 40% to 57%, compared to roughly 12% to 42% when using gzip. The arithmetic mean of the compression ratio is 30.5% for gzip and 46.8% for CSDP.
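The two arithmetic means can be reproduced from the ratio columns of the table. The sketch below simply averages the listed percentages; the variable names are chosen here for illustration.

```python
# Compression ratios in percent, copied from the table in row order.
gzip_ratios = [11.7, 23.3, 27.0, 26.9, 33.4, 23.2, 24.9, 21.7, 27.8, 27.9,
               26.2, 33.6, 32.6, 33.6, 32.6, 32.4, 32.4, 31.6, 35.5, 32.7,
               33.3, 39.1, 32.7, 41.5, 38.3, 38.2]
csdp_ratios = [51.6, 42.3, 42.2, 45.2, 45.8, 42.5, 40.1, 39.6, 47.5, 43.3,
               40.1, 45.8, 44.4, 48.9, 51.4, 51.4, 51.4, 43.8, 52.9, 44.0,
               45.4, 52.8, 46.4, 50.0, 52.1, 56.9]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


gzip_mean = round(mean(gzip_ratios), 1)
csdp_mean = round(mean(csdp_ratios), 1)
```

On these 26 packets CSDP gains about 16 percentage points over gzip on average.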
But there are also some pitfalls. First, the compression algorithm shown here relies strictly on the order of the SDP lines. While implementing the algorithm explained above, I realized that there are still many SDP packets on the net which do not strictly follow this order, even ones generated by sdr. So the fields had to be reordered prior to compression. After reordering, every packet was correctly ordered, and there was no packet which could not be compressed.
Of course, extending SDP (e.g. adding new tags) will cause a problem. But it should be easy to adapt CSDP to such changes as well.
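The reordering step might look like the sketch below. It is not the author's implementation: it assumes the field order of the SDP specification (v, o, s, i, u, e, p, c, b, time descriptions, z, k, a at session level and m, i, c, b, k, a within a media description), assumes that every line before the first m= line belongs to the session level, and uses hypothetical names.

```python
# t= and r= share one rank so that each r= line stays behind its own t= line.
SESSION_RANK = {"v": 0, "o": 1, "s": 2, "i": 3, "u": 4, "e": 5, "p": 6,
                "c": 7, "b": 8, "t": 9, "r": 9, "z": 10, "k": 11, "a": 12}
MEDIA_RANK = {"m": 0, "i": 1, "c": 2, "b": 3, "k": 4, "a": 5}


def reorder_sdp(lines):
    """Return the SDP lines in specification order, keeping media groups intact."""
    session, media = [], []
    for line in lines:
        if line.startswith("m="):
            media.append([line])  # a new media description starts here
        elif media:
            media[-1].append(line)
        else:
            session.append(line)
    # sorted() is stable, so repeated attributes keep their relative order.
    ordered = sorted(session, key=lambda l: SESSION_RANK.get(l[0], len(SESSION_RANK)))
    for group in media:
        ordered += sorted(group, key=lambda l: MEDIA_RANK.get(l[0], len(MEDIA_RANK)))
    return ordered


shuffled = ["v=0", "s=FreeBSD Lounge", "o=james 3054298051 3054298223 IN IP4 129.89.142.50",
            "a=tool:sdr v2.2a23", "t=0 0", "c=IN IP4 224.2.100.100/127",
            "m=audio 16400 RTP/AVP 0", "a=ptime:40", "c=IN IP4 224.2.100.100/127"]
ordered = reorder_sdp(shuffled)
```

Unknown attribute letters are sorted to the end of their group rather than rejected, which is a design choice of this sketch.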