Chin-Hong Lin
Columbia University
New York, NY 10025, USA
cl486@columbia.edu
Agung Suyono
Columbia University
New York, NY 10025, USA
as1682@columbia.edu
Abstract
Session Initiation Protocol (SIP) is a signaling protocol that handles
the setup, modification, and teardown of multimedia sessions. After a
session is established, the real-time multimedia data (audio and video)
is carried over the Real-time Transport Protocol (RTP), with the data
itself encoded using an audio or video codec, such as G.711, GSM, or
ADPCM for audio and H.261, H.263, or MPEG for video. In this project, we
take an existing centralized conferencing server, sipconf, which allows
multiple SIP users to participate in an audio/video conference, and
extend it to record an ongoing conference. Our first goal is to record
the conference in one of three formats: AU, WAV, or rtpdump.
Real Time Streaming Protocol (RTSP) is becoming a standard for network
remote control of multimedia servers that hold multimedia content
(audio/video). RTSP is used between the client and the server to
negotiate the initiation and termination of a streaming session. After
the session is established, the media server sends the multimedia
packets to the requesting client using RTP. Our next step is to extend
the RTSP media server rtspd, which already supports recording in AU or
WAV format, so that it also supports recording in rtpdump format.
Finally, we improve the usability of the system so that it can be
controlled from the web interface and can use the existing database for
configuration and control. This allows the system to retrieve the
conference recording settings without cumbersome command-line options
specified during server startup.
1. Introduction
In the implementation of the SIP conferencing server, sipconf creates a
thread for each participant to receive the RTP/RTCP packets from that
participant; this receive thread is called RTPReceiveThread. After an
RTP packet is received, sipconf decodes the packet's media payload into
linear samples and adds the result to a centralized buffer. Packets from
all the participants in one conference are accumulated in this
centralized buffer and thus mixed.
Meanwhile, the conferencing server creates a send thread (RTPSendThread
in the implementation) for each conference, which periodically (e.g.,
every 40 ms) takes the mixed stream and transmits the result back to the
participants. Mixing is needed only for audio; video streams can be
replicated without modification. In addition, before sending the mixed
result to a participant, the server removes that participant's own audio
data from the mixed stream, so that the participant does not hear his or
her own voice.
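The mix-then-subtract step can be illustrated with a minimal C sketch. This is not the sipconf code: the function names, the 32-bit accumulation buffer, and the clamping policy are our assumptions for illustration only.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical frame size: 40 ms of audio at 8000 Hz. */
#define FRAME_SAMPLES 320

/* Clamp a 32-bit sum back into the 16-bit linear range. */
static int16_t clamp16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Add one participant's decoded frame into the shared mix buffer. */
void mix_add(int32_t *mix, const int16_t *frame, size_t n)
{
    for (size_t i = 0; i < n; i++)
        mix[i] += frame[i];
}

/* Build the frame sent to one participant: the full mix minus that
 * participant's own contribution, so they do not hear themselves. */
void mix_for_participant(const int32_t *mix, const int16_t *own,
                         int16_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = clamp16(mix[i] - own[i]);
}
```

Accumulating in 32 bits and clamping only on output avoids wrap-around when several loud participants speak at once.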
For recording, we must write the media content from every participant to
a file in the desired format. If AU or WAV is chosen, the server writes
the mixed audio stream to the target file. If rtpdump is chosen, the
packets are dumped to the file as soon as they arrive, before mixing.
The organization of the report is as follows:
1. Introduction
2. AU and WAV format recording
3. rtpdump format recording
4. rtspd enhancements
5. Enhanced control
6. Program documentation
7. Task List
8. References
2. AU and WAV Format Recording
Before going into the details of recording, we briefly introduce these
two sound file formats.
AU file format
This format was developed by Sun Microsystems and serves as a standard on UNIX computers.
The 24-byte header can be described by the following C structure.
typedef struct {
    u_int32 magic;        /* magic number */
    u_int32 hdr_size;     /* size of this header, with info (in bytes) */
    u_int32 data_size;    /* length of data (optional) */
    u_int32 encoding;     /* data encoding format */
    u_int32 sample_rate;  /* samples per second */
    u_int32 channels;     /* number of interleaved channels */
} audio_filehdr_t;
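As an illustration, the header above could be written to disk as follows. AU header fields are stored big-endian, the magic number is ".snd" (0x2e736e64), and encoding 1 denotes 8-bit mu-law. The helper names are hypothetical and are not taken from sndfile.c.

```c
#include <stdint.h>
#include <stdio.h>

#define AU_MAGIC        0x2e736e64u  /* ".snd" */
#define AU_ENC_MULAW_8  1u           /* 8-bit G.711 mu-law */
#define AU_UNKNOWN_SIZE 0xffffffffu  /* data size not yet known */

/* Write one 32-bit value in big-endian order; returns 0 on success. */
static int put_be32(FILE *fp, uint32_t v)
{
    unsigned char b[4] = {
        (unsigned char)(v >> 24), (unsigned char)(v >> 16),
        (unsigned char)(v >> 8),  (unsigned char)v
    };
    return fwrite(b, 1, 4, fp) == 4 ? 0 : -1;
}

/* Hypothetical helper: write a 24-byte AU header for mono mu-law audio. */
int au_write_header(FILE *fp, uint32_t data_size, uint32_t sample_rate)
{
    const uint32_t fields[6] = {
        AU_MAGIC, 24u, data_size, AU_ENC_MULAW_8, sample_rate, 1u
    };
    for (int i = 0; i < 6; i++)
        if (put_be32(fp, fields[i]) != 0)
            return -1;
    return 0;
}
```

A recorder that does not yet know the final length can pass AU_UNKNOWN_SIZE at creation time and overwrite the field when the file is closed.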
WAV file format
This file format follows the RIFF (Resource Interchange File Format)
specification. It was developed by IBM and Microsoft as a counterpart of
AIFF on Macs. It is the native sound format of Windows machines for
recording and playback of sound.
WAV files can be written with multiple data chunks. For this project,
we write WAV files with a single data chunk.
typedef struct {
    char    magic[4];             /* = "RIFF"; magic constant */
    u_int32 length;               /* total length of file - 8 */
    char    type[4];              /* = "WAVE"; designates as WAVE file */
    struct {
        char    type[4];          /* = "fmt "; type of chunk */
        u_int32 length;           /* length in bytes */
        u_int16 wFormatTag;       /* data format */
        u_int16 wChannels;        /* number of channels */
        u_int32 wSamplesPerSec;   /* samples per second per channel */
        u_int32 wAvgBytesPerSec;  /* estimate of bytes per second */
        u_int16 wBlockAlign;      /* byte alignment of a basic sample */
        u_int16 wBitsPerSample;   /* bits per sample */
    } fmt_chunk;
    struct {
        char    type[4];          /* = "data"; type of chunk */
        u_int32 length;           /* length of the data (chunk size minus 8 bytes) */
    } data_chunk;
} wave_filehdr_t;
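For illustration, the 44-byte header described by this structure could be filled in for mono 8-bit mu-law audio as sketched below. WAV fields are little-endian and format tag 7 denotes mu-law. The function name is hypothetical, and the 16-byte fmt chunk simply mirrors the structure above.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define WAVE_FORMAT_MULAW 7u  /* format tag for G.711 mu-law */

static void put_le16(unsigned char *p, uint16_t v)
{
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)(v >> 8);
}

static void put_le32(unsigned char *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (unsigned char)(v >> (8 * i));
}

/* Hypothetical helper: write a single-data-chunk WAV header for
 * mono, 8-bit mu-law audio; returns 0 on success. */
int wav_write_header(FILE *fp, uint32_t data_len, uint32_t rate)
{
    unsigned char h[44];

    memcpy(h, "RIFF", 4);
    put_le32(h + 4, 36u + data_len);      /* file length minus 8 */
    memcpy(h + 8, "WAVE", 4);
    memcpy(h + 12, "fmt ", 4);
    put_le32(h + 16, 16u);                /* fmt chunk body length */
    put_le16(h + 20, WAVE_FORMAT_MULAW);  /* wFormatTag */
    put_le16(h + 22, 1u);                 /* wChannels */
    put_le32(h + 24, rate);               /* wSamplesPerSec */
    put_le32(h + 28, rate);               /* wAvgBytesPerSec: 1 byte/sample */
    put_le16(h + 32, 1u);                 /* wBlockAlign */
    put_le16(h + 34, 8u);                 /* wBitsPerSample */
    memcpy(h + 36, "data", 4);
    put_le32(h + 40, data_len);           /* data chunk length */

    return fwrite(h, 1, sizeof h, fp) == sizeof h ? 0 : -1;
}
```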
Sound Coding Technique
For both file formats, our implementation uses the G.711 mu-law codec
with 8 bits per sample at an 8000 Hz sampling rate. Mu-law is the
variant of G.711 used primarily in North America. The other variant is
A-law; the two differ in how the non-uniform quantization is performed.
G.711 is a waveform codec and is often called
Pulse-Code Modulation (PCM).
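To make the non-uniform quantization concrete, the following sketch converts between 16-bit linear samples and 8-bit mu-law using the widely used bias-and-segment method. It is an illustrative implementation, not the codec code used by sipconf, and the function names are our own.

```c
#include <stdint.h>

#define ULAW_BIAS 0x84   /* 132, added before the segment search */
#define ULAW_CLIP 32635  /* largest magnitude that fits after biasing */

/* Encode one 16-bit linear sample as an 8-bit mu-law byte. */
uint8_t linear_to_ulaw(int16_t pcm)
{
    int sample = pcm;
    int sign = 0;

    if (sample < 0) {
        sample = -sample;
        sign = 0x80;
    }
    if (sample > ULAW_CLIP)
        sample = ULAW_CLIP;
    sample += ULAW_BIAS;

    /* The segment (exponent) is the position of the highest set bit. */
    int exponent = 7;
    for (int mask = 0x4000; (sample & mask) == 0 && exponent > 0; mask >>= 1)
        exponent--;

    int mantissa = (sample >> (exponent + 3)) & 0x0f;
    return (uint8_t)~(sign | (exponent << 4) | mantissa);
}

/* Decode one 8-bit mu-law byte back to a 16-bit linear sample. */
int16_t ulaw_to_linear(uint8_t ulaw)
{
    int u = (uint8_t)~ulaw;
    int t = (((u & 0x0f) << 3) + ULAW_BIAS) << ((u & 0x70) >> 4);
    return (int16_t)((u & 0x80) ? (ULAW_BIAS - t) : (t - ULAW_BIAS));
}
```

Small amplitudes fall into narrow segments and large amplitudes into wide ones, so a round trip is exact near zero and increasingly coarse for loud samples.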
Procedures to do conference recording
Where do we put the functionality in Conferencing Server sipconf?
The recording for AU and WAV is done in the send thread, which is the
thread the conferencing server creates for each conference. The thread
is denoted as RTPSendThread in cinema/libmixer/sendrecv.c.
Before recording can proceed, we must initialize the file header for
the sound file. The function we call is
FILE *CreateSoundFile(char *filename, encoding_used sndformat);
the parameter sndformat specifies the information for the desired file
format.
The server mixes the samples from all participants into a mixed stream
and periodically sends the mixed result (excluding the participant's own
samples) back to each participant. Thus, the appropriate time to write
the mixed result to the file (AU or WAV) is at the end of each interval,
after mixing is completed and before each participant's own samples are
subtracted. Because the mixed audio stream is 16-bit linear, we must
transcode it to the encoding format used in the sound file before
writing. See the Sound Coding Technique section above for the encoding
used in our implementation.
Because the file header contains a size field, we must fill in this
value before closing the sound file. For this purpose, while recording
we keep a running count of the data written to the file, accumulated
throughout the lifetime of the conference. Once the conference is
terminated, we call the function
void CloseSoundFile(FILE *FN, u_int32 data_size) with the accumulated
size to close the file. At this point, the conference recording for AU
or WAV is complete.
3. Recording Using rtpdump Format
Where do we put the functionality in Conferencing Server sipconf?
The recording for rtpdump is done in the receive thread, which is the
thread sipconf creates for each participant in the conference. The
thread is denoted as RTPReceiveThread in
cinema/libmixer/sendrecv.c.
Before recording in rtpdump format, we also need to initialize the file
header. The function we call is
void rtpdump_header(FILE *out, struct sockaddr_in *sin, struct timeval *start);
in the parameter list, sin is the socket address of the mixer (in this
case, sipconf), and start is the timestamp for the start of the
conference.
The rtpdump header, described as a C structure, is as follows:
typedef struct {
    struct timeval start;  /* start of recording (GMT) */
    u_int32 source;        /* network source (multicast address); in our case, sipconf */
    u_int16 port;          /* UDP port; in our case, sipconf */
} RD_hdr_t;
Although there are multiple receive threads for a conference with
multiple participants, we keep only one rtpdump file for each
conference. When a packet is received by the server, it is dumped to
the file by calling the function
void packet_handler(FILE *out, int trunc, double dstart, struct timeval now, int ctrl, struct sockaddr_in sin, int len, RD_buffer_t packet).
In the parameter list, trunc specifies the maximum number of bytes to
record from an RTP/RTCP packet, in case the packet is too large; dstart
is the timestamp of the start of the conference; now is the current
timestamp, which accommodates delay jitter; ctrl selects the packet
type, 0 for RTP and 1 for RTCP; sin is the socket address of the
participant sending this packet; len is the length of the received
packet; and packet is the received RTP/RTCP packet.
The packets are written to the file chronologically, so the timestamps
in the file increase accordingly. Since the file also contains the
timing information from the RTP header, the recording can be played
back later with the same timing. For this format, we do not manipulate
the RTP packet's content.
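To illustrate what one record in the file looks like, the sketch below writes a single packet in the rtptools binary layout as we understand it: an 8-byte per-packet header in network byte order (record length including the header, RTP length or 0 for RTCP, and offset in milliseconds since the start of recording) followed by the packet bytes. The function names are hypothetical and this is not the packet_handler code.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/time.h>

#define RD_PACKET_HDR_LEN 8u

/* Milliseconds elapsed between the start of recording and now. */
uint32_t rd_offset_ms(const struct timeval *start, const struct timeval *now)
{
    int64_t us = ((int64_t)now->tv_sec - (int64_t)start->tv_sec) * 1000000
               + ((int64_t)now->tv_usec - (int64_t)start->tv_usec);
    return us < 0 ? 0u : (uint32_t)(us / 1000);
}

/* Hypothetical helper: append one RTP or RTCP packet to the dump file. */
int rd_write_packet(FILE *out, const unsigned char *pkt, size_t len,
                    int is_rtcp, uint32_t offset_ms)
{
    if (len > 0xffffu - RD_PACKET_HDR_LEN)
        return -1;  /* does not fit in the 16-bit length field */

    uint16_t total = (uint16_t)(len + RD_PACKET_HDR_LEN);
    uint16_t plen = is_rtcp ? 0 : (uint16_t)len;  /* 0 marks RTCP */
    unsigned char h[8] = {
        (unsigned char)(total >> 8), (unsigned char)(total & 0xff),
        (unsigned char)(plen >> 8),  (unsigned char)(plen & 0xff),
        (unsigned char)(offset_ms >> 24), (unsigned char)(offset_ms >> 16),
        (unsigned char)(offset_ms >> 8),  (unsigned char)offset_ms
    };

    if (fwrite(h, 1, sizeof h, out) != sizeof h)
        return -1;
    return fwrite(pkt, 1, len, out) == len ? 0 : -1;
}
```

Because several receive threads share one file, a real implementation would also need to serialize calls like these, for example with a mutex per conference.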
4. rtspd Enhancements
In this part, we extend the functionality of an RTSP media server,
rtspd, which already supports playback and recording of G.711 mu-law
audio. It can record using AU format.
We modify rtspd so that it also supports rtpdump. We use the existing
rtpdump functionality from
http://www.cs.columbia.edu/~hgs/software/rtptools/src/ to read and
write rtpdump files. The file we modify is rtpfile.c, and we add the
rtpdump utility functions in two new files, rf.c and rf.h.
Since the rtpdump format differs between RTP and RTCP packets, we
distinguish between them in the incoming buffer by examining the
payload type in the packet header: if the payload type is
between 200 and 204 inclusive, the packet is RTCP; otherwise it is RTP.
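A minimal sketch of this check is shown below. It assumes that the "payload type" is read from the second byte of the packet, which holds the RTCP packet type for RTCP and the marker bit plus 7-bit payload type for RTP. The function name is hypothetical and is not taken from rf.c.

```c
#include <stddef.h>

/* Classify a received buffer: returns 1 for RTCP, 0 for RTP, and -1
 * if the buffer is too short to contain the second header byte. */
int is_rtcp_packet(const unsigned char *buf, size_t len)
{
    if (len < 2)
        return -1;
    /* RTCP packet types 200..204 occupy the whole second byte. An RTP
     * packet would only collide with this range if it used payload
     * types 72..76 with the marker bit set. */
    return buf[1] >= 200 && buf[1] <= 204;
}
```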
5. Enhanced Control
The last part of our project is to control the recording mechanism
without command-line options. Our goal is to use the web interface and
the existing database for configuration and control. The database 'sip'
contains the table 'conferences', which holds all the information about
conferences. The column 'recordingformat' can be set to the desired
recording file format, e.g., AU, WAV, or rtpdump.
The conference owner can control this feature through the web interface.
When the user logs in to the CINEMA web interface and presses the
'conference' button, the conference list is shown. Pressing 'Edit'
beside a conference URL in the list invokes the web script
ConfEdit.cgi. The web page then shows all the editable fields of the
table 'conferences', where the conference owner can specify the
recording format. After the edit button on this page is pressed, the
information is written to the SQL database to reflect the changes.
The conferencing server can then retrieve the recording format for each
conference from the database. If no format is specified, no recording
takes place. We achieve this by inserting embedded SQL commands in
cinema/libmixer/sendrecv.c with the help of
cinema/libdb++/dbapi.h. The file naming convention is
conf_ID.recordingformat, where ID is the primary key of the conference
and recordingformat is AU, WAV, or
rtpdump.
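The naming rule can be sketched as a small helper. This is a hypothetical function, not the code in sendrecv.c. It assumes the 'recordingformat' column holds exactly "AU", "WAV", or "rtpdump", and it assumes the file extensions .au, .wav, and .rtp.

```c
#include <stdio.h>
#include <string.h>

/* Build "conf_<id>.<ext>" in buf. Returns 0 on success, or -1 if no
 * (or an unknown) format is configured or the buffer is too small;
 * in that case the caller simply does not record. */
int recording_filename(char *buf, size_t size, unsigned long id,
                       const char *format)
{
    const char *ext;

    if (format == NULL || format[0] == '\0')
        return -1;  /* no format specified: recording disabled */
    if (strcmp(format, "AU") == 0)
        ext = "au";
    else if (strcmp(format, "WAV") == 0)
        ext = "wav";
    else if (strcmp(format, "rtpdump") == 0)
        ext = "rtp";
    else
        return -1;

    int n = snprintf(buf, size, "conf_%lu.%s", id, ext);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```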
6. Program Documentation
Here we explain step by step how to run and test the modified sipconf.
We cover two environments: Linux and SunOS. First, unpack the file
confrec.tar.gz. This creates two directories, libmixer and rtspd, under
a directory confrec.
$ gunzip confrec.tar.gz
$ tar xvf confrec.tar
$ ls confrec/libmixer
module.mk sendrecv.c sendrecv.h sndfile.c sndfile.h
$ ls confrec/rtspd
module.mk rtpfile.c rtpfile.h rf.c rf.h
Copy the files under confrec/libmixer to cinema/libmixer, and the files
under confrec/rtspd to cinema/rtspd.
Linux
--------
In the cinema directory, create a directory for the sipconf
installation. In this test we name it 'linux-sipconf'. Configure it for
the Linux platform and compile sipconf.
$ pwd
cinema/
$ mkdir linux-sipconf
$ cd linux-sipconf
$ ../configure --with-rtp=/proj/irt-gc4/rtp/Linux --with-mysql=/proj/irt-gc4/mysql/Linux
$ make sipconf
After compilation finishes, run sipconf as follows:
$ cd sipconf
$ ./sipconf -d -X -D SQL://root:tiger@marta.cs.columbia.edu/sip
Make sure that the path to the necessary library is set when running
sipconf, i.e., put the following line in the file $HOME/.profile:
export PATH=/proj/irt-gc4/mysql/Linux/lib/mysql:$PATH
SunOS
---------
Assume we run the programs on the marta.cs.columbia.edu machine.
Under the cinema directory, create a working directory in which to
install sipconf, e.g., sun-sipconf.
$ pwd
cinema/
$ mkdir sun-sipconf
$ cd sun-sipconf
$ ../configure --with-rtp=/proj/irt-gc4/rtp/SunOS --with-mysql=/proj/irt-gc4/mysql/SunOS
$ make sipconf
$ cd sipconf
$ ./sipconf -d -X -D SQL://root:tiger@marta.cs.columbia.edu/sip
Also make sure that the paths to the necessary libraries are set:
export LD_LIBRARY_PATH=/proj/irt-gc4/mysql/SunOS/lib/mysql:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/opt/CUCSgnu/lib:$LD_LIBRARY_PATH
Testing Tools
SIP Client
This program represents a participant in a conference, i.e., the client
of the SIP conferencing server.
On a Linux machine, we may use sipc from the /proj/irt-gc2/irt/sipc.linux
directory. Go to the directory and run sipc.
$ cd /proj/irt-gc2/irt/sipc.linux
$ ./sipc &
sipc must be run on the local machine since it needs to use the local
audio device as an input.
Use sipc to connect to an available conference on the sipconf server.
If sipconf is running on the marta.cs.columbia.edu machine, you may
make a SIP conference call to that machine. For example, you may
specify the following conference URL in the sipc dialog window:
sip:demo@marta.cs.columbia.edu.
Tools to Test Our Recording Results
for AU and WAV Format
----------------------------
The .wav and .au files produced by recording can be found in
cinema/linux-sipconf/sipconf/.
The files are named conf_ID.au and conf_ID.wav, where ID is the
conference ID.
You can use Windows Media Player, RealPlayer, etc. to play back the .wav and .au files.
for rtpdump Format
----------------------
In cinema/sun-sipconf/sipconf/, we will find the rtpdump file.
The file naming convention is conf_ID.rtp.
Assume that the rat application is already running on the
vienna.cs.columbia.edu machine (rat starts automatically when we start
sipc), and that rtpplay is already installed on the machine. From
another machine, for example baghdad.clic.cs.columbia.edu, run
rtpplay:
$ rtpplay -v -T -f /$WORKDIR/cinema/linux-sipconf/sipconf/conf_ID.rtp vienna/10000
You should then hear the recording on the vienna.clic.cs.columbia.edu machine.
If rtpplay has not yet been installed on the machine, download and
install it from
http://www.cs.columbia.edu/~hgs/software/rtptools/src/
rtspd Installation
-------------------
From the unpacked confrec.tar.gz file above, copy the files under
confrec/rtspd to cinema/rtspd. Make a working directory under cinema
for the rtspd installation, and compile rtspd.
$ pwd
cinema/
$ mkdir linux-rtspd
$ cd linux-rtspd
$ ../configure
$ make rtspd
7. Task List
Student name: Chin-Hong Lin
Tasks: AU and WAV recording for sipconf
       rtpdump file creation and rtpdump format recording for sipconf
       Web CGI scripts and database modification for enhanced control
Student name: Agung Suyono
Tasks: AU and WAV file creation and closing for sipconf
       rtspd enhancement for rtpdump recording
       Embedded SQL query in sipconf for enhanced control
8. References
1. Jon Crowcroft, Internetworking Multimedia, Morgan Kaufmann Publishers, CA, 1999.
2. RFC for RTP/RTCP.
3. RFC for SIP.
4. RFC for RTSP.
5. Kundan Singh, Gautam Nair and Henning Schulzrinne, "Centralized Conferencing using SIP". Proceedings of the 2nd IP-Telephony Workshop (IPTel'2001), April 2001.