Abstract

The purpose of this report is to describe a VoIP project that runs SIPUA, the SIP user agent of the CINEMA platform, on the Sharp Zaurus.

1. Introduction

The Zaurus VoIP project is a voice over IP (VoIP) application for the Sharp Zaurus, a Linux-based PDA. It lets the user make voice calls from the Zaurus as if it were a phone, building on the architecture of Columbia University’s CINEMA project.

The project uses the Session Initiation Protocol (SIP) and the Real-time Transport Protocol (RTP). SIP, co-authored by Professor Henning Schulzrinne of Columbia University, is a signaling protocol for establishing real-time, interactive communication sessions over IP networks. It can integrate a wide variety of communication services that use Internet protocols.

2. VoIP Architecture

As the convergence of voice and data becomes feasible in communication networks, Voice over Internet Protocol (VoIP) plays a significant role. With the entrance of Cisco and other data networking companies into this territory, many traditional switch vendors have restructured their product lines to focus on software-based solutions built on converged architectures.

VoIP media streams typically do not use TCP, because its retransmission and in-order delivery add too much delay for real-time applications. UDP datagrams are used instead.

2.1 CINEMA Project Overview (from CINEMA website)

The Columbia InterNet Extensible Multimedia Architecture (CINEMA) consists of a set of SIP-based servers that provide a pathway to a post-PBX era of communications. It provides a comprehensive environment for creating and deploying rich Internet multimedia services including programmable Internet telephony services, audio/video conferencing, IP-based voice mail, and unified messaging.

2.2 RTP protocol

In the Zaurus project, SIPUA uses RTP to give each packet a sequence number and a timestamp.

The RTP protocol enables end-to-end real-time delivery services for data. Sample services provided by RTP are the following:

· Sequencing: RTP runs on top of a datagram service (UDP), so packets may arrive out of order, while audio data must be played in the right order. Packet sequence numbers let the destination reassemble the information stream. SIPUA uses this service.

· Time stamping: Each packet carries a timestamp so that the devices can stay synchronized. Time stamping is also important in control and monitoring applications.

· Delivery monitoring: Allows the participating applications to collect and share statistics about the performance of the data transport service.

· Information about connection: the statistics, the names and locations of the parties, when and why a party is leaving, etc.

Delivery monitoring and connection information gathering services are provided by RTCP, the companion control protocol for RTP. In my Zaurus sipua project, only RTP was supported.
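The sequencing and time stamping services above live in the fixed 12-byte RTP header. The following is a minimal sketch of packing and parsing that header, not SIPUA's actual code: the type rtp_header and the functions rtp_pack and rtp_unpack are hypothetical names chosen for illustration, and the sketch assumes a version 2 header with no padding, extension, marker bit, or CSRC entries.

```c
#include <stddef.h>
#include <stdint.h>

#define RTP_HEADER_LEN 12  /* fixed header size when there are no CSRC entries */

/* Hypothetical holder for the header fields used for ordering and playout. */
typedef struct {
    uint8_t  payload_type;  /* 7-bit payload type */
    uint16_t seq;           /* increases by one for each packet sent */
    uint32_t timestamp;     /* sampling instant of the first audio sample */
    uint32_t ssrc;          /* identifies the sending source */
} rtp_header;

/* Write the header into buf in network byte order.
 * Returns the number of bytes written, or 0 if buf is too small. */
size_t rtp_pack(uint8_t *buf, size_t buf_len, const rtp_header *h)
{
    if (buf_len < RTP_HEADER_LEN) return 0;
    buf[0]  = 0x80;                       /* V=2, P=0, X=0, CC=0 */
    buf[1]  = h->payload_type & 0x7f;     /* M=0 */
    buf[2]  = (uint8_t)(h->seq >> 8);
    buf[3]  = (uint8_t)h->seq;
    buf[4]  = (uint8_t)(h->timestamp >> 24);
    buf[5]  = (uint8_t)(h->timestamp >> 16);
    buf[6]  = (uint8_t)(h->timestamp >> 8);
    buf[7]  = (uint8_t)h->timestamp;
    buf[8]  = (uint8_t)(h->ssrc >> 24);
    buf[9]  = (uint8_t)(h->ssrc >> 16);
    buf[10] = (uint8_t)(h->ssrc >> 8);
    buf[11] = (uint8_t)h->ssrc;
    return RTP_HEADER_LEN;
}

/* Read the header back. Returns 0 on success, or -1 if the buffer is
 * too short or the version field is not 2. */
int rtp_unpack(const uint8_t *buf, size_t buf_len, rtp_header *h)
{
    if (buf_len < RTP_HEADER_LEN || (buf[0] >> 6) != 2) return -1;
    h->payload_type = buf[1] & 0x7f;
    h->seq = (uint16_t)((buf[2] << 8) | buf[3]);
    h->timestamp = ((uint32_t)buf[4] << 24) | ((uint32_t)buf[5] << 16) |
                   ((uint32_t)buf[6] << 8)  |  (uint32_t)buf[7];
    h->ssrc      = ((uint32_t)buf[8] << 24) | ((uint32_t)buf[9] << 16) |
                   ((uint32_t)buf[10] << 8) |  (uint32_t)buf[11];
    return 0;
}
```

The audio payload follows immediately after these 12 bytes, which matches the &buffer[12] offset used when reading audio in the modified RecordThread code at the end of this report.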

2.3 SIP protocol

SIPUA uses SIP to provide call setup, routing, authentication, and other features to end devices. SIP is a text-based protocol that allows scalable and extensible implementation of a wide variety of applications, including audio, video, chat, instant messaging, and whiteboard. SIP requires less overhead than the traditional H.323 protocol.
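Because SIP is text-based, a request can be assembled with ordinary string formatting. The sketch below builds a bare INVITE without a session description body. It is an illustration only and not how SIPUA constructs its messages: the function sip_format_invite, its parameters, and the example addresses in the usage note are all hypothetical, and a real user agent would generate unique branch and tag values and attach an SDP body describing the audio stream.

```c
#include <stddef.h>
#include <stdio.h>

/* Format a minimal SIP INVITE (no body) into out.
 * token is a placeholder used for both the Via branch and the From tag;
 * a real user agent would generate unique random values for each.
 * Returns the message length, or -1 if out is too small. */
int sip_format_invite(char *out, size_t out_len,
                      const char *from_uri, const char *to_uri,
                      const char *local_host, const char *call_id,
                      unsigned cseq, unsigned token)
{
    int n = snprintf(out, out_len,
        "INVITE %s SIP/2.0\r\n"                        /* request line */
        "Via: SIP/2.0/UDP %s;branch=z9hG4bK-%u\r\n"    /* where responses go */
        "Max-Forwards: 70\r\n"
        "From: <%s>;tag=%u\r\n"
        "To: <%s>\r\n"
        "Call-ID: %s\r\n"
        "CSeq: %u INVITE\r\n"
        "Contact: <sip:%s>\r\n"
        "Content-Length: 0\r\n"
        "\r\n",                                        /* blank line ends the headers */
        to_uri, local_host, token, from_uri, token,
        to_uri, call_id, cseq, local_host);
    if (n < 0 || (size_t)n >= out_len) return -1;
    return n;
}
```

For example, calling sip_format_invite(msg, sizeof msg, "sip:alice@example.com", "sip:192.0.2.10", "192.0.2.20", "abc123@192.0.2.20", 1, 42) produces a message whose first line is "INVITE sip:192.0.2.10 SIP/2.0". Every header is a readable line of text, which is what makes SIP easy to extend and debug.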

2.4 Zaurus PDA

The Zaurus is a Linux-based PDA produced by Sharp. With a 206 MHz Intel StrongARM processor, the Zaurus SL-5500 provides users with powerful multimedia and multi-processing capabilities. In addition, MP3 and MPEG-1 capabilities allow users to listen to music or view video clips.

2.5 Motivation

SIPUA is a SIP user agent. It allows the end user to make and receive SIP calls.

The Zaurus SIPUA implementation should support the original SIPUA features, such as SIP version 2.0 (RFC 2543, RFC 3261), G.711 mu-law audio for Solaris on SPARC, and a rudimentary RTP implementation. In addition, it has to support AFMT_S16_LE format audio for Linux and embedded Linux. In particular, it has to support the Zaurus audio device, whose input is mono only. The embedded Linux operating system on the Zaurus tests the adaptability and extensibility of SIPUA.

3. Project design and implementation

3.1 Zaurus SIPUA VoIP Fundamentals

1. Convert the analog voice into digital signals (bits).

2. Encode the bits in a format suitable for transmission: PCMU or AFMT_S16_LE.

3. Insert the voice data into packets using a real-time protocol (RTP over UDP over IP).

4. Use SIP as the signaling protocol between users.

5. On the receiving side, disassemble the packets, extract the data, convert it back to analog voice signals, and send them to the audio device.

6. Do all of this in near real time, because users cannot wait long during a conversation.

3.2 Zaurus Analog to Digital Conversion

The Zaurus sound card can digitize a band of up to 22050 Hz with 16-bit samples. For voice, the project samples at 8000 Hz, which by the Nyquist criterion captures frequencies up to 4000 Hz. With 2 bytes per sample, this gives a throughput of 2 * 8000 (samples per second) = 16000 bytes/s.

Every 20 ms therefore produces 16000/50 = 320 bytes, so the default packet size was set to 320 bytes.
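The packet size arithmetic above can be written as a small helper, which makes it easy to see how the size changes with the sample format or packet interval. This is a sketch for illustration; the function name audio_bytes_per_packet and its parameters are hypothetical and do not appear in SIPUA.

```c
#include <stddef.h>

/* Bytes of uncompressed audio produced during one packet interval. */
size_t audio_bytes_per_packet(unsigned sample_rate_hz,
                              unsigned bytes_per_sample,
                              unsigned channels,
                              unsigned packet_ms)
{
    /* samples per second * bytes per sample * channels gives bytes per
     * second; scaling by packet_ms / 1000 gives bytes per packet. */
    return (size_t)sample_rate_hz * bytes_per_sample * channels
           * packet_ms / 1000;
}
```

With the values used in this project (8000 Hz, 2 bytes per sample, 1 channel, 20 ms), the function returns 320. With 1 byte per sample, as in G.711, the same interval would need only 160 bytes.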

Now that we have digital data, we can convert it to a standard format that can be transmitted quickly:

1) PCM (Pulse Code Modulation), ITU-T standard G.711

2) AFMT_S16_LE (signed 16-bit little-endian samples)

For this Zaurus project, I implemented only the AFMT_S16_LE format, which was suggested as performing better on the Zaurus.

3.3 RTP (Real-Time Transport Protocol)

The next step is to encapsulate the raw data in the TCP/IP stack, using the following structure:

VoIP data packets

RTP

UDP

IP

Layers 1 and 2 (physical and data link)

VoIP data travels in RTP (Real-Time Transport Protocol) packets, which are carried inside UDP/IP packets. RTP enables the receiver to put the packets back into the correct order and to avoid waiting too long for packets that are lost or delayed.
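One way a receiver can use the sequence number to decide whether a packet is in order, early, or too late to play is sketched below. This illustrates the general idea rather than SIPUA's playout logic; the names seq_diff, pkt_action, and classify_packet are hypothetical. The comparison is done modulo 2^16 because the RTP sequence number is a 16-bit field that wraps from 65535 back to 0.

```c
#include <stdint.h>

/* Signed distance from sequence number b to a, valid across wraparound. */
static int seq_diff(uint16_t a, uint16_t b)
{
    uint16_t u = (uint16_t)(a - b);   /* difference modulo 65536 */
    int d = u;
    if (d >= 0x8000) d -= 0x10000;    /* map the upper half to negative values */
    return d;
}

typedef enum { PKT_PLAY_NOW, PKT_HOLD, PKT_DROP_LATE } pkt_action;

/* Classify an arriving packet against the sequence number the player
 * expects next. */
pkt_action classify_packet(uint16_t seq, uint16_t next_expected)
{
    int d = seq_diff(seq, next_expected);
    if (d == 0) return PKT_PLAY_NOW;  /* in order */
    if (d > 0)  return PKT_HOLD;      /* early: keep until the gap fills or times out */
    return PKT_DROP_LATE;             /* its slot has already been played */
}
```

A held packet would normally be given a deadline. If the missing packets have not arrived by then, the player skips them so that one lost packet does not stall the whole conversation.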

3.4 Zaurus Networking

Unlike Unix and other Linux products, the Zaurus does not automatically associate its IP address with its hostname. Since the project is based on Wi-Fi, DHCP assigns a new IP address to the Zaurus each time it connects. This caused some confusion and delays in the project, since SIPUA uses the hostname to find its peers. The workaround is to use the hostname command to set the Zaurus hostname to its IP address each time it obtains a new one. An improvement to SIPUA would be to identify the user’s IP address directly instead of relying on the hostname.

VoIP over Wi-Fi also depends on the stability of the wireless LAN. During the project, we found that if the wireless connection is disrupted, SIPUA has to be reconfigured (hostname setup) and restarted. An improvement to SIPUA would be to verify the LAN connection during a SIP call, for example with a heartbeat mechanism. If the connection is lost, SIPUA would go to sleep and retry the connection at intervals.
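The first improvement, finding the local IP address without relying on the hostname, could be approached as in the sketch below. This is one possible technique and is not implemented in SIPUA; the function name local_ip_for_peer is hypothetical. It connects a UDP socket to the peer's address, which sends no packets but makes the kernel choose a route, and then asks the socket which local address was selected.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Store in out the dotted-quad IPv4 address the kernel would use as the
 * source address when sending to peer_ip.
 * Returns 0 on success, -1 on failure. */
int local_ip_for_peer(const char *peer_ip, char *out, size_t out_len)
{
    struct sockaddr_in peer, local;
    socklen_t len = sizeof local;
    int rc = -1;
    int fd;

    memset(&peer, 0, sizeof peer);
    peer.sin_family = AF_INET;
    peer.sin_port = htons(5060);  /* default SIP port; any port works here */
    if (inet_pton(AF_INET, peer_ip, &peer.sin_addr) != 1) return -1;

    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) return -1;

    /* connect() on a UDP socket only selects a route; nothing is sent. */
    if (connect(fd, (struct sockaddr *)&peer, sizeof peer) == 0 &&
        getsockname(fd, (struct sockaddr *)&local, &len) == 0 &&
        inet_ntop(AF_INET, &local.sin_addr, out, (socklen_t)out_len) != NULL) {
        rc = 0;
    }
    close(fd);
    return rc;
}
```

Calling this function again after each DHCP lease change would return the new address. A failure return while a call is in progress could also serve as a simple signal that the wireless link has gone down.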

3.5 SIPUA implementation on Zaurus

Figure 1

3.6 Design of the project

Figure 1 uses a call from the Zaurus to a Linux laptop as an example.

1. SIPUA is started and initialized on both the Linux laptop and the Zaurus. The user logs in and is authenticated:

Set userid yg97@cs.columbia.edu

Set password xxxx

2. From the Zaurus, issue an invite command to the Linux SIPUA, using the IP address that the laptop obtained through DHCP, for example:

invite from=sip:yg97@cs.Columbia.edu sip:128.59.xx.xx

3. The Linux SIPUA, which is already running, can now accept the invitation.

4. The audio sessions on both sides are initialized, and the play thread and record thread are spawned on each side.

5. The SIP phone session is established. The voice from each side is encoded into data packets and transported to the other party’s SIPUA using RTP over UDP (TCP can be configured instead).

3.7 Integration with CINEMA’s programming environment

Trolltech’s Qtopia Qt environment fits the StrongARM Linux development environment well. I therefore integrated the Qt programming environment with the CINEMA project by modifying the Makefile in the sipua directory to include the necessary libraries from Qt’s StrongARM Linux environment.

4. Software architecture of audio playback

The audio handling is implemented in radio.cpp.

Figure 2

4.1 Setting up the audio devices

Unlike the audio devices on Sun Unix and Linux machines, the Zaurus supports only mono input and uses a separate device for each direction: /dev/dsp must be opened to play and /dev/dsp1 to record. The setup details can be found in SIPUA’s radio.cpp file.

The ThreadParam object contains the input and output sockets that each thread uses to communicate with its peers. It also contains the audio device opened by the start_audio_tool function. In contrast to the standard Unix and Linux platforms, the Zaurus SIPUA has to open different audio devices for play and record.

4.2 RingBuffer overflow

Figure 3

Algorithm for the RingBuffer scheme:

1) The PlayThread is initialized and starts putting packets into the buffer.

2) The RecordThread is initialized and starts getting packets from the buffer and playing them.

3) Both threads work on the same circular buffer, one writing packets and the other reading them.

Issue with the RingBuffer:

When a large number of packets arrive continuously, the buffer overflows. Because there is a time delay between the PlayThread and the RecordThread, the overflow widens the gap between them until the PlayThread catches up with the RecordThread on the RingBuffer. Unexpected behavior then occurs, such as thread lock-ups and loss of voice data.
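The overflow condition can be made explicit with a packet ring buffer that refuses to write when it is full, instead of letting the writer overrun the reader. The sketch below is a simplified, single-threaded illustration and not SIPUA's RingBuffer: the names ring_buffer, rb_init, rb_put, and rb_get and the capacity RB_SLOTS are hypothetical, and a version shared between two threads would also need a mutex around each operation.

```c
#include <stddef.h>
#include <string.h>

#define RB_SLOTS 8           /* hypothetical capacity in packets */
#define RB_PACKET_BYTES 320  /* one 20 ms packet of 16-bit mono audio at 8000 Hz */

typedef struct {
    unsigned char data[RB_SLOTS][RB_PACKET_BYTES];
    size_t head;   /* next slot to write */
    size_t tail;   /* next slot to read */
    size_t count;  /* packets currently stored */
} ring_buffer;

void rb_init(ring_buffer *rb)
{
    memset(rb, 0, sizeof *rb);
}

/* Store one packet. Returns 0 on success, or -1 if the buffer is full,
 * which is the point where an unchecked writer would overrun the reader. */
int rb_put(ring_buffer *rb, const unsigned char *pkt)
{
    if (rb->count == RB_SLOTS) return -1;
    memcpy(rb->data[rb->head], pkt, RB_PACKET_BYTES);
    rb->head = (rb->head + 1) % RB_SLOTS;
    rb->count++;
    return 0;
}

/* Fetch the oldest packet. Returns 0 on success, or -1 if the buffer is empty. */
int rb_get(ring_buffer *rb, unsigned char *pkt)
{
    if (rb->count == 0) return -1;
    memcpy(pkt, rb->data[rb->tail], RB_PACKET_BYTES);
    rb->tail = (rb->tail + 1) % RB_SLOTS;
    rb->count--;
    return 0;
}
```

When rb_put reports a full buffer, the caller has to choose a policy, such as dropping the new packet or discarding the oldest one. Either choice bounds the delay, at the cost of a short gap in the audio.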

Solution:

1) Enlarging MAX_PLAYOUT_BUFFER_SIZE delays the buffer overflow, but the overflow eventually still happens.

2) Suppressing silence reduces the unnecessary traffic significantly. Because only the necessary data is sent across the network, the delay issue is resolved. The buffer does not overflow as long as there are silent periods in the conversation. See the next section for performance results.
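Silence suppression needs a test that decides whether a recorded packet is quiet enough to skip. The sketch below uses the mean absolute amplitude of the AFMT_S16_LE samples compared against a threshold. It is a simplified stand-in for illustration and not the EnergyLevel function used in the project; the names mean_abs_s16le and is_silence and the threshold value in the usage note are hypothetical.

```c
#include <stddef.h>

/* Mean absolute amplitude of signed 16-bit little-endian samples.
 * len is in bytes; a trailing odd byte is ignored. */
unsigned mean_abs_s16le(const unsigned char *buf, size_t len)
{
    size_t samples = len / 2;
    size_t i;
    unsigned long long sum = 0;

    if (samples == 0) return 0;
    for (i = 0; i < samples; i++) {
        /* low byte first, then high byte */
        int v = buf[2 * i] | (buf[2 * i + 1] << 8);
        if (v >= 0x8000) v -= 0x10000;   /* sign-extend the 16-bit value */
        sum += (unsigned)(v < 0 ? -v : v);
    }
    return (unsigned)(sum / samples);
}

/* Nonzero when the packet is quieter than threshold and need not be sent. */
int is_silence(const unsigned char *buf, size_t len, unsigned threshold)
{
    return mean_abs_s16le(buf, len) < threshold;
}
```

A sender would call is_silence(packet, 320, threshold) on each 20 ms packet and transmit only when it returns zero. The right threshold depends on the microphone and background noise and has to be tuned by experiment.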

4.3 Performance Testing

Performance testing was conducted in both directions: from the Zaurus to Linux and from Linux to the Zaurus.

In the initial test, performance from Linux to the Zaurus was reasonable: throughout the whole 10-minute session, the delay stayed under 100 ms, which is consistent with SIPUA’s behavior on other platforms. On the Zaurus side, however, we observed linear growth in voice delay and occasional loss of sound. After investigation, we found the problem and the solution described in the previous section.

In the second test, conducted after applying silence suppression and a bigger playout buffer size, the issue was resolved. On the Linux side, the delay stayed the same as before. On the Zaurus side, the delay stayed below 460 ms during the 10-minute session and showed no significant linear growth.

5. Conclusion

The implementation of SIPUA on the Zaurus involves wireless networking and audio device handling, and uses protocols such as RTP and SIP. From the performance of the project, we can conclude that the Zaurus, which is based on an embedded StrongARM Linux system, can provide satisfactory results in real-time networking systems.

While the delay is acceptable, some improvements remain for future work on the Zaurus SIPUA implementation.

Code modified:

In the RecordThread() function:

Set unsigned char buffer[512]; instead of 1500.

For silence suppression, pad with 0x00 (the zero level of AFMT_S16_LE samples) instead of the original 0x7f:

// memset (&buffer[12+len],0x7f,size-len);
memset (&buffer[12+len],0x00,size-len);

// record using au_r instead of au_dev
if (oUsingDevice) {
    // len = read(th->au_dev, &buffer[12], size);
    len = read(au_r, &buffer[12], size);
}

In the PlayThread function:

Set unsigned char buffer[512]; instead of 1500.

In EnergyLevel: use the updated version of EnergyLevel from Sangho Shin.

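// Needs <stdlib.h> for abs(); SQR is assumed to be a squaring macro
// defined elsewhere in sipua.
double EnergyLevel (unsigned char *dataBuf, int dataLen)
{
    int i, n;
    short data;
    unsigned long sum_x, avg_x;

    n = dataLen;

    // Following portion of the code is copied from HW2
    // First pass: average absolute amplitude of the 16-bit little-endian samples
    for (sum_x=0, i=0; i + 1 < n; i++) {
        data = dataBuf[i];
        data = data | (dataBuf[i+1] << 8);
        sum_x += abs(data);
        i++;
    }
    avg_x = sum_x/n;

    // Second pass: average squared deviation from that amplitude
    for (sum_x=0, i=0; i + 1 < n; i++) {
        data = dataBuf[i];
        data = data | (dataBuf[i+1] << 8);
        sum_x += SQR(abs(data) - avg_x);
        i++;
    }
    avg_x = sum_x/n;

    return (avg_x);
}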

In the start_audio_tool function, the following replaces the regular audio device setup:

/* Open audio device for Z*/
if (au_p >= 0) {
    close (au_p);
}
fprintf(stderr, "**************Before opening Audio ***************...\n");
if (usedevice) {
    fprintf(stderr, "opening audio device...\n");
    int speed = 8000;             // samples per second
    int channels = 1;             // mono
    int format = AFMT_S16_LE;     // signed 16-bit little-endian samples
    int block_size;
    int fragment = 0x00020009;    // 2 fragments of 2^9 = 512 bytes each

    // Playback device
    au_p = open("/dev/dsp", O_WRONLY);
    if( au_p == -1) {
        perror("open(\"/dev/dsp\")");
    }
    // this is assuming you know what the format of the wav file is.
    // otherwise you'd have to read the header file first.
    if( ioctl( au_p, SNDCTL_DSP_SETFMT , &format)==-1) {
        perror("play setup ioctl(\"SNDCTL_DSP_SETFMT\")");
    }
    if( ioctl( au_p, SNDCTL_DSP_CHANNELS , &channels)==-1) {
        perror("play setup ioctl(\"SNDCTL_DSP_CHANNELS\")");
    }
    if( ioctl( au_p, SNDCTL_DSP_SPEED , &speed)==-1) {
        perror("play setup ioctl(\"SNDCTL_DSP_SPEED\")");
    }
    if( ioctl( au_p, SNDCTL_DSP_GETBLKSIZE , &block_size)==-1) {
        perror("play setup ioctl(\"SNDCTL_DSP_GETBLKSIZE\")");
        return -1;
    }
    if( ioctl( au_p, SNDCTL_DSP_SETFRAGMENT , &fragment)==-1) {
        perror("play setup ioctl(\"SNDCTL_DSP_SETFRAGMENT\")");
        return -1;
    }
    fprintf(stderr, " Opening Audio for playing part...done\n");

    //## YG added for Recording...
    // *NOTE* the Zaurus input is ONLY mono !!
    if (au_r >= 0) {
        close (au_r);
    }
    // *NOTE* the Zaurus has a nonstandard input
    // so /dev/dsp1 must be opened when
    // recording
    au_r = open("/dev/dsp1", O_RDONLY);
    if( au_r == -1) {
        perror("open(\"/dev/dsp1\")");
    }
    if( ioctl( au_r, SNDCTL_DSP_SETFMT , &format)==-1) {
        perror(" recording ... ioctl(\"SNDCTL_DSP_SETFMT\")");
    }
    if( ioctl( au_r, SNDCTL_DSP_CHANNELS , &channels)==-1) {
        perror("recording ... ioctl(\"SNDCTL_DSP_CHANNELS\")");
    }
    if( ioctl( au_r, SNDCTL_DSP_SPEED , &speed)==-1) {
        perror("recording ... ioctl(\"SNDCTL_DSP_SPEED\")");
    }
    if( ioctl( au_r, SNDCTL_DSP_GETBLKSIZE , &block_size)==-1) {
        perror("ioctl(\"SNDCTL_DSP_GETBLKSIZE\")");
        return -1;
    }
    fprintf(stderr, " Opening Audio for recording part...done\n");
} else
    fprintf(stderr, " NOT GOING INTO Opening Audio for recording part...FALSE\n");

The thread setup part uses au_p for the play thread and au_r for the record thread:

/*
 * Setup threads to listen to incoming packets and send outgoing
 * packets
 */
{
    int e;
    pthread_t tid[2];

    /* Recv thread */
    ThreadParam *th;
    th = (ThreadParam *)calloc(1, sizeof(ThreadParam));
    th->in_sock4 = in_sock4;
    th->out_sock4 = out_sock4;
    th->in_sock6 = in_sock6;
    th->out_sock6 = out_sock6;
    th->au_dev = au_p;         /* playback device */
    th->o_send = o_send;       /* Don't care */
    th->o_silence = 0;         /* No silence detection for playout */
    if ((e = pthread_create_detached(&tid[0], NULL,
                                     PlayThread, (void *)th)) != 0) {
        perror("pthread_create_detached");
    }

    /* Record/send and play thread */
    th = (ThreadParam *)calloc(1, sizeof(ThreadParam));
    th->in_sock4 = in_sock4;
    th->out_sock4 = out_sock4;
    th->in_sock6 = in_sock6;
    th->out_sock6 = out_sock6;
    th->au_dev = au_r;         /* recording device */
    th->o_send = o_send;
    th->o_silence = o_silence;
    th->threshold = threshold;
    if ((e = pthread_create_detached(&tid[1], NULL,
                                     RecordThread, (void *)th)) != 0) {
        perror("pthread_create_detached");
    }

    return 0;
}

References

1) Columbia University CINEMA project.

2) Help from Ph.D. students in the IRT lab.

3) Research conducted through the embedded-Linux newsgroup.

4) Sharp Zaurus website.