Project in computer science
Fang Cheng
094-88-8445
Supervisor: Wenyu Jiang & Henning Schulzrinne
Dec 19. 2001
In this course, I am involved with a research project on perceived quality of VoIP. The first part of my assignment was to help with the MOS test. The second part involved writing some programs to simulate the receiver side using several algorithms. The following report states each of the part in detail.
During the summer time, I organized two batches of MOS testings. In the first MOS testing, We had 4 original audio clips, each about 7 seconds long. Based on that, there are in total 78 regenerated audio clips using G.729 given different conditional and unconditional loss combination, packet interval, and whether FEC is used or not. We have in total 20 people in the test. Each listener is told to listen to the total 82 audio clips, and gave a mark range from 1 to 5 based on the sound quality in the four aspects: crispiness, smoothness, degree of distortion, and degree of noise.
The marking schema is following the MOS requirement except for accuracy and statistic reasons, we asked listeners to give one extra decimal point in the mark instead of just using integers. In addition, we tried to strengthen several rules to achieve a unified and independent testing environment. For example, we required individuals to listen to clips separately so that there is no interference and no hint from one person to another. Even though we might have some expectation for some particular audio clips, we also make sure that the sequence of the audio clips is not in an order that a listener could guess the mark of the next audio clips.
After collecting the data, the average MOS score for each audio clip is calculated, which was further used for analytical studies of correlation among loss, packet interval, and FEC conditions. Please refer the details to [2].
Similar to the first test, the second test is carried in exactly the same way with 10 people and less audio clips.
In this simulation program, a sound file is read and encoded into frames using an external G.729 encoder. Silence packets should not be transmitted and therefore are taken out. Based on a trace file that records network loss and delay, a network traffic situation is mimiced. Then frames are packed into packet, and depends on algorithms we are applying, the program determines whether a particular packet is a network loss, a delay loss, recoverable before its playout time, or arrived on time. Based on this determination, this packet is either dropped or decoded and ready to play. There will be three output files for each run of simulation. The sound file generated from the orignal, a sound file that shows the all the application lost packets, and a log file records all statistic information.
The purpose of this program is to generate sound files using several pre-defined algorithms for further study. Statistic data collected includes playout delay for each talk spurt, average playout delay, network and delay loss, packets recovery. Notice that this report does not address each algorithm itself in detail. Please refer to references enclosed for details.
The following algorithms have been implemented. Mean delays are calculated differently according to each algorithm. Playout delay is then calculated based on mean delay and some variant. This playout delay is then used as the limit to determine whether packets in the next talk spurt will be late or not. This playout delay is also used to adjust the length of silence in between two talk spurts.
Network delays usually exhibit burst characteristics. Spike delay algorithm contains a spike detection algorithm to find these spikes. In my program, if the next network delay is larger than the previous network delay, and the difference exceeds some threshold, it will trigger the program into the spike mode. Within the spike mode, the mean delay calculation differs to that in the normal mode. And if the network delay satisfies the end of spike detection algorithm, the spike mode switches back to the normal mode. The implementation algorithm is demonstrated as the following:
if (spikeMode == NORMAL) { if (fabs(networkDelay - prevNetworkDelay) > fabs(variant) * 2 + TO_SPIKE_THRESHOLD) { var = 0; spikeMode = IMPULSE; } } else { var = var/2 + fabs(2 * networkDelay - prevNetworkDelay - prevPrevNetworkDelay)/8; if (var <= TO_NORMAL_THRESHOLD) { spikeMode = NORMAL; } } if (spikeMode == NORMAL) { meanDelay = 0.125 * networkDelay + 0.875 * meanDelay; } else { meanDelay = meanDelay + networkDelay - prevNetworkDelay; }
|
This algorithm is very commonly used to estimates the mean delay through a sort of weight calculated based on the previous mean delay and the new network delay as defined as the following
meanDelay = alpha * meanDelay + (1 - alpha) * networkDelay (alpha is a constant) |
In our simplified version of Previous Optimal Algorithm, we set up the step size to be 1 instead of something greater than 1. In this case, the only quantifier will be the maximum Delay, and the calculation formula would be like the following
maxDelay = (maxDelay > networkDelay) ? maxDelay : networkDelay; meanDelay = 0.25 * meanDelay + 0.75 * maxDelay; |
In this algorithm, for a particular talk spurt, the maximum network delay is searched for all the arriving packets and used as the playout delay for all the packets in the same talk spurt. Then the silence between each talk spurt is adjusted. This algorithm is indeed not quite practical, because it requires buffering all frames within a talk spurt and introducing relatively quite large playout delay. We implemented it just for comparison purposes.
These algorithms can be applied to four situations:
FEC sends duplicate packages which can be used to recover the data loss. The duplicate packages are overlayed with the master copy of data packets. In this program, we only implement one FEC case, namely Reed-Solomon(3,2). 3 is the number of packets within one FEC data units, and 2 is the number of non-FEC packets. In addition to determine whether a packet is network or delay loss, the program also determines whether a lost packet could be re-covered.
Mid-spurt adjusts the playout delay in the middle of a spurt if it sees a suddenly increase in delay. In the simulation program, when there are two consecutive packets are delay loss, the playout delay will be adjusted even in the middle of a talk spurt. Notice that, this situation does not apply to the max delay algorithm.
If no FEC and no Mid-Spurt situation is applied.
This situation happens when both FEC and Mid-Spurt applies.
REFERENCES
[1] Integrating Packet FEC into Adaptive Voice Playout Buffer Algorithms on the Internet. Jonathan Rosenberg, Lili Qiu, Henning Schulzrinne.
[2] Effect of Bursty Loss and Delay on Perceived Quality of VoIP, Wenyu Jiang, Henning Schulzrinne, Review version.