28 August 2008



Handling VoIP Speech Coding Challenges: Part 1

For VoIP adoption to continue to grow, designers must improve speech quality in their system designs. This two-part series examines the speech coding headaches that you'll face when trying to achieve this feat.

By Elias Nemer, Intel
CommsDesign
Oct 16, 2002
Despite critics' claims, acceptance of voice-over-IP (VoIP) technology continues to gain ground in the enterprise and core network. With standards like PacketCable and 3GPP emerging, VoIP is also set to gain ground in the access markets.

But to ultimately succeed, designers must ensure that VoIP systems can deliver the quality of speech that end users demand. And to reach that plateau, designers must grapple with a host of impairments, including packet loss, delay, jitter, echo, noise and more.

This two-part series lays out the challenges designers must tackle to improve voice quality in a VoIP system design. Part 1 lays out the key attributes of speech coder technology as well as the problems that the packet network poses for speech-coding architectures. Part 2, which will appear online tomorrow, will examine how designers can account for speech coding problems through echo cancellation and more.

Coder Attributes
When designing a VoIP system, the choice of a speech coder is a function of network factors, such as the expected delay and the available processing power, as well as the user's service and speech quality expectations. The attributes of a speech coder include bit rate, complexity, delay, and quality [4]. Let's look at these four dimensions in more detail.

1. Bit rate and required bandwidth: The bit rates of the coders defined by the ITU range from the low 2.4 kbit/s coders used in secure telephony to 64 kbit/s coders, such as the wideband G.722 or the G.711 pulse-code modulated (PCM) coder. The rate of the coder determines the required channel bandwidth. In cellular telephony, for instance, preserving bandwidth is crucial. As such, variable bit rate coders, such as the enhanced variable rate coder (EVRC) used in 2G CDMA systems, were designed to drop the coding rate during speech inactivity.

2. Delay: The delay of the coder is relevant to the extent that it adds to the overall end-to-end delay in a VoIP call. The total delay of a coder includes the framing delay as well as the algorithmic (look-ahead) delay. In G.728, for instance, frames are five samples long, whereas in cellular-telephony coders, frame sizes of 160 samples (typically 20 ms) are more common. High-rate coders, such as G.711 and G.726, have a very low delay.

3. Quality: The quality of speech is a subjective measure that reflects the way the signal is perceived by listeners. It can be expressed in terms of how much effort is required to understand the message or how pleasant or comfortable speech sounds to the human ear. Intelligibility, on the other hand, is an objective measure of the amount of information that listeners can extract from the given signal [21]. In military contexts, intelligibility is of critical importance, whereas in consumer telephony, quality takes precedence. The quality of speech coders is often measured through a mean opinion score (MOS) experiment. Quality degradation is also tested under bit errors, frame erasures and background noise, all of which may cause the coder to generate various unpleasant artifacts.

4. Complexity: Speech coding algorithms are in general computation-intensive. As a result, they are typically implemented on programmable digital signal processors (DSPs) that are optimized for signal processing operations, such as convolutions, fast Fourier transforms (FFTs), and digital filtering. PC-based processors have in recent years evolved to provide enough processing power to make them appropriate candidates to run complex operations such as speech coding. As VLSI technology enables more MIPS per silicon area at a decreasing cost, the complexity aspect is less crucial than it used to be. However, it is always desirable to pack as much functionality as possible into a processor, and to have efficient algorithms that do not use up a large percentage of the available processing power.
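
The bit rate and delay attributes above reduce to simple frame arithmetic. The sketch below is only an illustration: it assumes the 8 kHz narrowband sampling rate implied by "160 samples (typically 20 ms)", and the function names are hypothetical, not taken from any coder's reference code.

```python
import math

SAMPLE_RATE_HZ = 8000  # assumption: narrowband telephony sampling rate


def frame_duration_ms(frame_samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Time needed to collect one frame of samples, in milliseconds."""
    return 1000.0 * frame_samples / sample_rate_hz


def payload_bytes(bit_rate_bps, frame_ms):
    """Coded bytes produced per frame, rounded up to whole bytes."""
    bits = bit_rate_bps * frame_ms / 1000.0
    return math.ceil(bits / 8)


if __name__ == "__main__":
    print("G.728 frame (5 samples):", frame_duration_ms(5), "ms")
    print("Cellular frame (160 samples):", frame_duration_ms(160), "ms")
    print("8 kbit/s coder, 10 ms frame:", payload_bytes(8000, 10), "bytes")
    print("64 kbit/s PCM, 20 ms frame:", payload_bytes(64000, 20), "bytes")
```

The payload figure excludes packet headers, which add to the channel bandwidth a call actually consumes.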

The commonly used coders such as G.723, G.729, and G.728 were developed with specific requirements and priorities in mind; as such, they provide different levels of compromises along these four dimensions. A summary of the performance level provided by each of these coders is highlighted in References 4, 5, and 6 as well as in Table 1.

Table 1: Summary of Attributes for 3 Commonly Used VoIP Coders

Attribute     G.723.1             G.729       G.729a
Bit rate      6.4 / 5.33 kbit/s   8 kbit/s    8 kbit/s
Frame size    30 ms               10 ms       10 ms
Look ahead    7.5 ms              5 ms        5 ms
Total delay   67.5 ms             25 ms       25 ms
Complexity    16 MIPS             20 MIPS     10 MIPS
RAM           2.2 kwords          3 kwords    2 kwords
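
The total-delay row in Table 1 is consistent with a common rule of thumb: two frame periods (one to collect the frame, roughly one to process it) plus the look-ahead. That rule is an inference from the table values rather than something the article states, so treat the sketch below as an assumption that happens to reproduce the table.

```python
def total_codec_delay_ms(frame_ms, look_ahead_ms):
    """Rule-of-thumb coder delay: two frame periods plus look-ahead (assumed)."""
    return 2 * frame_ms + look_ahead_ms


# (frame size, look-ahead) in ms, copied from Table 1.
CODERS = {
    "G.723.1": (30.0, 7.5),
    "G.729": (10.0, 5.0),
    "G.729a": (10.0, 5.0),
}

if __name__ == "__main__":
    for name, (frame_ms, look_ahead_ms) in CODERS.items():
        print(name, total_codec_delay_ms(frame_ms, look_ahead_ms), "ms")
```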

Quality Issues in Packet Networks
To ensure high speech quality in an IP network, designers must realize that many of the challenges lie in the inherent nature of the network. There are five big issues that designers will encounter in the network: packet loss and bit-error rates, delay, jitter, echo and background noise, and tandeming effects. Let's look at each of these in more detail.

1. Packet Loss and Bit-Error Rates
In an end-to-end VoIP network, packets are lost due to excessive bit errors, congestion in the IP network, or simply excessive delay that causes the receiver to ignore the corresponding speech frames in the decoding operation. The first cause lies in the access network itself, which includes a noisy channel such as a wireless link, a cable, a DSL line or a voice-band modem. In each channel, a certain amount of error detection and correction is designed in at the physical layer (PHY) to guarantee an upper limit on the bit error rate (BER). A packet is declared corrupted whenever it contains bit errors that the forward error correction (FEC) mechanism could not correct.

The second cause of loss is the IP network itself, which operates on a best-effort basis. During peak traffic times, queues at intermediate routers may overflow and packets are simply dropped. Analyses of the loss statistics suggest that packet loss is highly bursty and that the frequency distribution of the number of consecutive losses decreases geometrically [2, 18]. For this reason, most recovery techniques are optimized for a maximum of one or two packets lost in a row.
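
To check whether a measured loss trace shows this kind of burst behaviour, one can count runs of consecutive losses and compare the histogram against a geometric distribution. The sketch below assumes a hypothetical trace format (a list of booleans, True meaning the packet arrived) and a hypothetical parameter `p_end`, the probability that a loss is the last one in its burst.

```python
from collections import Counter


def loss_burst_lengths(received):
    """Length of each run of consecutive lost packets (False entries)."""
    bursts = []
    run = 0
    for ok in received:
        if ok:
            if run:
                bursts.append(run)
            run = 0
        else:
            run += 1
    if run:  # trace ended in the middle of a burst
        bursts.append(run)
    return bursts


def burst_histogram(received):
    """Map burst length -> number of bursts of that length."""
    return dict(sorted(Counter(loss_burst_lengths(received)).items()))


def geometric_pmf(k, p_end):
    """P(burst length == k) when each loss ends the burst with probability p_end."""
    return p_end * (1 - p_end) ** (k - 1)


if __name__ == "__main__":
    trace = [True, False, True, True, False, False, True,
             False, True, True, False, False, False, True]
    print(burst_histogram(trace))
```

If the measured counts fall off roughly like `geometric_pmf`, bursts of one or two packets dominate, which is the case recovery schemes are tuned for.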

Finally, packets are dropped (or ignored) at the receiver when they arrive too late. In this case, it is better to ignore the packet and reconstruct the parameters than to extend the delay in speech reconstruction. In general, voice traffic can tolerate some packet loss, depending on the coding algorithm, but a loss rate greater than 5 percent is considered harmful to voice quality and will result in a drop below toll quality for most coders [3].

2. Delay
Long delays in speech communications cause echo and talker overlap problems. Echo is caused by the telephone hybrid circuit at the far end, which makes the near-end talker hear a reflected version of their own voice. This reflection becomes annoying when the delay is greater than 50 ms. Talker overlap becomes significant if the one-way delay is greater than 250 ms, as the exchange starts to feel more like push-to-talk than a normal conversation.
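
The two thresholds can be turned into a quick sanity check on a delay budget. This is a sketch under stated assumptions: the function name is hypothetical, and when no echo delay is supplied it assumes the echo returns after one full round trip (twice the one-way delay), which the article does not specify.

```python
ECHO_ANNOYANCE_MS = 50    # echo delay above which the reflection is annoying
TALKER_OVERLAP_MS = 250   # one-way delay above which talker overlap is significant


def delay_impairments(one_way_delay_ms, echo_delay_ms=None):
    """Flag which delay-related impairments a call is likely to show."""
    if echo_delay_ms is None:
        # Assumption: the echo travels to the far-end hybrid and back.
        echo_delay_ms = 2 * one_way_delay_ms
    return {
        "echo_annoying": echo_delay_ms > ECHO_ANNOYANCE_MS,
        "talker_overlap": one_way_delay_ms > TALKER_OVERLAP_MS,
    }


if __name__ == "__main__":
    for delay in (20, 159, 300):
        print(delay, "ms one-way:", delay_impairments(delay))
```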

Delay in a VoIP system comes from a number of sources:

  • Framing delay, defined as the time to collect and frame the samples. The value is a function of the coder used (e.g. 10 ms for G.729a; 30 ms for G.723).
  • Algorithmic delay, defined as the look-ahead delay required for some speech coders or some acoustic echo cancellers.
  • Processing delay, which is a function of the user equipment, such as the processor speed and the efficiency of the coder implementation. It also includes other higher-layer functions such as the concatenation of several speech frames into a single packet to reduce overhead.
  • Network delay, which includes the various routing and buffering in the IP network, and scheduling and buffering at the receiver end to remove packet jitter.

To illustrate the impact of delay, consider the case of a dial-up VoIP call originating from a user PC and utilizing a G.723 coder. The minimum values for the end-to-end delay components are given in Table 2 below [1]:

Table 2: Various Delay Components in a VoIP Call

Component     Theoretical Delay (ms)
PC client     67.5
Access        44
IP network    40
Gateway       67.5
PSTN/phone    Negligible
Total         159

3. Jitter
Jitter is the variance in the delay between consecutive packets. It is due to the delay difference on different routes throughout the IP network. Even if intermediate routing of traffic provides priority to voice traffic, there is no guarantee that consecutive packets arrive in order at the destination.

A typical remedy for jitter is to provide buffering at the destination to wait for late arriving packets and then re-sequence the speech frames for proper decoding. However, there is a limit on the amount of buffering that is practical.
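
A fixed jitter buffer of this kind can be sketched in a few lines. The example assumes a hypothetical packet representation of `(sequence number, arrival time in ms)` tuples, a 20 ms frame period and a 40 ms buffer; a packet is played only if it arrives before its scheduled playout time, and played frames are returned in sequence order.

```python
def playout(packets, frame_ms=20, buffer_ms=40):
    """Re-sequence packets through a fixed jitter buffer.

    packets: iterable of (seq, arrival_ms) tuples, in any order.
    Returns (played, dropped) lists of sequence numbers.
    """
    packets = list(packets)
    if not packets:
        return [], []
    # The first packet to arrive anchors the playout clock.
    seq0, arrival0 = min(packets, key=lambda p: p[1])
    played, dropped = [], []
    for seq, arrival in packets:
        deadline = arrival0 + buffer_ms + (seq - seq0) * frame_ms
        if arrival <= deadline:
            played.append(seq)
        else:
            dropped.append(seq)  # too late: treated as lost
    return sorted(played), sorted(dropped)


if __name__ == "__main__":
    # Packet 2 arrives after packet 3; packet 4 misses its deadline.
    trace = [(0, 0), (1, 25), (3, 70), (2, 75), (4, 135)]
    print(playout(trace))
```

A deeper buffer would rescue packet 4, at the cost of adding the same amount to the end-to-end delay.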

A large jitter will result in more packets being dropped (i.e., lost), and this will impact quality. In some applications, the jitter buffer length is dynamically updated (Figure 1) to get an acceptable ratio of late arrivals over successfully processed frames [16]. This, however, results in a changing average delay (due to buffering) and in turn requires that echo cancellation algorithms be capable of fast adaptation in their estimate of the round trip delay, as it changes dynamically during the course of a conversation.
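
One simple way to update the buffer length is to measure the late-arrival ratio over a window of frames and nudge the buffer up or down. The policy below is a hypothetical illustration, not the scheme from the cited white paper: the target ratio, step size and limits are all assumed values.

```python
def adapt_buffer_ms(buffer_ms, late, total, target_late_ratio=0.01,
                    step_ms=10, min_ms=20, max_ms=200):
    """Return the new jitter buffer length after one measurement window.

    late:  packets that missed their playout deadline in the window
    total: packets expected in the window
    """
    if total == 0:
        return buffer_ms
    ratio = late / total
    if ratio > target_late_ratio:
        buffer_ms += step_ms      # too many late packets: buffer more
    elif ratio < target_late_ratio / 2:
        buffer_ms -= step_ms      # comfortably below target: cut delay
    return max(min_ms, min(max_ms, buffer_ms))


if __name__ == "__main__":
    buffer_ms = 40
    for late, total in [(5, 100), (3, 100), (0, 100), (0, 100)]:
        buffer_ms = adapt_buffer_ms(buffer_ms, late, total)
        print("late ratio", late / total, "-> buffer", buffer_ms, "ms")
```

Every change in buffer length shifts the round trip delay, which is why the echo canceller has to track it.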

Figure 1: Buffers used to smooth out inter-packet delay variance.

4. Echo and background noise
Echoes are the result of the 2- to 4-wire hybrid at the receiving user equipment. The longer the delay, the more noticeable and annoying this echo becomes in an interactive conversation. In addition, if the far end user is talking through a hands-free set or through a small-size handset (typical of cellular phones), then further echo will result due to the acoustic coupling in that set.

Both line and acoustic echo cancellers are needed to eliminate the echo so that the perceived quality is not impaired. The ITU-T recommendations G.165 and G.168 specify the characteristics of echo cancellers in terms of the length of the delay to cancel as well as the targeted echo attenuation.
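
Echo cancellers are commonly built around an adaptive filter that learns the echo path and subtracts the estimated echo. The sketch below uses a normalized LMS (NLMS) update as one well-known choice; it is a toy illustration with an assumed echo path (half amplitude, two samples of delay) and no near-end speech or noise, not a G.168-compliant design.

```python
import random


def nlms_echo_canceller(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Cancel the echo of far_end contained in mic using an NLMS adaptive filter.

    Returns (residual, weights): the echo-cancelled signal and the final
    estimate of the echo path impulse response.
    """
    w = [0.0] * taps
    history = [0.0] * taps  # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(wi * xi for wi, xi in zip(w, history))
        e = d - echo_estimate
        norm = eps + sum(xi * xi for xi in history)
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, history)]
        residual.append(e)
    return residual, w


rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
# Hypothetical echo path: far-end signal returns at half amplitude, two samples later.
mic = [0.0, 0.0] + [0.5 * x for x in far_end[:-2]]
residual, w = nlms_echo_canceller(far_end, mic)

if __name__ == "__main__":
    print("estimated echo path:", [round(wi, 3) for wi in w])
    print("largest residual in last 100 samples:",
          max(abs(e) for e in residual[-100:]))
```

The number of taps must cover the echo delay the canceller is expected to handle, which is why the recommendations specify that length.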

In mobile telephony and conference call settings, surrounding acoustic noise often corrupts speech signals. This in turn has an adverse effect on the perceived quality and intelligibility of speech as well as on the performance of speech coders. These coders rely on a model for the clean signal and cannot properly handle background noise signals such as engine, wind, traffic, music or the aggregate effect of many interfering speakers. As a result of the coding process, the effect of background noise is often amplified and produces unnatural and annoying sounds for the far-end user (Figure 2). The problem is more severe for low-rate coders, and more so for CELP-based coders than for waveform coders such as PCM or adaptive differential PCM (ADPCM).

Figure 2: Acoustic noise yields artifacts in the decoded speech.

5. Tandeming Effects
As VoIP telephony becomes more widely deployed, it will encompass a variety of networks and, in turn, a variety of speech coders that differ in bit rates, parameter sets, frame sizes, and update rates. One example is a call initiated on a cable-based phone using G.729 and ending on a 2G CDMA cellular system using the EVRC coder (Figure 3). If the speech is decoded and recoded at the network boundaries, the coding artifacts are further amplified and could cause a significant degradation in quality. In addition, tandeming raises the computation cost and increases the overall delay due to packetizing and processing.

Figure 3: Tandeming at network boundaries.

On To Part 2
That wraps up Part 1 of our series on VoIP speech coding. This part laid out the attributes of the coder as well as the issues that the IP network introduces. In Part 2, which will appear online tomorrow, we'll extend this discussion by looking at the specific challenges that speech coders pose for VoIP architectures.

Editor's Note: This paper is based on a presentation made at the 2002 Communications Design Conference.

References

  1. A. Watson and M. Sasse. "Measuring Perceived Quality of Speech and Video in Multimedia Conferencing Applications", Proceedings of ACM Multimedia, pp. 55 -- 60. Sept. 1998.
  2. J. Bolot and A. Vega-Garcia. "Control Mechanism for Packet Audio in the Internet". IEEE INFOCOM '96. Volume: 1, 1996 pp: 232 --239.
  3. C. Padhye and K. Christensen. "A New Adaptive FEC Loss Control Algorithm for Voice Over IP Applications". IEEE Computing, and Communications Conference, 2000. IPCCC '00. Page(s): 307 --313.
  4. R. Cox. "Three New Speech Coders from The ITU Cover a Range of Applications". IEEE Communications Magazine. Sept 1997, pp 40 -- 47.
  5. G. Schroder and M. Hashem. "The Road to G.729: ITU 8 kbps Speech Coding Algorithm with Wireline Quality". IEEE Communications Magazine. Sept 1997, pp 48 -- 54.
  6. R. Salami, C. Laflamme, B. Bessette, JP Adoul. "ITU-T G.729 Annex A: Reduced Complexity 8 kbit/s CS-ACELP Codec for Digital Simultaneous Voice and Data". IEEE Communications Magazine. Sept 1997. pp 56 -- 63.
  7. J. DeMartin, T. Unno and V. Viswanathan. "Improved Frame Erasure Concealment for CELP-Based Coders". IEEE ICASSP '00. Volume: 3 pp 1483 --1486. 2000.
  8. F. Poppe, D. DeVleeschauwer and G. Petit. "Guaranteeing QoS to Packetized Voice over the UMTS Air Interface". IEEE IWQOS. 2000. pp 85 --91.
  9. J.F. Wang, J.C. Wang, J.F. Yang, and JJ. Wang. "A Voicing-driven Packet Loss Recovery Algorithm for Analysis-by-Synthesis Predictive Speech Coders over Internet". IEEE Transactions on Multimedia. Vol. 3, No. 1, March 2001. pp 98 -- 107.
  10. B. Goodman. "Internet Telephony and Modem Delay". IEEE Network. May/June 1999. pp 8 -- 16.
  11. R. Martin, C. Hoelper, I. Wittke. "Estimation of Missing LSF Parameters Using Gaussian Mixture Models". IEEE Acoustics, Speech, and Signal Processing, 2001. Volume: 2. pp 729 -732
  12. N. Shacham and P. McKenney. "Packet Recovery in High-Speed Networks Using Coding and Buffer Management". IEEE INFOCOM '90, pp: 124 -131 vol.1.
  13. D. Rahika, J. Collura, T. Fuja, D. Sridhara, T. Fazel. "Error Coding Strategies for MELP Vocoder in Wireless and ATM environments". Speech Coding for Algorithms for Radio Channels (Ref. No. 2000/012), IEE Seminar, 2000. Page(s): 8/1 -833
  14. C. Erdmann et al. "A Candidate Proposal for a 3GPP Adaptive Multi-rate Wideband Speech Codec". IEEE ICASSP Volume: 2, 2001. Page(s): 757 -760 vol.2
  15. K. Kim. "An Efficient Transcoding Algorithm for G.723.1 and EVRC Speech Coders". IEEE Vehicular Technology Conference, 2001. VTC 2001 Fall. vol.3 pp 1561 -- 1564.
  16. E. Morgan. "Voice over Cable". White paper. www.telogy.com.
  17. H. Kang, H. Kim, R. Cox. "Improving Transcoding Capability of Speech Coders in Clean and Frame Erasures Channel Environments". IEEE Workshop on Speech Coding, 2000. pp: 78 --80.
  18. M. Borella and D. Swider. "Internet Packet Loss: Measurement and Implications for End-to-End QoS". Architectural and OS Support for Multimedia Applications/Flexible Communication Systems/Wireless Networks and Mobile Computing, 1998. Page(s): 3 --12.
  19. C. Perkins, O. Hodson, V. Hardman. "A Survey of Packet Loss Recovery Techniques for Streaming Audio". IEEE Network. Sept/Oct 1998 pp 40 -- 48.
  20. E. Nemer. "Acoustic Noise Reduction for Mobile Telephony". DSP World Spring Design Conference. April 2000.
  21. D. O'Shaughnessy. "Enhancing Speech Degraded by Additive Noise or Interfering Speakers". IEEE Comm. Magazine, Feb 1989, pp 46-52.

About the Author
Elias Nemer is a senior member of technical staff at Intel Corp. He holds a B.Eng(EE), M.Eng(EE), and MBA from McGill University (Montreal, Canada) as well as a Ph.D. (EE) from Carleton University (Ottawa, Canada). Elias can be reached at enemer@ieee.org.



