CODEC FEATURES To transport real-time voice over packet-switched networks such as the Internet, the codecs used must provide the following features: sampling, compression and packetization of the voice signal (speech coding). The dominant standard for transmitting multimedia in packet-switched networks is International Telecommunication Union (ITU) Recommendation H.323, which uses IP/UDP/RTP (LINK) encapsulation for audio. Furthermore, the codec should also be able to transmit signaling information (DTMF tones); otherwise an additional transport protocol such as H.225.0 is required. |
BIT RATE The bit rate is a very important parameter in speech coder design. Because of the growing need to conserve bandwidth, coders should produce high quality even at low bit rates. Speech coded at 64 kilobits per second (kbit/s) using logarithmic pulse code modulation (PCM) is considered "non-compressed" and is often used as a reference for quality comparison. The standardized bit rates range from 2.4 kbit/s for secure telephony to 64 kbit/s for the G.711 PCM and G.722 (wideband, 7 kHz) speech coders.
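As a rough illustration of the 64 kbit/s logarithmic PCM reference, the sketch below computes the PCM bit rate and applies the continuous mu-law companding curve. The 8 kHz sampling rate, 8 bits per sample and mu = 255 are the usual G.711 mu-law values and are not stated in the text above; G.711 itself uses a segmented approximation of this curve, so the functions are illustrative only, and their names are hypothetical.

```python
import math

MU = 255  # assumption: mu-law parameter commonly associated with G.711


def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Bit rate of uncompressed PCM in bit/s."""
    return sample_rate_hz * bits_per_sample


def mu_law_compress(x: float, mu: int = MU) -> float:
    """Map a sample in [-1, 1] to [-1, 1] on a logarithmic scale."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y: float, mu: int = MU) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)


if __name__ == "__main__":
    # Assumed telephony parameters: 8000 samples/s, 8 bits per sample.
    print(pcm_bit_rate(8000, 8))
    # Small amplitudes receive a larger share of the code range than large ones.
    for sample in (0.01, 0.1, 1.0):
        print(sample, round(mu_law_compress(sample), 3))
```

The logarithmic curve spends more of the 8-bit code range on quiet samples, which is why 8 bits per sample are enough for telephone-quality speech.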
TABLE 1 (Ref. [1], Figure 7): ITU, cellular and secure telephony speech coding standards.
The overall bit rate to be transmitted is the sum of the coder-dependent payload bits and the header bits added by the protocols. Real-time Transport Protocol (RTP) headers provide the sequence number and timestamp information needed to reassemble a real-time stream from packets. The Voice over IP (VoIP) standards committee is proposing a subset of H.323 for audio over IP. The H.323 standard addresses video (audiovisual) communication on local area networks. The ratio of header overhead to payload decreases with increasing packet size. The achievable compression gain therefore also depends on this ratio.
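The dependence of the overall bit rate on the header-to-payload ratio can be sketched as follows. The header sizes (20 bytes IPv4 without options, 8 bytes UDP, 12 bytes fixed RTP header) are the standard minimum sizes, not figures from the text above; link-layer headers are ignored, and the function names and the 20 ms packet interval are assumptions chosen for illustration.

```python
IP_HEADER_BYTES = 20   # IPv4 header without options
UDP_HEADER_BYTES = 8
RTP_HEADER_BYTES = 12  # fixed RTP header without CSRC entries
HEADER_BYTES = IP_HEADER_BYTES + UDP_HEADER_BYTES + RTP_HEADER_BYTES


def payload_bytes(codec_bit_rate: float, packet_ms: float) -> float:
    """Coded speech bytes carried in one packet covering packet_ms of audio."""
    return codec_bit_rate * packet_ms / 8000


def total_bit_rate(codec_bit_rate: float, packet_ms: float) -> float:
    """Payload plus IP/UDP/RTP header bits per second."""
    packets_per_second = 1000 / packet_ms
    return codec_bit_rate + HEADER_BYTES * 8 * packets_per_second


def overhead_ratio(codec_bit_rate: float, packet_ms: float) -> float:
    """Header bytes divided by payload bytes for one packet."""
    return HEADER_BYTES / payload_bytes(codec_bit_rate, packet_ms)


if __name__ == "__main__":
    # Hypothetical comparison: a 64 kbit/s and an 8 kbit/s coder, 20 ms packets.
    for rate in (64000, 8000):
        print(rate, total_bit_rate(rate, 20), overhead_ratio(rate, 20))
```

With 20 ms packets, the 64 kbit/s coder needs 80 kbit/s on the wire, while the 8 kbit/s coder needs 24 kbit/s: an eightfold reduction in payload yields only about a 3.3-fold reduction in transmitted bit rate. Larger packets lower the ratio but increase delay.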
The bit rate is particularly important for users connected to the Internet via modem. Modems offer a bounded bit rate, typically at most 56 kbit/s.
fixed vs variable bit rate
|
DELAY The delay of a coder has a great impact on its suitability for a particular application. We have to distinguish between real-time conversation and multimedia storage applications. In the following we consider the first case in detail, because it is the most delay-sensitive application, whereas the second is the least sensitive. Delay causes two problems, defined in the literature as echo and talker overlap. Echo is caused by signal reflections at the far end during four-wire to two-wire hybrid conversion and by acoustic feedback. Echo becomes significant if the round-trip delay is greater than 50 ms. Echo cancellation algorithms try to reduce the annoying reflections to a certain level. If the time gap between the direct signal and its reflections is much greater than this threshold, even lower echo levels remain annoying. Talker overlap arises if the one-way delay becomes greater than 250 ms. The conversation then becomes more like a half-duplex or push-to-talk experience than an ordinary conversation. In the following, the different sources contributing to the overall one-way delay are considered; they are depicted in Fig. 2, and reliable average values for different codecs are given in Table 2.
TABLE 2 (Ref. [1], Table 1)
Accumulation Delay
Processing Delay
Network Delay
Buffer Delay (Jitter)
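The four delay sources listed above add up to the one-way delay, which can then be checked against the 250 ms talker-overlap and 50 ms round-trip echo thresholds. The sketch below is a minimal illustration: the class name, the example values and the assumption that the round-trip delay is twice the one-way delay (a symmetric path) are hypothetical and not taken from Table 2.

```python
from dataclasses import dataclass

TALKER_OVERLAP_ONE_WAY_MS = 250  # threshold from the text
ECHO_ROUND_TRIP_MS = 50          # threshold from the text


@dataclass
class DelayBudget:
    """One-way delay contributions in milliseconds."""
    accumulation_ms: float
    processing_ms: float
    network_ms: float
    buffer_ms: float

    def one_way_ms(self) -> float:
        return (self.accumulation_ms + self.processing_ms
                + self.network_ms + self.buffer_ms)

    def round_trip_ms(self) -> float:
        # Assumption: both directions have the same delay.
        return 2 * self.one_way_ms()

    def talker_overlap_likely(self) -> bool:
        return self.one_way_ms() > TALKER_OVERLAP_ONE_WAY_MS

    def echo_significant(self) -> bool:
        return self.round_trip_ms() > ECHO_ROUND_TRIP_MS


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    budget = DelayBudget(accumulation_ms=20, processing_ms=10,
                         network_ms=100, buffer_ms=40)
    print(budget.one_way_ms(), budget.talker_overlap_likely(),
          budget.echo_significant())
```

In this hypothetical budget the one-way delay stays below the talker-overlap threshold, but the round-trip delay is far above 50 ms, so echo cancellation would be needed.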
|
COMPLEXITY The measures of complexity for a DSP and a CPU are somewhat different, owing to the nature of these two systems. Because DSP architectures differ, implementations of the same coder are not equally efficient on every device. Computational complexity is the number of instructions per second required by an implementation and is usually expressed in million instructions per second (MIPS). A second measure is the amount of memory (RAM) required. The ROM storage required for program instructions and constants is the third measure of complexity. |
PERFORMANCE & QUALITY This criterion is largely synonymous with intelligibility. In addition, the signal-to-noise ratio (SNR) is a quantitative measure of quality. The coder should also be able to transmit music or a combination of speech and other signals. Otherwise a lack of robustness to background noise may occur, in which case the performance of the speech coder degrades significantly. Verifying compliance with a bitstream specification is very difficult, so it is most often done by subjective testing, comparing the implementation under test with a known version of the coder. The ITU includes the Speech Quality Experts Group, which measures speech quality and determines whether performance is sufficient for a given application.
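The SNR mentioned above can be computed by treating the difference between the original and the decoded signal as noise. The sketch below uses the common definition, 10·log10 of signal power over noise power; the function name and the sample values are hypothetical, and a high SNR does not by itself guarantee good perceived quality, which is why subjective tests are still used.

```python
import math
from typing import Sequence


def snr_db(original: Sequence[float], decoded: Sequence[float]) -> float:
    """SNR in dB, with the coding error (original - decoded) taken as noise."""
    if len(original) != len(decoded):
        raise ValueError("signals must have the same length")
    signal_power = sum(x * x for x in original)
    noise_power = sum((x - y) ** 2 for x, y in zip(original, decoded))
    if signal_power == 0:
        raise ValueError("original signal has zero power")
    if noise_power == 0:
        return math.inf  # decoded signal is identical to the original
    return 10 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    # Hypothetical samples: the decoded signal is the original scaled by 0.9.
    original = [1.0, -1.0, 1.0, -1.0]
    decoded = [0.9, -0.9, 0.9, -0.9]
    print(round(snr_db(original, decoded), 2))
```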
PACKET LOSS
|