Discussion and promising developments
As we have stressed throughout the presentation, speech coding for the Internet is currently in a transitional period. There are working methods for transmitting real-time speech over a packet-oriented network. However, these methods were adopted directly from circuit-switched networks. To the authors' knowledge, no methods have been developed specifically for the Internet environment.
The Internet poses some problems when it comes to speech transmission (delay, errors, etc.). However, it allows us to drop some fundamental restrictions that were assumed in the development of speech coders for circuit-switched networks. In this section we try to highlight the advantages that the Internet offers.
The first restriction is the fixed bitrate. In the Internet environment it makes more sense to speak about the average bitrate. The absence of the fixed-bitrate restriction allows us, in principle, to develop speech coding methods that distribute the bit budget adaptively. Experiments and our knowledge of speech perception and source coding clearly show that the number of bits one has to spend on coding a speech segment depends on the segment itself. We are therefore talking about constant-quality speech coding as opposed to constant-bitrate coding. Quality (average) versus rate for the pre-Internet era and rate (average) versus quality for the Internet era are shown in Figure 1.
Figure 1. Towards constant quality speech coding in the Internet era.
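The difference between the two regimes can be illustrated with a minimal sketch. It is not a speech coder: it assumes each frame behaves like a Gaussian source and uses the textbook rate-distortion relation R(D) = 0.5 log2(variance / D) as a stand-in for a real perceptual model. The function names and the frame variances are hypothetical and chosen only for illustration.

```python
import math

def bits_for_constant_quality(frame_variances, target_distortion):
    """Bits per sample each frame needs to reach the same distortion.

    Assumption: Gaussian rate-distortion function
    R(D) = 0.5 * log2(var / D), clamped at zero.
    """
    rates = []
    for var in frame_variances:
        if var <= target_distortion:
            rates.append(0.0)  # frame is already below the distortion target
        else:
            rates.append(0.5 * math.log2(var / target_distortion))
    return rates

def distortion_at_fixed_rate(frame_variances, bits_per_sample):
    """Distortion per frame when every frame gets the same rate.

    Same Gaussian assumption, inverted: D = var * 2^(-2R).
    """
    return [var * 2.0 ** (-2.0 * bits_per_sample) for var in frame_variances]

# Hypothetical frame variances: silence, unvoiced, voiced, loud voiced.
variances = [0.001, 0.05, 1.0, 4.0]
target = 0.01

# Constant quality: the rate varies from frame to frame.
rates = bits_for_constant_quality(variances, target)
average_rate = sum(rates) / len(rates)

# Constant bitrate at the same average: the distortion varies instead.
fixed = distortion_at_fixed_rate(variances, average_rate)

for var, r, d in zip(variances, rates, fixed):
    print(f"variance={var:<6} constant-quality rate={r:.2f} bits  "
          f"fixed-rate distortion={d:.5f}")
print(f"average rate = {average_rate:.2f} bits per sample")
```

Under these assumptions, the constant-quality allocation spends no bits on the quietest frame and the most bits on the loudest one, while the fixed-rate allocation at the same average bitrate leaves the loud frames well above the distortion target.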
Considering our still limited knowledge of speech perception, the development of a constant-quality speech coder might take some time. Nonetheless, the Internet gives us this possibility, and we believe this is the way to go. Naturally, the user does not care about the bitrate; the user just wants to use the phone (IP phone) for a conversation. With the currently used coding philosophies, however, the coding quality sometimes drops dramatically, which is very annoying for the user.
The most promising schemes from the constant-quality point of view right now are the pitch-synchronous ones. Several schemes have been proposed, with sinusoidal and waveform interpolation having the most success.
Pitch-synchronous analysis of the speech signal has two main advantages over time-synchronous analysis (used by most modern coding standards). First, it approximates the analysis performed by the human ear, so the decomposition of the speech signal yields the perceptually most relevant set of parameters. This allows very efficient quantization. Second, these coders naturally exhibit extremely high robustness against channel errors. In short, this is a result of the pitch-synchronous analysis and reconstruction: when a packet is lost, we lose an integer number of pitch periods. The lost periods can be reconstructed very accurately, from the perceptual point of view, by simple interpolation between the received ones. Moreover, the same interpolation principle allows us to maintain a smooth pitch evolution, which has proved to be very important.
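The interpolation idea can be sketched as follows. This is a simplified illustration, not the procedure of any particular coder: it assumes the decoder holds the last pitch cycle received before the loss and the first one received after it, each as a non-empty list of samples, and it interpolates both the cycle length (the pitch period) and the cycle shape linearly. The function names are hypothetical.

```python
def resample_cycle(cycle, new_length):
    """Linearly resample one pitch cycle to new_length samples."""
    if new_length == 1:
        return [cycle[0]]
    out = []
    scale = (len(cycle) - 1) / (new_length - 1)
    for i in range(new_length):
        pos = i * scale
        lo = int(pos)
        hi = min(lo + 1, len(cycle) - 1)
        frac = pos - lo
        out.append((1.0 - frac) * cycle[lo] + frac * cycle[hi])
    return out

def conceal_lost_cycles(prev_cycle, next_cycle, n_lost):
    """Rebuild n_lost pitch cycles between two received ones."""
    rebuilt = []
    for k in range(1, n_lost + 1):
        w = k / (n_lost + 1)  # weight moves from prev_cycle towards next_cycle
        # Interpolate the pitch period so the pitch track evolves smoothly.
        length = round((1.0 - w) * len(prev_cycle) + w * len(next_cycle))
        a = resample_cycle(prev_cycle, length)
        b = resample_cycle(next_cycle, length)
        # Cross-fade the two length-normalized cycle shapes.
        rebuilt.append([(1.0 - w) * x + w * y for x, y in zip(a, b)])
    return rebuilt

# Hypothetical cycles: a 4-sample cycle before the loss, a 7-sample one after.
before = [0.0, 1.0, 0.0, -1.0]
after = [0.0, 1.0, 2.0, 1.0, 0.0, -1.0, -2.0]
for cycle in conceal_lost_cycles(before, after, n_lost=2):
    print(len(cycle), [round(s, 3) for s in cycle])
```

Because whole cycles are lost and whole cycles are rebuilt, the reconstructed signal stays aligned with the pitch structure, and the interpolated cycle lengths give a gradual transition between the pitch periods on either side of the gap.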
Personal Internet telephony is becoming more and more popular these days. Given the speed of modern personal-computer CPUs, this means that we can essentially drop the complexity restriction. We can therefore expect commercial implementations of coders that were considered too complex for implementation on a single DSP chip and hence were not standardized by the ITU. First, these would be enhancements to the already standardized coders. Second, we expect coders based on sinusoidal and waveform interpolation models (and others) to come to the market.