1.2 Why beyond VoIP protocols?

The new multimedia services will need to be much more than mere technology demonstrators. In order to build a demonstrator, engineers need only focus on the functional aspects: select a protocol, make sure it has the right service primitives, and combine these primitives into the desired functionality. The companion reference to this topic, IP Telephony: Deploying Voice-over-IP Protocols, focuses on such functional aspects, presenting a high-level overview of packet media transport technologies, details on all three major VoIP protocols (H.323, SIP, and MGCP), and specific strategies to design services in the context of public networks where endpoints cannot be trusted and can be behind firewalls.
As its title implies, Beyond VoIP Protocols: Understanding Voice Technology and Networking Techniques for IP Telephony provides a broad overview of all the additional issues that need to be solved in order to deploy a multimedia service.
1.2.1 Selecting a voice coder

In the lab, almost any voice coder can be selected: there is plenty of bandwidth and hardly any packet loss. In a real network, however, even with the massive deployment of DSL, there is often a need to carefully select a voice coder that fits in the available bandwidth and provides the desired level of service. This is not an obvious choice, and a deeper understanding of the internals of each voice coder is necessary to understand, for instance, how each candidate coder reacts to packet loss. During the bubble, the VoIP industry generated many ‘magic coders’ that were supposed to outperform any other codec: a deeper understanding of codec technology helps separate true innovations from naive tricks. Finally, as the new generation of multi-rate adaptive coders appears for use in 3G networks, it is important to keep in mind the fundamental differences between wireless and wired networks in order to evaluate which of the many innovations of the AMR or AMR-WB coders may lead to significant improvements for Internet-based multimedia applications.
Topic 2 ‘Introduction to Speech Coding Techniques’ provides the necessary background to efficiently evaluate the candidate coders for a network and make the best compromise.
1.2.2 Providing ‘toll quality’ … and more

The first service providers who massively adopted VoIP were prepaid card vendors. Unfortunately, many of these service providers bet on the fact that most of their potential clients would focus only on price and would have no means of complaining or asking for a refund if the voice quality was not acceptable. VoIP also had a lot of success among international transit carriers and arbitrage houses, and here as well voice quality is often not a prime concern. If you travel abroad and try to reach your voicemail, but cannot dial your DTMF access code correctly, chances are that your current service provider uses a VoIP network for international calls and never checked whether DTMF tones could get through.
Such bad experiences unfortunately backfired and created a perception among first-tier service providers that VoIP did not work. Most first-tier service providers conducted experiments between 2000 and 2002 in order to assess how elastic users’ acceptance of voice quality was relative to price. These studies aimed at designing tiered voice offers, with cheap, low-quality calls and more expensive toll-quality calls. To everyone’s surprise, the studies showed that users were willing to pay a premium over the toll-quality price only for very high-quality calls (wideband coders); on the other hand, if the quality was perceived to be significantly lower than toll quality, there was no willingness to pay at all.
The consequence is that all post-bubble VoIP networks will need to provide voice quality guaranteed to be comparable with toll quality, or better. Beyond the intrinsic quality of the voice coder, covered in topic 2, topic 3 ‘Voice Quality’ discusses in detail how to control the most important parameters influencing end-users’ perception of voice quality: delay and echo.
1.2.3 Controlling IP quality of service

Peer-to-peer applications killed the idea that over-provisioning could solve quality-of-service problems on the Internet. On the contrary, now that almost every DSL user is attempting to download a wish list of 10 or more 700-MB ‘DivX’ videos, it can be taken for granted that most DSL links are permanently congested. The situation is likely to become even worse as some peer-to-peer telephony applications begin to use very aggressive redundancy techniques in order to get an unfair share of the best effort bandwidth. If everyone keeps throwing more packets at the best effort Internet, it will soon become very difficult to use this class of service for the many applications that require a minimum level of responsiveness. In fact, the ‘best effort’ class of service is no longer usable for real-time applications, such as telephony or videoconferencing, which cannot recover from packet loss.
Topic 4 ‘Quality of Service’ discusses these issues from multiple points of view. At a low level, it explains the PGPS (packet-by-packet generalized processor sharing) theory that makes it possible to provide differentiated levels of quality of service over any packet network, and that helps in understanding the old ‘IP against ATM’ battles. It then presents the ‘DiffServ’ framework, which today provides a simple yet effective way of marking IP packets with a desired quality-of-service level, and of downgrading packets that exceed the agreed service level to a lower quality-of-service class. There is a lot that can be done with DiffServ, but it must be used carefully, and the topic also gives some guidelines on which types of traffic can be aggregated within a given service level and how to improve ‘fairness’ in a given class of service (even the best effort class).
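As a concrete illustration of DiffServ marking (a minimal sketch, not taken from the book), the snippet below shows how an application on Linux could tag its outgoing voice packets with the Expedited Forwarding code point by setting the IP TOS byte on a UDP socket; the destination address and payload are placeholders. In practice, end-hosts are rarely trusted to mark their own traffic, and classification or re-marking is usually performed at the network edge.

```python
import socket

# DSCP 46 (Expedited Forwarding) is the code point commonly used for
# low-latency voice traffic. The IP TOS byte carries the DSCP in its six
# most significant bits, hence the shift by 2.
DSCP_EF = 46
tos_byte = DSCP_EF << 2  # 0xB8

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# IP_TOS is the Linux socket option; other platforms expose it differently.
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos_byte)

# Every datagram sent on this socket now carries the EF code point, so
# DiffServ-aware routers can place it in their low-latency queue.
# The address and payload below are purely illustrative.
sock.sendto(b"rtp-payload", ("192.0.2.10", 5004))
```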
Nevertheless, DiffServ does have some limitations, and it is likely that in the long run service providers will need to implement more dynamic ways of managing the service level of packet streams generated by end-users. The IntServ framework was initially presented as a direct application of the PGPS theory, and as such it is very powerful but also difficult to scale. Topic 4 also discusses how a mix of IntServ and DiffServ could provide a good compromise for the future, and describes the current DQoS framework for cable networks, which is probably very similar to the techniques that will be used in the future on all public IP networks.
1.2.4 Dimensioning the network

From a dimensioning point of view, packet multimedia networks based on IP differ both from traditional telephony networks based on time division multiplexing (TDM) and from other packet-based networks (e.g., ATM). The difference from TDM is obvious: bandwidth is no longer needed during silence periods, which makes it possible, when aggregating multiple streams, to save up to 50% of the bandwidth that would have been necessary if all voice channels were transmitting continuously. Unfortunately, this gain is offset by the fact that RTP, the technique used by virtually all IP applications to transport media streams, is in itself very inefficient. In most cases, the gains of discontinuous transmission are cancelled out by the overhead of the IP transport, and in the end the average capacity required by an IP transport with simple voice coders is comparable with what would have been required on TDM. Further gains can be achieved by using low-bitrate coders, although this has an influence on end-to-end delay and voice quality, or, in specific circumstances, by optimizing the IP/UDP/RTP transport layers.
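As a rough, back-of-the-envelope illustration of this trade-off (a sketch under stated assumptions, not a dimensioning rule from topic 5), the snippet below estimates the one-way bandwidth of an RTP voice stream, assuming 20-ms packets, a 40-byte IPv4/UDP/RTP header, no link-layer overhead, and about 50% voice activity:

```python
# Per-call bandwidth estimate for RTP voice (illustrative assumptions only).
IP_UDP_RTP_OVERHEAD_BYTES = 40  # 20 (IPv4) + 8 (UDP) + 12 (RTP)

def rtp_bandwidth_kbps(codec_kbps, packet_ms=20, voice_activity=1.0):
    """Average one-way bandwidth of an RTP stream, in kbit/s."""
    payload_bytes = codec_kbps * 1000 / 8 * packet_ms / 1000
    packet_bits = (payload_bytes + IP_UDP_RTP_OVERHEAD_BYTES) * 8
    packets_per_second = 1000 / packet_ms
    return packet_bits * packets_per_second / 1000 * voice_activity

for name, rate_kbps in [("G.711", 64), ("G.729", 8)]:
    continuous = rtp_bandwidth_kbps(rate_kbps)
    with_vad = rtp_bandwidth_kbps(rate_kbps, voice_activity=0.5)
    print(f"{name}: {continuous:.0f} kbit/s continuous, ~{with_vad:.0f} kbit/s with VAD")
# G.711: 80 kbit/s continuous, ~40 kbit/s with VAD (vs. 64 kbit/s on TDM)
# G.729: 24 kbit/s continuous, ~12 kbit/s with VAD
```

With these assumptions, G.711 over RTP consumes roughly the same average capacity as a 64-kbit/s TDM channel once silence suppression is taken into account, which is the point made above.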
The difference from ATM networks is less obvious. As there was no ‘best effort’ traffic on ATM, networks required very strict dimensioning in order to minimize the chances that admission control would reject a new connection. As we have seen above, the explosive growth of best effort traffic makes it impossible to use this class for interactive streams; but once a separate class of service is created for voice or videoconferencing, the fact that most of the network capacity is used for best effort traffic makes it a lot easier to dimension the real-time class of service. Within reasonable limits, the real-time class can ‘eat’ the capacity used by best effort users, who have no service-level agreement and, in theory, never complain. In fact, hybrid voice and data networks are not only simpler to dimension, but will also provide lower end-to-end transmission delays for voice streams than a pure voice network.

Topic 5 ‘Network Dimensioning’ presents the traditional techniques that must be used to dimension a network (e.g., the Erlang laws that evaluate the number of simultaneously active calls for a given number of users, or the Poisson laws that can be used to evaluate the processing capacity of the softswitches required to handle VoIP signaling). Topic 5 also discusses the characteristics of VoIP streams compared with TDM voice channels and explains how to extrapolate the results of traditional network dimensioning theory, based on constant-bitrate streams, to the ‘on-off’ streams typical of VoIP when used in combination with voice activity detection.
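To give a flavor of the kind of calculation involved (a small sketch with hypothetical subscriber figures chosen only for illustration), the Erlang B formula can be evaluated with its usual iterative recursion:

```python
def erlang_b(offered_erlangs, channels):
    """Blocking probability for a given offered load and number of channels,
    computed with the standard iterative form of the Erlang B recursion."""
    blocking = 1.0
    for n in range(1, channels + 1):
        blocking = offered_erlangs * blocking / (n + offered_erlangs * blocking)
    return blocking

# Hypothetical example: 1,000 subscribers offering 0.1 Erlang each, i.e.
# 100 Erlangs in total. How many simultaneous calls must the real-time
# class of service be able to carry to keep blocking below 1%?
offered = 100.0
channels = 1
while erlang_b(offered, channels) > 0.01:
    channels += 1
print(channels, "simultaneous calls")  # about 117 with these figures
```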
1.2.5 Unleashing the potential of multicast

The IP network was initially designed and optimized to provide robust point-to-point connections, using ‘unicast’ packets. Unfortunately, not all applications work well with point-to-point connections: broadcast applications do not scale well if they need to duplicate the information stream for every listener. Today most commercial IP networks are still ‘unicast’-only, and for this reason, when you connect to a TV station’s website, you get only a small, low-quality image even if you have a DSL connection at home. The reason is that the TV station still uses unicast and therefore needs to send a copy of the TV channel to everyone. Today MPEG-2 requires about 2-3 Mbit/s for TV-quality transmission over IP: sending a single channel to 1 million ‘IP-TV’ sets would require no less than 3 Tbit/s at the TV station!
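The arithmetic behind that figure is straightforward; the short sketch below simply restates it, using the same illustrative stream rate and audience size as above:

```python
# Source-side bandwidth needed to deliver one TV channel to N viewers
# (illustrative figures matching the MPEG-2 example above).
stream_mbps = 3.0
viewers = 1_000_000

unicast_total_mbps = stream_mbps * viewers  # one copy per viewer
multicast_total_mbps = stream_mbps          # one copy, replicated inside the network

print(f"Unicast:   {unicast_total_mbps / 1e6:.0f} Tbit/s at the TV station")
print(f"Multicast: {multicast_total_mbps:.0f} Mbit/s at the TV station")
```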
Today many service providers are enhancing their IP networks to provide support for ‘multicast’. This supports their ‘triple-play’ strategy in the residential market, which requires high-quality TV transmission over IP, and enables many services in the corporate market (e.g., large videoconferences, the equivalent of webinars with a video stream).
Topic 6 ‘Multicast’ explains how the technology can turn an IP backbone into an optimized broadcast medium, and discusses the numerous issues that have delayed the introduction of multicast on commercial networks and continue to limit the scope of feasible applications today. As multicast has a specific behavior with respect to network sizing, topic 5 also presents the impact of various multicast distribution tree configurations on the required capacity of each link.
For the engineering departments of service providers, topics 5 and 6 together will provide much of the material required when designing a ‘triple-play’ offer combining voice, IP-TV, and Internet access over DSL.
