e-OTI: Random Thoughts on Delivering End-to-End Quality of Service on the Internet

Random Thoughts on Delivering End-to-End Quality of Service on the Internet
By Paul Ferguson
pferguson@cisco.com

Let's start this discussion off with a couple of basic questions: What is quality of service (QoS)? And why is there so much importance attached to it?

After participating in many spirited discussions on the topic, I've subjectively concluded that the most practical QoS implementation currently feasible, one that requires no new protocols, is any method of delivering selected portions of traffic (read: packets) with a distinguished level of precedence or priority over other, so-called regular, traffic. There are certainly other, perhaps more far-reaching and longer-term, definitions of QoS that include more dynamic methods of measuring, determining, and controlling the topological parameters related to QoS, such as future QoS routing methodologies. However, I have refrained from speculating on them here, because nothing in this category currently exists. That is not to imply that something more elaborate is unattainable, just that it's not quite on the radar screen yet.

Historically, QoS has been a vague term used to describe some way of magically differentiating service levels, of getting better performance by turning a magic knob. Depending on how it is implemented, it can be a layer 2 mechanism (for example, asynchronous transfer mode [ATM] CBR [constant bit rate] or frame-relay CIR [committed information rate]) or a layer 3 mechanism. Personally, I champion delivery of QoS at layer 3, for reasons I outline here.

There are basically two reasons why providing QoS is important. One is that in times of network congestion, it would be ideal to have a mechanism that provides preferential treatment for certain types of traffic. If there is never congestion, there basically is no issue; everyone's traffic gets delivered in a timely manner, and everyone is happy. The second reason is also fairly straightforward: in a world where Internet service providers are looking for a way to differentiate themselves from one another for economic reasons, having a good QoS implementation can be a leg up on one's competition.

What are the issues in delivering quality of service? The answer is certainly complex and is probably better understood if we discuss the shortcomings in attempts to deliver QoS at layer 2.

Let's begin with an example: attempts to prioritize traffic in a frame-relay environment. Any type of prioritization or fancy router queuing mechanism is tricky in the frame-relay world, and it doesn't always work as well as one would expect, because the router is not the final point of queuing. The same is true in an ATM environment.

For instance, on a point-to-point circuit, let's say the router receives a burst of five frames of normal-priority traffic. This traffic sits in the router's normal-priority queue. If the router then receives a single high-priority frame, it is put in the high-priority queue, and it is the next frame sent. With frame relay, on the other hand, if the port speed is faster than the permanent virtual circuit (PVC) speed, the router might receive five normal-priority frames and send them straight out to the frame-relay switch, where they sit in the switch's queue(s). If the router then receives a high-priority frame, it forwards it to the switch immediately, because it is in the router's high-priority queue. The frame nonetheless ends up queued behind the normal traffic in the frame-relay switch, because the switch has no way of distinguishing between traffic the router considered high priority and traffic it considered normal priority.
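To make the queuing argument concrete, here is a minimal Python simulation. It is a sketch under simplifying assumptions, not a model of any real router or switch. Time advances in ticks. The router drains its two queues in strict priority order at `router_rate` frames per tick, and the switch drains a single FIFO at `switch_rate` frames per tick. The function name `simulate` and both rate parameters are hypothetical.

```python
from collections import deque

def simulate(arrivals, router_rate, switch_rate):
    """Return frame IDs in the order they leave the far end of the PVC.

    arrivals: dict mapping tick -> list of (frame_id, is_high_priority).
    router_rate: frames per tick the router can send to the switch (port speed).
    switch_rate: frames per tick the switch can send on the PVC (PVC speed).
    """
    high, normal = deque(), deque()   # router: separate priority queues
    switch_fifo = deque()             # switch: one FIFO, blind to router priority
    total = sum(len(frames) for frames in arrivals.values())
    delivered, tick = [], 0
    while len(delivered) < total:
        for frame_id, is_high in arrivals.get(tick, []):
            (high if is_high else normal).append(frame_id)
        # Router serves the high-priority queue first (strict priority).
        for _ in range(router_rate):
            if high:
                switch_fifo.append(high.popleft())
            elif normal:
                switch_fifo.append(normal.popleft())
        # Switch transmits strictly in arrival order.
        for _ in range(switch_rate):
            if switch_fifo:
                delivered.append(switch_fifo.popleft())
        tick += 1
    return delivered

burst = {0: [(f"N{i}", False) for i in range(1, 6)], 1: [("H1", True)]}
print(simulate(burst, router_rate=1, switch_rate=1))  # port speed == PVC speed
print(simulate(burst, router_rate=5, switch_rate=1))  # port 5x faster than PVC
```

The two runs show the contrast. With equal rates, the high-priority frame H1 goes out second, right after the frame already in flight. With the router port five times faster than the PVC, the five normal frames are already sitting in the switch's FIFO when H1 arrives, so H1 is delivered last.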

This is not an issue if the port speed is not faster than the PVC speed and is completely a nonissue in the private-line world. However, this example illustrates the disconnect in attempts to use layer 2 mechanisms to deliver QoS in a layer 3 world.

By the same token, attempts to provide QoS with frame-relay CIR or ATM CBR are similarly flawed. Although both CIR and CBR can guarantee delivery of frames or cells up to a specific threshold, layer 3 protocols have no way to be cognizant of those parameters, and neither mechanism provides explicit layer 3 support. Once again, the responsibility for delivering packets falls back on complex buffering and queue management at layer 3, which can produce wildly varying results, especially at very high speeds.
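The sketch below illustrates that blindness using the common frame-relay policing model. Each measurement interval is Tc = Bc / CIR, with a committed burst (Bc) and an excess burst (Be):

- traffic within Bc is forwarded;
- traffic within Bc + Be is forwarded with the discard-eligible (DE) bit set;
- anything beyond that is discarded.

The function `police_interval` and the example numbers are hypothetical. The point is that each frame's IP precedence is passed in but never consulted, because a layer 2 policer has no view of it.

```python
def police_interval(frames, cir_bps, bc_bits, be_bits):
    """Police the frames that arrive within one measurement interval Tc.

    frames: list of (size_bits, ip_precedence) tuples. The layer 2 policer
    has no access to layer 3 information, so ip_precedence is ignored.
    Returns (tc_seconds, verdicts).
    """
    tc = bc_bits / cir_bps          # measurement interval: Tc = Bc / CIR
    accepted = 0                    # bits accepted so far in this interval
    verdicts = []
    for size_bits, _ip_precedence in frames:
        if accepted + size_bits <= bc_bits:
            verdicts.append("forward")            # within committed burst
            accepted += size_bits
        elif accepted + size_bits <= bc_bits + be_bits:
            verdicts.append("forward, DE=1")      # excess burst: discard eligible
            accepted += size_bits
        else:
            verdicts.append("discard")            # beyond Bc + Be
    return tc, verdicts

# Hypothetical PVC: 64 kbps CIR, Bc = 64,000 bits, Be = 32,000 bits.
# Nine 1500-byte frames; only the last one carries IP precedence 5.
frames = [(12_000, 0)] * 8 + [(12_000, 5)]
tc, verdicts = police_interval(frames, cir_bps=64_000, bc_bits=64_000, be_bits=32_000)
print(tc, verdicts)
```

Here the first five frames fit within Bc and the next three are marked DE. The ninth frame, the only one with elevated IP precedence, is discarded exactly as a precedence-0 frame would be.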

The same holds true today if one is trying to deliver QoS with ATM: unless a PVC can be built directly between the communicating end stations, one must rely on Transmission Control Protocol (TCP) congestion control behavior, for example TCP slow start, to dynamically detect instances of loss and congestion. That is neither a consistent nor an optimal basis for QoS, and the chances of building an end-to-end ATM PVC across the Internet are not good. In fact, several different types of transport media, such as Ethernet, FDDI, serial lines, ATM, and frame-relay switches, are likely to lie in the transit path of a packet in flight on the Internet.
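For readers unfamiliar with the behavior being relied on, the following is a deliberately simplified, Tahoe-style model of a TCP sender's congestion window. The window is measured in segments and updated once per round-trip time (RTT):

- during slow start, the window doubles each RTT until it reaches the threshold `ssthresh`;
- after that, it grows by one segment per RTT;
- on a loss, `ssthresh` is halved and the window restarts at one segment.

Real TCP stacks differ in many details, such as fast retransmit and recovery, byte counting, and initial window sizes. The name `cwnd_trace` and its parameters are illustrative only.

```python
def cwnd_trace(rtts, ssthresh, loss_at):
    """Congestion window (in segments) at the start of each RTT.

    Simplified Tahoe-style model: slow start doubles cwnd per RTT up to
    ssthresh, congestion avoidance then adds one segment per RTT, and a
    loss halves ssthresh and restarts slow start from one segment.
    loss_at: set of RTT indices in which a loss is detected.
    """
    cwnd = 1
    trace = []
    for rtt in range(rtts):
        trace.append(cwnd)
        if rtt in loss_at:
            ssthresh = max(cwnd // 2, 2)
            cwnd = 1
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)   # slow start (capped at ssthresh)
        else:
            cwnd += 1                        # congestion avoidance
    return trace

print(cwnd_trace(10, ssthresh=16, loss_at={6}))
```

The sender learns about congestion only after a loss has already happened. It then backs off for every flow alike, with no notion of which traffic mattered more.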

The foregoing are common examples of the problems encountered today when attempting to deliver QoS by relying on any one particular layer 2 mechanism. It should also be reiterated that signaling mechanisms native to traditional layer 2 services cannot be adequately communicated to layer 3 protocols, which complicates matters further. One might argue that in the face of loss, TCP reacts in the desired manner: by backing off. However, that approach offers no granularity for differentiating traffic, and it does not apply to UDP traffic at all.

Of course, if the source and destination of a particular traffic flow are directly connected to the same administrative domain, it is much easier to implement a layer 2 QoS strategy. However, if they are separated by several administrative domains, as is usually the case on the Internet, it is virtually impossible to rely on layer 2 mechanisms to provide the desired service differentiation.

One might suggest that trying to deliver quality of service on the Internet is a Catch-22. Because we are working in a connectionless, datagram-oriented environment (the TCP/IP suite), there are no functional guarantees of delivery, yet TCP was designed to be somewhat graceful in the face of packet loss. So what we basically have is an ability to provide varying degrees of best-effort delivery; there simply is no way to guarantee traffic delivery.

Thus, delivering QoS becomes an exercise of ensuring that regular traffic has a higher probability of being dropped than does premium traffic. This is a highly controversial topic, an issue of much debate, and a great source of indirection in the Internet community. In fact, while several larger Internet service providers have deployed ATM as a switching fabric within their networks to achieve link speeds of OC-3 and higher, a significant number have not implemented, and do not plan to use, ATM-specific features to deliver QoS because of those issues. It will be interesting to see how alternative technologies such as PPP-over-SONET and Gigabit Ethernet will affect the changing landscape of developing and delivering QoS.
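One way to ensure that regular traffic has a higher probability of being dropped than premium traffic is to give each class its own drop curve, in the style of Random Early Detection (RED). Each curve has three regions:

- no drops below a minimum average queue depth;
- a linearly rising drop probability up to a maximum threshold;
- forced drops above that threshold.

The class names and threshold values below are hypothetical. They were chosen only so that the regular curve starts earlier and climbs faster than the premium one; the article does not prescribe RED or these numbers.

```python
import random

# Hypothetical per-class RED parameters: (min_th, max_th, max_p), in packets.
PROFILES = {
    "regular": (10, 30, 0.2),
    "premium": (25, 40, 0.1),
}

def drop_probability(avg_queue, min_th, max_th, max_p):
    """RED-style drop probability for a given average queue depth."""
    if avg_queue < min_th:
        return 0.0                      # queue short enough: never drop
    if avg_queue >= max_th:
        return 1.0                      # queue too long: always drop
    # Linear ramp from 0 at min_th up to max_p just below max_th.
    return max_p * (avg_queue - min_th) / (max_th - min_th)

def should_drop(traffic_class, avg_queue, rng=random):
    """Randomly decide whether to drop an arriving packet of this class."""
    min_th, max_th, max_p = PROFILES[traffic_class]
    return rng.random() < drop_probability(avg_queue, min_th, max_th, max_p)

for depth in (5, 20, 35):
    probs = {c: round(drop_probability(depth, *p), 3) for c, p in PROFILES.items()}
    print(depth, probs)
```

When the queue is short, neither class is dropped, which mirrors the point that differentiation only matters under congestion. As the queue grows, regular traffic absorbs the losses well before premium traffic is touched.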

In any event, integrating all of it has been the main thrust of a couple of working groups in the Internet Engineering Task Force (IETF):

Integrated Services (intserv)

Integrated Services over Specific Link Layers (issll)

Resource Reservation Setup Protocol (rsvp)

The Integrated Services Working Group of the IETF has the challenging task of trying to fit all the pieces together to make it all work. I bid them good luck. It is not clear, at least at this point, that QoS will be a specific data point in the INTSERV WG framework.

How does RSVP fit into the QoS equation? It should be mentioned here that RSVP (the Resource ReSerVation Protocol) does not itself provide QoS. RSVP and QoS are technically orthogonal, although RSVP relies on the availability of an underlying QoS architecture. You can think of it this way:

+---------------------------+
|           RSVP            |
+---------------------------+
|      QoS definitions      |
+---------------------------+

The QoS definition layer is really only conceptual; it may be either a layer 2 or layer 3 implementation, but one might suggest that an entirely layer 3 QoS mechanism would be much simpler, more elegant, and more predictable.

Once a QoS architecture has been implemented, individual RSVP sessions can request specific portions of its resources. The resource that RSVP sessions request is generally considered to be dedicated bandwidth. However, it can also be delivered by treating different types of traffic as separate classes of service, each of which is handled differently on a hop-by-hop basis throughout the end-to-end transit path.
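RSVP carries the request; the decision to accept it belongs to each node's local admission control. Below is a minimal sketch of that hop-by-hop decision for a dedicated-bandwidth reservation, assuming hypothetical `Link` objects with a fixed reservable capacity. The reservation succeeds only if every hop on the path can accommodate it, and it is then recorded on every hop. This is not RSVP's own message processing (Path and Resv messages, soft-state refresh, and so on), only the resource check that the QoS layer underneath would perform.

```python
from dataclasses import dataclass

@dataclass
class Link:
    name: str
    capacity_kbps: int
    reserved_kbps: int = 0

    def can_admit(self, kbps: int) -> bool:
        """True if this hop still has room for the requested bandwidth."""
        return self.reserved_kbps + kbps <= self.capacity_kbps

def reserve_path(path: list[Link], kbps: int) -> bool:
    """Admit a reservation only if every hop has room (all-or-nothing)."""
    if not all(link.can_admit(kbps) for link in path):
        return False
    for link in path:
        link.reserved_kbps += kbps
    return True

a = Link("A-B", capacity_kbps=1000)
b = Link("B-C", capacity_kbps=500)
print(reserve_path([a, b], 400))   # fits on both hops
print(reserve_path([a, b], 200))   # rejected: B-C would exceed 500 kbps
print(a.reserved_kbps, b.reserved_kbps)
```

Rejecting the whole request when any single hop is short keeps partially reserved paths from tying up bandwidth that no session can actually use end to end.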

One mechanism that has been suggested offers an interesting possibility: an Internet Protocol (IP) precedence-based queuing service, used in conjunction with precedence-based congestion control [1]. Such a service would cause a router to order its process and output interface queues by IP precedence, highest first. The designers of IP intended the type-of-service (ToS) field to tell the network what type of service a packet requested, but it has only rarely been used. This field contains two pieces: a precedence value and a type-of-service subfield. This approach suggests that consistent delivery of traffic at different levels of best effort is possible. It should again be noted that differentiation applies only in times of congestion; otherwise, everyone's bits get delivered.
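Both subfields can be pulled out of the ToS octet with simple bit operations. This sketch assumes the layout of RFC 791 and RFC 1349: the three high-order bits carry precedence, the next four carry the type-of-service bits, and the low-order bit must be zero.

The `PrecedenceQueue` class below is a hypothetical sketch of the queuing-plus-congestion-control idea, not the mechanism proposed in the cited draft. It behaves as follows:

- packets leave in order of highest precedence, FIFO within a precedence level;
- when the queue is full, the lowest-precedence packet is dropped, whether it is already queued or just arriving;
- ties favor packets already in the queue.

```python
import heapq
import itertools

def precedence(tos_byte):
    """IP precedence: the three high-order bits of the ToS octet."""
    return (tos_byte >> 5) & 0x7

def tos_bits(tos_byte):
    """Four type-of-service bits (RFC 1349 layout); the low-order bit is MBZ."""
    return (tos_byte >> 1) & 0xF

class PrecedenceQueue:
    def __init__(self, limit):
        self.limit = limit
        self._heap = []                 # entries: (-precedence, arrival_seq, packet_id)
        self._seq = itertools.count()   # FIFO tie-breaker within a precedence level

    def enqueue(self, packet_id, tos_byte):
        """Queue a packet; return the ID of any packet dropped, else None."""
        entry = (-precedence(tos_byte), next(self._seq), packet_id)
        if len(self._heap) < self.limit:
            heapq.heappush(self._heap, entry)
            return None
        # Congested: the largest tuple is the lowest-precedence, newest packet.
        victim = max(self._heap)
        if entry[0] >= victim[0]:
            return packet_id            # arriving packet is least important: drop it
        self._heap.remove(victim)       # evict the queued low-precedence packet
        heapq.heapify(self._heap)
        heapq.heappush(self._heap, entry)
        return victim[2]

    def dequeue(self):
        """Send the highest-precedence packet next (FIFO within a level)."""
        return heapq.heappop(self._heap)[2] if self._heap else None

q = PrecedenceQueue(limit=3)
for pkt, tos in [("a", 0x00), ("b", 0xA0), ("c", 0x20), ("d", 0xE0), ("e", 0x00)]:
    print(pkt, "precedence", precedence(tos), "dropped:", q.enqueue(pkt, tos))
print([q.dequeue() for _ in range(3)])
```

In this run, the arrival of precedence-7 packet "d" evicts precedence-0 packet "a". A later precedence-0 arrival is simply dropped, and the survivors leave in precedence order.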

One also might suggest that achieving a consensus on how to handle layer 3 options, or IP precedence, across multiple administrative domains may be an easier task than provisioning layer 2 QoS across multiple administrative, and perhaps technically incompatible, layer 2 networks. Recall that the common bearer service in the Internet is IP, and that many different types of layer 2 service may lie in the transit path of any given packet in flight.

What is the role of QoS routing? Some of the more intriguing discussions have concerned the lack of QoS intelligence in available routing protocols. Such intelligence could make QoS implementations much more robust. If you think about it, there is currently simply no way for an existing routing protocol to determine the best path based on QoS parameters; it falls into the we-don't-know-how-to-do-it-yet category. There has been recent interest in this area in the IETF, and a working group is in the process of being formed to explore the possibilities. A good discussion of the topic is contained in "Quality of Service (QoS)-Based Routing in the Internet: Some Issues."
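To make the idea concrete, here is one possible formulation of QoS routing: constraint-based path selection. It prunes every link that cannot offer the bandwidth a flow needs, then runs an ordinary shortest-path computation over what remains, in this case Dijkstra's algorithm minimizing delay. The sketch assumes a hypothetical topology annotated with per-link delay and available bandwidth. As the paragraph notes, existing routing protocols do not carry that information. The code shows what a QoS-aware computation might look like, not how any protocol does it.

```python
import heapq

def qos_path(graph, src, dst, min_bw):
    """Lowest-delay path using only links with at least min_bw available.

    graph: {node: [(neighbor, delay_ms, available_bw), ...]} (hypothetical).
    Returns (path, total_delay_ms), or None if no link set meets the constraint.
    """
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == dst:
            break
        for v, delay, bw in graph.get(u, []):
            if bw < min_bw:
                continue                    # prune links that fail the QoS constraint
            nd = d + delay
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return list(reversed(path)), dist[dst]

# Hypothetical topology: a short, thin path (A-B-D) and a longer, fat one (A-C-D).
g = {
    "A": [("B", 10, 10), ("C", 15, 100)],
    "B": [("D", 10, 10)],
    "C": [("D", 15, 100)],
}
print(qos_path(g, "A", "D", min_bw=1))    # shortest delay wins
print(qos_path(g, "A", "D", min_bw=50))   # bandwidth constraint forces A-C-D
print(qos_path(g, "A", "D", min_bw=200))  # no path qualifies
```

A flow with modest bandwidth needs takes the lower-delay path. A flow needing 50 units is steered onto the longer path, a choice a purely delay- or hop-count-based protocol would never make.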

Summary

Quality of service continues to be a hotly debated topic in the networking community, primarily because of deep-seated religious preferences and technical bias. Consider for a moment that on the Internet we are collectively dealing with a single common bearer service: IP. If one accepts that any type of guarantee is unachievable, then differentiated classes of service, delivering varying levels of best effort, are arguably the best method of achieving QoS on the Internet. In addition, given how unreliable it is to deliver QoS through pockets of layer 2 infrastructure, using existing mechanisms in IP represents a simple approach to an otherwise confounding problem.

It stands to reason then that simplicity lends itself to elegance.

Reference

[1] Geoff Huston and Marshall T. Rose, "Path Precedence Discovery," Internet-Draft draft-huston-pprec-discov-00.txt, December 1996 (work in progress).

Paul Ferguson is a consulting engineer for Cisco Systems based in the Washington, D.C., area. His principal disciplines are large-scale routing design, global routing issues, and Internet service providers. He is an active participant in the Internet Engineering Task Force as well as the North American Network Operators Group.