The Lone Router -- In May 1999, Data Comm asked 11 vendors to submit high-performance routers for testing. Six weeks later, only one showed up
Robert Mandeville and David Newman
Lies, damned lies, and the things router vendors say about their products: It's easy to understand why carriers are incensed. The makers of the boxes that will form the backbone of tomorrow's Internet say their devices will run at vastly higher speeds-and deliver more capacity than even the largest such routers in use today. Best of all, they're just about to be rolled out!
All of which they've been saying for months.
Data Comm and European Network Laboratories (ENL, Paris) decided to set out on the router trail in search of real data. We invited 11 vendors to take part in the largest-scale test of Internet backbone devices ever conducted. To size the test bed we took the dimensions of the world's largest ISP backbones as a starting point-and multiplied those numbers by a factor of five. And with the help of Netcom Systems Inc. (Calabasas, Calif.) and QOSnetics Inc. (Portsmouth, N.H.), we offered five to six times more routes than there are in the entire 'Net and much heavier traffic loads than ever before. What's more, we conducted the first public trials of POS (packet over Sonet), the technology likely to replace ATM on most 'Net backbones.
And in the end, the number of participants said almost as much as the results: Juniper Networks Inc. (Mountain View, Calif.) was the only vendor to accept our invitation, supplying four of its M40 routers. That didn't keep us from proceeding as planned: We pounded four Juniper boxes for a week, and in the process got some sense of how tomorrow's Internet backbones might behave. In nearly all respects, the M40s' performance was exemplary. In other words, Juniper didn't win a Tester's Choice award just for showing up. It took the prize because its routers earned it.
Among the Missing
You'd think that some of the router vendors that have been begging Data Comm to cover their gear would at least show up for the test. Fact is, most simply didn't have equipment ready (and several said it would be sometime next year before their gear would be out the door). Among the no-shows: Argon Networks Inc. (Littleton, Mass.; now a division of Unisphere Solutions Inc. [Burlington, Mass.]); Avici Systems Inc. (Chelmsford, Mass.); Nbase-Xyplex (Littleton, Mass.); Neo Networks Inc. (Minnetonka, Minn.); Pluris Inc. (Palo Alto, Calif.); Torrent Networking Technologies Corp. (Silver Spring, Md.; now a division of Ericsson AB [Stockholm, Sweden]); and Xylan Corp. (Calabasas, Calif.; now a division of Alcatel N.V. [Paris]). Netcore Systems Inc. (Wilmington, Mass.; now a division of Tellabs Inc. [Lisle, Ill.]) has a product, but it uses ATM and not POS interfaces. Clearly, it's still early in the evolution of next-generation routing.
Then there was Nexabit Networks Inc. (Marlborough, Mass.; now a division of Lucent Technologies Inc. [Murray Hill, N.J.]), whose CEO sat down with us and nodded encouragement as we sketched a draft of the test bed setup. But when it came time to submit product, the company said customer demand prevented it from supplying even one chassis for testing.
The most shameful no-show story? That of market leader Cisco Systems Inc. (San Jose, Calif.). It initially agreed to submit its GSR 12000 routers for this test, and we spent weeks with the vendor discussing the issues it wanted addressed or emphasized. Cisco even told us it was manufacturing devices specifically for this test. As late as the day before its scheduled test date, the vendor assured us that the systems were on the loading dock, waiting to go out.
Then the call came: The vendor said it wouldn't be participating after all. Cisco-the world leader in routing, with a market capitalization of $200 billion-told us it had an inventory problem. Customer demand had suddenly surged, a marketing rep said haltingly. Could we postpone testing from mid-July "until the next [financial] quarter?" That struck us as odd, since test equipment isn't counted toward revenue by Cisco or any other vendor. The rep also said Cisco generally doesn't participate in public testing-its excellent attendance record in Data Comm tests over the past five years apparently notwithstanding.
Stymied, we sought to borrow GSRs from two Cisco customers. But as Cisco pointed out when it got word of our plans, both customers' GSRs were running unreleased code and thus couldn't be used in our tests. Clearly, Cisco didn't want its GSRs or new OC48 (2.488-Gbit/s) cards subjected to public scrutiny-at least not for another quarter.
Juniper, on the other hand, was as willing and enthusiastic a participant as the other vendors were reserved. It supplied four of its M40 chassis, each equipped with 12 OC12 (622-Mbit/s) and three OC48 interfaces, and we linked them using the OC48 interfaces in a fully meshed configuration (see "Test Methodology"). Our tests covered three areas: routing table capacity; baseline measurements of forwarding rate and latency; and routing performance during network instability (better known as route flapping).
Table capacity is a key measure of a router's ability to handle growth. In January 1994, there were around 15,000 BGP (border gateway protocol) routes in the Internet, each of which had to be stored as individual table entries in all Internet core backbone routers. Today, there are more than 60,000. Some large ISPs (Internet service providers) say their core routers hold around 70,000 entries; one says it has as many as 95,000 in its largest routers.
To see how well routers would deal with a much larger Internet, we took that 95,000 figure, rounded it up to 100,000, and multiplied by five. QOSnetics developed a script for its QA Robot testing tool that offered as many as 520,000 unique routes to the Juniper boxes. Each route had a "prefix length" (the number of bits in the network part of the address) of 22 bits, the average length on the 'Net today. We also asked Juniper not to use route aggregation, a method of combining similar routes to reduce the number of table entries.
We began by flushing the router to ensure that it held zero entries. Then we offered 40,000 unique BGP table entries to one of the Juniper routers, which would then propagate the entries to the other three routers, and in turn to another QA Robot attached to one of these. The second QA Robot verified that the routers successfully advertised all 40,000 routes we generated (and because it received 40,000 routes, the second QA Robot also acted as a check against route aggregation). If a test was successful, we added another 40,000 routes. We increased the load in increments of 40,000 until the second QA Robot reported that it didn't receive all the routes we injected.
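The ramp procedure above is a simple linear search. As a minimal sketch in Python, with `inject` and `advertised` as hypothetical stand-ins for the two QA Robots' roles (these helpers are ours, not QOSnetics' API):

```python
STEP = 40_000  # routes added per iteration, as in the test

def find_route_capacity(inject_routes, count_advertised, limit=520_000):
    """Ramp BGP table size until not all routes propagate end to end."""
    offered = 0
    while offered + STEP <= limit:
        offered += STEP
        inject_routes(offered)                # QA Robot #1: advertise routes
        if count_advertised() != offered:     # QA Robot #2: verify propagation
            return offered - STEP             # last fully successful size
    return offered

# Simulate a router whose table tops out at 360,000 entries, as the M40s did:
state = {"offered": 0}
def inject(n): state["offered"] = n
def advertised(): return min(state["offered"], 360_000)

capacity = find_route_capacity(inject, advertised)
print(capacity)  # 360000
```

Because the second QA Robot compares the advertised count against the offered count, the same check catches both table overflow and unauthorized route aggregation.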
We tested 360,000 routes successfully before the M40s cried uncle. That's six times larger than the current Internet. Even more astonishing is that one of the routers under test was equipped with 256 Mbytes of memory instead of its maximum of 512 Mbytes. We might have been able to build even larger routing tables with more memory.
Rates and Latencies
The next set of tests focused on baseline measurements of forwarding rate and latency. Forwarding rate tests are like drag races: The objective is to move traffic as fast as possible. Latency is a measurement of the amount of time the system hangs on to each packet before forwarding it. Obviously, latency should be as low as possible.
Forwarding rate and latency are not directly related. Consider another auto metaphor, the traffic circle: Cars might enter or leave the circle at high speeds, which equates to a high forwarding rate. But if there's a traffic jam inside the circle, cars might sit for a while-which equates to high latency. Now shift gears back to networking, where these traffic jams can lead to sluggish application performance and dropped sessions. We conducted the forwarding rate and latency tests using various permutations of three variables: packet length, prefix length, and the number of router table entries.
Ideally, packet length shouldn't matter, but in practice it can make a big difference. We offered streams of 60-byte IP packets, close to the minimum allowed in Ethernet, as the most severe test; this involved the highest packet rate and therefore the most work for the routers. We also offered 1,504-byte IP packets, close to Ethernet's 1,518-byte maximum frame size, and 340-byte packets, the average length for the top 10 TCP and top 10 UDP applications as observed on an Internet backbone link by the Cooperative Association for Internet Data Analysis (CAIDA, San Diego).
Prefix length describes how many of an IPv4 address's 32 bits identify the network (as opposed to the bits referring to the host). We used 22-bit prefixes (usually rendered /22 and referred to as "slash 22" prefixes) because that's the average length in the Internet's core routers. CAIDA's studies of prefix length distribution show the overwhelming majority fall between /16 and /22. We initially planned to use /16 as well, but a prefix that short allows too few distinct networks to hold all our desired routes, so we used /17 instead.
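The switch from /16 to /17 follows from simple arithmetic: an n-bit prefix allows at most 2^n distinct networks, and the roughly 74,000 routes we wanted to load exceed what 16 bits can distinguish. A quick check:

```python
routes_needed = 73_728            # the larger table used in the baseline tests

for prefix_len in (16, 17, 22):
    distinct_networks = 2 ** prefix_len   # maximum unique prefixes this length
    verdict = "fits" if distinct_networks >= routes_needed else "too few"
    print(f"/{prefix_len}: {distinct_networks:,} possible networks -> {verdict}")

# /16: 65,536 possible networks -> too few
# /17: 131,072 possible networks -> fits
# /22: 4,194,304 possible networks -> fits
```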
As for router table entries, we used two settings. First, we loaded 73,728 prefixes into the M40s, with one route per prefix, or slightly more entries than there are in the Internet's core routers today. We also wanted to determine whether the size of the routing table would have any effect on performance, so we conducted tests using 36,864 prefixes as well.
In all forwarding rate tests, we used new POS interfaces for Netcom's Smartbits traffic analyzer to offer packets at line rate to all OC12 ports in a partially meshed pattern (traffic on each inbound port destined for ports on all other chassis). To avoid congestion, we staggered traffic patterns so that no two packets ever arrived at any destination port at the same instant.
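For a sense of scale, the aggregate packet rate can be reconstructed from the port counts and packet size. The payload rate and per-packet framing overhead below are our assumptions for a back-of-the-envelope estimate, not figures supplied by the lab; the result lands in the same ballpark as the rates the tests produced.

```python
# Rough packet-rate estimate for the 60-byte test. Assumed, not measured:
# OC12c payload of ~599.04 Mbit/s after SONET overhead, and ~4 bytes of
# POS framing overhead per packet.
PAYLOAD_BPS = 599.04e6
OVERHEAD_BYTES = 4
PACKET_BYTES = 60            # IP packet size in the worst-case test

ports = 4 * 12               # four chassis, 12 OC12 ports each
pps_per_port = PAYLOAD_BPS / (8 * (PACKET_BYTES + OVERHEAD_BYTES))
aggregate_pps = ports * pps_per_port

print(f"~{pps_per_port / 1e6:.2f} Mpps per port, "
      f"~{aggregate_pps / 1e6:.1f} Mpps aggregate")
```

Under these assumptions each OC12 port carries on the order of 1.2 million 60-byte packets per second, so 48 ports offered simultaneously put the test bed in the tens-of-millions-of-packets-per-second range.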
The M40 proved to be a real speed demon in our forwarding rate tests, moving more packets per second than any other device we've ever tested (see Figure 1). In the 60-byte tests, the devices didn't quite run at line rate-but they came close. In the worst case, they forwarded 99.40 percent of offered traffic. That's an impressive achievement, given that we offered more than 57 million packets per second, by far a record for a Data Comm lab test. And the M40s were letter-perfect in forwarding 340- and 1,504-byte packets, moving traffic at line rate with no loss.
Equally impressive: Prefix length had virtually no impact on forwarding rates. Differences between rates for /17 and /22 prefixes were either negligible or nonexistent, even though a router might have to dig deeper into an address with the longer prefix. Juniper assured us that longer prefixes carry no such penalty in its architecture, and our results support that assertion.
We should note that we did not conduct tests in which the routers had to arbitrate between prefixes of various lengths, forcing a so-called longest-match lookup to find a packet's most appropriate route. Such lookups might affect forwarding rate, though Juniper says this hasn't been a problem in its internal tests.
There was one curious result, though: Forwarding rates were slightly higher for 60-byte packets when there were more table entries-not fewer. One possible explanation may be that we used more contiguous address space in our 74,000-entry table than in our 37,000-entry table. Juniper says address contiguity shouldn't matter to its lookup algorithm (and, in any case, we verified that the routers weren't aggregating addresses, where contiguity would have been an advantage). We can't say definitively why this occurred, but we don't consider it a problem: The differences in forwarding rate between the two table sizes are marginal.
But latency wasn't a marginal issue: It proved to be the most controversial part of this test. Simply put, we observed unusually long delays in our latency tests. We're not prepared to say this indicates a problem with Juniper's equipment; the long delays might also be the result of the way we conducted our tests.
The worst-case numbers involve 60-byte packets, with delays ranging from 146 to 177 milliseconds. In comparison, earlier tests have shown some OC12 ATM and gigabit Ethernet switches to have latencies of 20 to 30 microseconds, more than three orders of magnitude lower. Clearly, something was wrong.
There are two possible explanations, but neither is conclusive. First, latency is cumulative, so our four-box configuration could have yielded much higher delay measurements than a single-box test would have. (Still, we've achieved low-tens-of-microseconds measurements with similar four-box configurations in the past.)
Second, the heavy loads we offered could have played a role. We offered traffic at line rate to derive these measurements. As an experiment, we tried backing off the load to 50 percent of line rate and rerunning the test. Here, latency for 60-byte packets fell to just 22 microseconds-in line with measurements for other ATM and gigabit Ethernet switches we've tested. When we boosted the offered rate back up to 98 percent of line rate, latency again skyrocketed to levels virtually identical to those produced by the 100 percent load.
There's a school of thought that says latency should only be tested with lower loads, on the theory that heavy loads could produce congestion, and as a result tests might also measure device buffering. We haven't used that approach because our test scripts don't congest devices in the first place.
Juniper's engineers weren't so sure. In their view, congestion did occur, either on the OC48 backbone links or on the outbound OC12 ports. We double- and triple-checked our scripts, but found nothing that would have overloaded the routers or produced congestion.
About the only definitive conclusion we can draw is that latency will be high when all ports are hit simultaneously at 98 percent or more of line rate. As our 50 percent load test suggests, the M40s have no trouble delivering very low latency under less extreme conditions.
In our final set of tests we simulated route flapping, where a large number of routes disappear and reappear in rapid succession. This is a very real problem in Internet backbones, and not just for the flapped routes; performance on stable routes can suffer too.
To get a sense of the problems route flaps can create, we again consulted with architects at some of the world's largest ISPs. They told us that flaps can involve thousands of routes, with typical route withdrawal/reinsertion rates running in the hundreds per second. They also told us that worst-case flap frequency involved one to two flaps (route withdrawals or reinsertions) per minute.
We then took those numbers and boosted them a bit to make the flap test even more stressful than current conditions. We used QA Robot to inject 73,728 prefixes, only this time each prefix contained a primary, secondary, and tertiary path. (This means a packet should take one of the backup routes if the primary fails.) Then we used the Smartbits POS cards to offer 60-byte packets at line rate on all ports in a partially meshed pattern.
After offering steady-state traffic for 30 seconds, we began flapping some routes. Our flap pattern was to withdraw 12,288 primary routes-one-sixth of the total-within 1 second (which is much faster than most Internet backbone routers flap today). Then, after a 30-second pause, we readvertised these primary routes, again within 1 second; 30 seconds later, we withdrew a different batch of 12,288 primary routes, and 30 seconds after that reinserted the second group.
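The flap pattern above reduces to a four-event timeline. A sketch, with batch size and spacing taken from the test (the event labels are ours):

```python
BATCH = 12_288            # one-sixth of the 73,728 primary routes
TOTAL_PRIMARY = 73_728

# Event timeline (seconds after steady-state traffic begins); each withdraw
# or readvertise completes within 1 second, per the test design.
flap_schedule = [
    (30,  "withdraw",    "batch 1"),
    (60,  "readvertise", "batch 1"),
    (90,  "withdraw",    "batch 2"),
    (120, "readvertise", "batch 2"),
]

for t, action, batch in flap_schedule:
    print(f"t={t:>3}s: {action:12} {BATCH:,} routes ({batch})")
```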
Our objective was to measure flapping's effect on traffic using both affected and stable routes. We wanted to determine how badly forwarding rate would slip during a flap, and for how long.
The M40s stood up quite well (see Figure 2). On flapped routes, average forwarding rate per port fell from around 1.17 million packets per second to a low of 922,000 pps, or about 78 percent of steady-state conditions. It took about 20 seconds after the first flap for traffic to reconverge (to regain its former rate over the secondary route).
Traffic on stable routes also was affected, but only in a minor way. Here, the forwarding rate fell to a low of 1.05 million pps, or about 89 percent of steady-state conditions. And rates dipped only very briefly, about 5 seconds compared with 20 seconds for flapped routes.
Notably, reinsertion of the primary routes proved a much less disruptive event than withdrawal on both flapped and stable routes. One possible explanation is that looking up secondary routes may be a more compute-intensive task than re-establishing routes on previously used paths.
Also, while a router would ideally show no degradation on stable routes during flapping, our results aren't necessarily cause for concern. There are three reasons for this. First, ISP architects say even bad flapping conditions aren't as stressful as those we used in our test. Second, current routers' forwarding rates are much more volatile during flaps, in some cases dipping to zero.
Finally, many BGP implementations (including Juniper's) include a route dampening feature that helps forestall this kind of problem. With dampening, a router can be configured not to propagate updates whenever the number of advertisements or withdrawals it receives within a given interval crosses a predefined threshold.
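Dampening of this kind is typically implemented as a per-route penalty with exponential decay, the approach described in RFC 2439. The constants below are illustrative defaults for the sketch, not Juniper's actual settings:

```python
# Illustrative route-flap dampening, RFC 2439 style. The constants are
# common textbook defaults, not the values any particular router ships with.
PENALTY_PER_FLAP = 1000
SUPPRESS_LIMIT = 2000      # withhold advertisements above this penalty
REUSE_LIMIT = 750          # resume advertising below this penalty
HALF_LIFE_S = 900          # penalty halves every 15 minutes

class DampenedRoute:
    def __init__(self):
        self.penalty = 0.0
        self.suppressed = False

    def flap(self):
        """Record one withdrawal or readvertisement."""
        self.penalty += PENALTY_PER_FLAP
        if self.penalty > SUPPRESS_LIMIT:
            self.suppressed = True     # stop propagating this route's updates

    def decay(self, elapsed_s):
        """Apply exponential decay after elapsed_s seconds of quiet."""
        self.penalty *= 0.5 ** (elapsed_s / HALF_LIFE_S)
        if self.suppressed and self.penalty < REUSE_LIMIT:
            self.suppressed = False    # route may be advertised again

route = DampenedRoute()
route.flap(); route.flap(); route.flap()   # three flaps in quick succession
print(route.suppressed)                    # True: penalty is 3000
route.decay(35 * 60)                       # 35 quiet minutes
print(route.suppressed)                    # False: penalty decayed below 750
```

The effect is exactly the one described above: a route that flaps repeatedly within a short interval stops generating updates for its neighbors, then earns its way back once it has been stable for a while.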
Data Comm acknowledges Netcom Systems Inc. (Calabasas, Calif.), which made its Smartlab facility available and supplied us with its new POS cards for the Smartbits analyzer. Smartlab manager Jerry Perser and engineers Patrick Connolly and Bing Song also developed test scripts. QOSnetics Inc. (Portsmouth, N.H.) supplied us with its QA Robot router test tool, and software engineers Brian M. Dubriel and Josh Jones developed test scripts for the QA Robot. Extreme Networks Inc. (Santa Clara, Calif.) supplied a Summit4 gigabit Ethernet switch as test bed infrastructure.
Robert Mandeville is director of European Network Laboratories (Paris). He can be reached at firstname.lastname@example.org. David Newman is senior technology editor for Data Comm. His e-mail address is email@example.com.
Top Performers -- Juniper Networks
The M40 proved that it can handle an Internet six times larger than today's version, and it moved packets faster than any device we've ever tested. Juniper is especially deserving of recognition in a test where no other vendor was willing or able to subject its packet-over-Sonet equipment to public scrutiny. But Juniper didn't win simply by showing up-it won by building a box capable of withstanding the loads of today's Internet backbones and tomorrow's.
Data Comm invited 11 vendors to take part in this test. We asked them to supply four chassis, each equipped with 12 OC12 (622-Mbit/s) packet-over-Sonet (POS) interfaces and 3 OC48 (2.488-Gbit/s) POS interfaces; 2 fast Ethernet or gigabit Ethernet interfaces; and routing support. If they couldn't supply a full complement of interfaces or chassis, we asked them to bring some subset of this configuration.
We conducted three sets of router performance tests: baseline measurements; BGP 4 (border gateway protocol version 4) route capacity tests; and route flapping tests.
Our test bed topology included the four chassis under test connected in a fully meshed configuration using OC48 interfaces (see the figure). We offered test traffic using OC12 POS cards for the Smartbits traffic analyzer from Netcom Systems Inc. (Calabasas). We injected routing table information over gigabit Ethernet interfaces using the QA Robot routing performance test system from QOSnetics Inc. (Portsmouth, N.H.). We used a Summit4 switch from Extreme Networks Inc. (Santa Clara, Calif.) to bridge traffic between the fast Ethernet interfaces on the QA Robot system and the gigabit Ethernet interfaces on the routers under test. (At the time of testing QOSnetics had just added support for native gigabit Ethernet interfaces; performance was identical using these interfaces or the Extreme switch.)
In the baseline tests, we measured per-port and aggregate forwarding rate; per-port minimum, maximum, and average latency; per-port jitter; and per-port jitter distribution under various traffic conditions. We offered full-duplex traffic at line rate to each OC12 interface in a partially meshed pattern. (Traffic from each interface was destined to all other interfaces on all other chassis.) The key variables were prefix length of the table entries; number of routing table entries; and length of the packets offered. As for routing table entries, we first used QA Robot to inject 36,864 prefixes, with each prefix containing one route; we then repeated the test with 73,728 prefixes loaded, again with one route per prefix. Initially, we injected only so-called /17 prefixes-those with a 17-bit network mask. Then we reinjected the "small" and "large" tables, this time using /22 prefixes. As for packet lengths, we repeated all permutations of this test using 60-, 340-, and 1,504-byte IP packets.
The goal of the routing table capacity tests was to determine the largest number of BGP4 routes the devices under test could support. QA Robot injected 40,000 /22 prefixes to one router, and we verified that all routes were advertised by another router on the test bed. (Vendors were not allowed to aggregate routes; QA Robot verified that the number of routes advertised was equal to the number offered.) If a test run succeeded, we increased the number of routes by 40,000, and continued the exercise until QA Robot determined that not all offered routes were propagated.
The goal of the route flapping test was to determine the effect of network instability on router forwarding rates. We began by injecting 73,728 /22 prefixes from QA Robot to Router A. Each prefix contained three routes-primary, secondary, and tertiary. Then the Smartbits POS cards offered a steady-state stream of 60-byte IP packets at line rate in a partially meshed pattern over all routes. We then withdrew 12,288 of the primary routes and verified that traffic was rerouted over the secondary paths. After an interval of 30 seconds, we re-injected the original 12,288 primary routes, and verified that traffic was rerouted over the primary paths. We repeated this process at 30-second intervals for 120 seconds. We measured forwarding rates for all routes during the entire test run.-D.N.
Copyright © 1999 CMP Media Inc.