Editor's note: These minutes have not been edited.

From: George Swallow

Tag Switching BOF

The meeting was chaired by Vijay Srinivasan and George Swallow. The purpose of the meeting was strictly informative, although it may result in the formation of a new working group. The chairs kept the meeting moving briskly.

Tag Switching Architecture - Yakov Rekhter, Cisco

Yakov gave an introduction to tag switching. The design objective was to address open issues in network layer routing, including richer functionality, better performance, scalability, integration of cell and frame-switching technologies, and the ability to evolve routing systems gracefully and in a timely fashion to meet new and emerging requirements.

Two components were discussed: forwarding and control. The former uses tag information in the frames or cells to forward them out another interface, rewriting the tag. The latter builds and maintains the database used by the forwarding function.

The forwarding algorithm is network-layer protocol independent. The tag can be part of the L3 header (IPv6), part of the L2 header (ATM and Frame Relay), or in a shim (IPv4). A tag can be bound to a set of destinations (a CIDR prefix), a single destination, or a single TCP port.

The control component creates the bindings between tags and routes. Routers allocate tags and bind them to routes. The tags then have to be distributed, which can be done by piggy-backing on other protocols (BGP, RSVP, PIM) or via a tag distribution protocol. Tags are created as a result of unicast routing updates, PIM Join/Prune messages, and RSVP Path/Resv messages. The advantage of running off routing and other existing protocols, and piggy-backing when possible, is that it minimizes additional control traffic, is independent of traffic patterns/profiles, and minimizes the impact on forwarding performance.

Destination-based routing module: this is the module that corresponds to current Internet routing.
There are three possible maintenance schemes: upstream, downstream, and downstream on demand (see his draft for further information). Tags can be stacked in a packet, to allow interior and exterior routing to be isolated from each other when necessary.

Explicit routing module: This allows destination-based routes to be overridden by configuration, or as a result of resource reservation. This may also be applied to support QoS-based routing.

Tag switching with ATM is "controversial". The VCI field carries the tag; the VPI can also be used for two levels of tags.

Question: Why does Yakov think that it scales well?
Yakov: It scales with respect to the number of routes that are being maintained, especially with internal routes, which are very stable and relatively small in number.

Question: Is it useful in a fully-meshed topology?
Yakov: It works best in a non-meshed topology, such as the current Internet.

Multicast - Dino Farinacci, Cisco

The benefits of tag switching to multicast forwarding: fast routing table lookups (you typically have to do two lookups, one on the source and the other on the multicast address), shared-tree vs. source-tree longer match is faster, hits on negative cache entries cause quick discard, no header checksum, and faster RPF checks - you can throw a packet away quickly if the tag isn't in the table.

Tag allocation is simple on point-to-point links, both for unicast and multicast cases, since there is only one receiver. Unicast tag allocation is simple on LANs. Multicast tag allocation is not so simple on LANs. Tag values are local to the LAN, but they cannot overlap with those of other TSRs on the same LAN. The proposal is to partition the tag space among multicast TSRs on a LAN. Dino proposed an algorithm for distributing the tag allocation among the TSRs on the LAN.

Joel Halpern: Dino mentioned the use of PIM Hellos for allocation; Joel asked whether IGMP or other protocols besides PIM could be used for the allocation.
Dino said that this was considered.

Tag distribution issue: should distribution be done upstream or downstream? Upstream distribution was considered, because it is simple and intuitive. However, this required merging the data and control components, which slowed down forwarding. Piggy-backing on other control messages created race conditions and required tag reassignment when the upstream neighbor changes.

So, downstream distribution was chosen. It is consistent with TDP for unicast routing, can carry multicast routes and tag assignments together, can use the same algorithm for all media types (downstream is more natural for ATM), randomizes tag assignment among downstream routers, and avoids tag reassignment when there are RPF changes. TDP was not used, because tag binding and routing information advertised separately produced race conditions.

Tag assignments are carried in PIM Join/Prune messages. There is a 1-1 mapping between a multicast route and a tag; they are delivered together in the same packet (or lost together). Triggered Join/Prune messages carry the advertisement.

When the assigner on a LAN is not interested in the group anymore, reassignment does not occur. If the assigner crashes, reassignment is required. If the LAN partitions, each partition can use its existing tag, each with different upstream neighbors. When the partition heals, reassignment only occurs if there were two tags for the same routing table entry. When there are non-TSRs or no group members on a LAN, tag switching is not used for multicast on that LAN.

Dino's description was for sparse-mode PIM, but it also works with dense-mode PIM.

Question: How does downstream allocation work when the neighbor doesn't find the tag allocation acceptable (the receiver doesn't support this range, due to being from a different vendor)?
Dino started to discuss the point, but was then interrupted.
George (and Joel): Technical issues like this should be discussed on the list.
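The minutes do not record the details of Dino's allocation algorithm. Purely as an illustration of what partitioning a LAN-local tag space among TSRs could look like, the sketch below splits a tag range into contiguous, non-overlapping blocks ordered by router identifier. The function name, the ordering by sorted router ID, and the example tag range are all assumptions made for this example, not the algorithm that was presented.

```python
def partition_tag_space(tsr_ids, tag_min, tag_max):
    """Split [tag_min, tag_max] into one contiguous block per TSR.

    Hypothetical scheme: TSRs are ordered by their identifier and each
    gets an (almost) equal share, so no two TSRs can hand out the same tag.
    """
    ordered = sorted(set(tsr_ids))
    total = tag_max - tag_min + 1
    if not ordered or total < len(ordered):
        raise ValueError("need at least one tag per TSR")

    base, extra = divmod(total, len(ordered))
    ranges = {}
    start = tag_min
    for index, tsr in enumerate(ordered):
        # The first `extra` TSRs absorb the remainder, one tag each.
        size = base + (1 if index < extra else 0)
        ranges[tsr] = (start, start + size - 1)
        start += size
    return ranges


blocks = partition_tag_space(["10.0.0.2", "10.0.0.1", "10.0.0.3"], 32, 131)
```

Because every TSR on the LAN can compute the same split from the same membership list, no extra negotiation round is needed in this toy version; a real scheme would also have to handle routers joining and leaving.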
The chairs did not allow any further questions at this point.

ARIS - Nancy Feldman, IBM

ARIS stands for Aggregate Route-Based IP Switching.

ARIS objective: Switch all best-effort traffic, use routing information, minimize overhead and number of egress identifiers, prevent routing loops.

A switched path is established from each ingress node to each egress node. Egress identifiers are used to bundle many CIDR prefixes that share a common exit point. No changes are needed to existing routing protocols. There is one switched path per egress identifier. An egress ISR may have multiple egress IDs.

Forwarding: The FIB extension associates routes with a next-hop/egress identifier pair. There is a conventional route table lookup; the network layer lookup forwards on the switched path (if one exists), with best switched path selection across levels of aggregation.

Nancy gave an example of how switched paths are established. Each egress builds a point-to-multipoint tree to all of the ingress nodes; the ingress nodes then send the packets in the reverse direction on the tree (which becomes a multipoint-to-point tree). This aggregation provides scalability (on the order of the number of egresses), VC conservation, less maintenance overhead, easy network management, many prefixes mapping to a single egress identifier, and merged switched paths. For ATM, merged VPs or VCs (when the switches support frame merging) can be used.

Loops are prevented when the paths are established by including the list of routers along the path while the path is being constructed. Loops are also prevented during topology changes. Exterior topology changes are instantaneous, because the path to all exterior routers already exists when exterior routing changes a particular CIDR prefix to a different exterior router.

ARIS supports multiple route paths to the same egress identifier to increase performance, but at the cost of increasing the number of switched paths.

Multicast uses the same mechanisms as unicast.
Both source-specific trees (DVMRP, PIM) and shared trees (PIM-SM, CBT) are supported.

Benefits of egress establishment: Single point of control, simplicity, traceroute, multipath, no unnecessary multicast switched paths, guaranteed loop-free paths.

ARIS benefits: scalability, multiple levels of aggregation, loop-free paths, traceroute, multipath, multicast, soft-state protocol, single point of control, no routing protocol modification.

Brian Carpenter: What is the relationship to NHRP?
Answer: It's another issue.

Q: What happens if someone shortcuts through your cloud?
A: This is ships in the night. This will be further discussed on the list.

Q: Is there a difference from SITA on how VCI and VPI allocation is done?
Answer: Yes.

Q: How is QoS traffic handled?
A: RSVP will be discussed in a following discussion.

Q: Could this be used as a router-router NHRP paradigm? This is something not addressed.
A: Once we have a WG, we'll discuss these sorts of proposals.

Q: Can these be evolved and combined?
A: Sure.

CSR - Hiroshi Esaki, Toshiba

URLs of interest: ftp://ftp.wide.toshiba.co.jp/pub/csr, http://www.toshiba.com

CSR is both flow driven and topology driven for QoS, scalability, and link-cost.

Design policy: high throughput by using ATM hardware, scalability and QoS by being both topology and flow driven, flow aggregation by using address prefix forwarding, multiprotocol capability through L2 and L3 code points, interoperability by presenting a standard interface (ATM UNI), soft-state policy to allow unreliable node behavior, and mobility and VLAN support.

Hiroshi compared cut-through (ATM) and hop-by-hop (packet) forwarding. Cut-through path establishment is triggered by observing traffic to a particular TCP port. The FANP (Flow Attribute Notification Protocol) is used to establish the cut-through path. He presented data from a Digital gateway router to show the traffic distributions for different applications, and which proportion of the traffic should be switched.
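The port-triggered behavior described for the CSR can be pictured with a small sketch. It assumes a flow is identified by its (source, destination) address pair and that the list of trigger ports is locally configured; the class, field names, and port numbers are hypothetical, and the FANP exchange itself is only marked by a comment.

```python
from dataclasses import dataclass

# Hypothetical trigger ports (FTP data, HTTP, NNTP); not taken from the talk.
TRIGGER_PORTS = frozenset({20, 80, 119})


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    dst_port: int


class FlowClassifier:
    """Decide per packet between hop-by-hop and cut-through forwarding."""

    def __init__(self, trigger_ports=TRIGGER_PORTS):
        self.trigger_ports = set(trigger_ports)
        self.cut_through_flows = set()  # (src, dst) pairs with a path set up

    def forward(self, pkt):
        flow = (pkt.src, pkt.dst)
        if flow in self.cut_through_flows:
            return "cut-through"
        if pkt.dst_port in self.trigger_ports:
            # A real CSR would start the FANP exchange here; this sketch
            # simply records that the path has been requested.
            self.cut_through_flows.add(flow)
        # The triggering packet itself still travels hop-by-hop.
        return "hop-by-hop"


clf = FlowClassifier()
first = clf.forward(Packet("192.0.2.1", "198.51.100.7", 80))
second = clf.forward(Packet("192.0.2.1", "198.51.100.7", 80))
other = clf.forward(Packet("192.0.2.1", "203.0.113.9", 23))
```

In this sketch the first packet to a trigger port is forwarded hop-by-hop and only later packets of the same flow take the cut-through path; short-lived traffic to other ports never leaves the hop-by-hop path.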
Overview of FANP. It supports multiple network layers (IP, IPX, etc.), and connection-oriented datalinks (ATM, Frame Relay, etc.). It supports a flexible flow description. It uses upstream-driven cut-through path establishment.

In Japan, there is a testbed (called WIDE) which is incorporating CSRs.

Hiroshi concluded by saying that requirements for tag switching (IP switching in general) should include being both flow and topology driven, and support for SVCs for ATM.

Ohta-san asked about the relationship to RSVP.
A: This is still under consideration.

Q: Do you believe you get better performance if the end-hosts have the mapping information?
A: Yes.

Q: The last thing that servers and clients want to do is table lookups to find the correct path. You are increasing router performance at the expense of the hosts.
A: Perhaps.

Switching RSVP - Arun Viswanathan, IBM

Objective: switch RSVP flows in an IP switching environment, no changes in the current mechanics of RSVP, scalable to future RSVP extensions, flexible enough to support merging in the future.

Why is RSVP ideal for setting up switched connections with QoS? RSVP has the required semantics, messages are processed hop-by-hop, and it is extensible.

There is a new RSVP object to carry L2 flows. "One VC per sender" paradigm. The egress ISR injects a new object in the RESV message.

Best-effort data flow to unreserved receivers may stop when the first receiver makes a reservation; this can be resolved by adding the default VC to the pt-to-mpt VC, adding the IP control point to the pt-to-mpt VC, or using a PATH message to set up a VC downstream of that node.

The TTL decrementing is performed at the egress, based on the hop count in the PATH message. Paths are created upstream on demand. Path merging is not simple, but a solution is discussed in the draft.

Ohta-san asked several technical questions about LIJ, which were taken off-line.
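The egress TTL handling mentioned in Arun's talk can be pictured with a short sketch. It assumes the hop count learned from the PATH message equals the number of switched hops that did not decrement the TTL themselves, so the egress applies the whole decrement at once. The function name and parameters are hypothetical and are not taken from the draft.

```python
def egress_ttl(ttl_in, path_hop_count):
    """Return the TTL to write at the egress, or None if the packet expires.

    ttl_in:          TTL carried by the packet when it entered the switched path
    path_hop_count:  hop count learned from the PATH message (assumed to be
                     the number of switched hops the packet traversed)
    """
    if path_hop_count < 0:
        raise ValueError("hop count cannot be negative")
    ttl_out = ttl_in - path_hop_count
    # A packet whose TTL would reach zero inside the switched path is
    # discarded at the egress, as a hop-by-hop router would have done.
    return ttl_out if ttl_out > 0 else None


forwarded = egress_ttl(64, 5)
expired = egress_ttl(3, 3)
```

Applying the decrement in one step keeps tools that depend on TTL behaving roughly as they would on a hop-by-hop path, under the assumption stated above.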

Tags Negotiation and RSVP - Fred Baker, Cisco

URL: ftp://ftp-eng.cisco.com/fred/tag-switch/draft-baker-tag-rsvp-00.txt

Problem space: four kinds of tagged routes, some of which have QoS implications. Tags may be negotiated by TDP or by piggy-backing on other protocols.

Proposal: if a tag has already been allocated, use it. Otherwise, piggyback on RSVP.

Fred gave a quick overview of RSVP and the filter styles: fixed, shared explicit, and wild-card (this last one is the hardest to solve).

While there are several alternatives for mapping RSVP sessions to tags, the proposal is a "one tag per session" model, in which each RSVP session creates a new tag. This model has the following advantages: (1) If tags are mapped into data link constructs, the model exploits the traffic control and scheduling capabilities of the data link, with their hardware or firmware mechanisms. (2) It reduces the network level processing load by removing the need for multiplexing of RSVP sessions onto a connection.

Fred presented a suggestion made by Bob Braden that a new filter specification and sender template format be used that includes the tag value. Fred then pointed out that this would require universal implementation. Also, the wild-card format has no filter spec in the RESV message.

Fred suggests using a tag value object in the RSVP messages. For multicast, upstream assignment is used. This object is carried in the PATH message to communicate the tag and in the RESV message as an acknowledgement. In the unicast case, downstream assignment is used. The object appears in the RESV message to communicate the tag to the upstream router.

There are some issues in the wild-card format: it needs a tag per sender in some implementations (CSR), and there is no obvious way to do this in the RESV message (but it is easy in the PATH message).

The discussion pointed out that the two proposals were largely similar. Fred and Arun agreed to work together.

Open discussion on working group formation, charter, etc.

Vijay started the discussion by going over proposed next steps.

Where are we now: multiple proposals: ARIS, Tag Switching, CSR/FANP, IFMP, etc. All have unique desirable properties. Need to take a step back and see what requirements are desirable for these solutions.

Requirements: Control flows to run IP and other internetworking layer protocols, support for unicast as well as multicast communication, support for higher-layer resource reservation mechanisms such as RSVP, and backward compatibility issues (traceroute, other management tools, etc.).

Proposed activities: Draft charter, determine other WGs and bodies to work with, start work on a common architecture and support for RSVP and multicast, define specific mechanisms for different layer 2 technologies, following the ISSLL model (ATM both with and without UNI/NNI signaling, LANs, packet over SONET, Frame Relay, etc.).

Drew Perkins: Why are we here, and what problem are we trying to solve? Is it because L3 routing isn't fast enough, or ATM signaling isn't fast enough, or something else? What are we trying to do, and what are the goals?

George: The presenters gave a good number of reasons for this work (performance, scalability, etc.), but that will be included in the charter.

Brian Carpenter: What about sideways compatibility with what's going on in the ION group and other things that are going on these days? These issues need to be seriously considered, to see how it fits in. If there is any commonality, it is that "it is a good thing to co-locate the routers and switches in the topology". This may be the answer to what we're doing here.

George: Compatibility with existing things is an absolute requirement. Joining together routing and switching is good for scalability and performance, but the abstractions provided by NHRP and other overlay models are also important in applications where topology virtualization is desired.

Ross Callon: There is sufficient interest in this sort of work, but we need a better working group title than "tag switching". How about IP switching?

George: This may have been trademarked by Ipsilon. The name of the working group will be discussed.

Ross: We also really need to understand the requirements well, before we can start to discuss the technical proposals. Tying this to RSVP is a good idea, but it shouldn't be tied closely to any particular routing protocol - it should be routing protocol independent. It should also work well in a hierarchy, for scalability. These and other requirements are important for the WG to identify.

Q: You must first address whether current scalability problems are the result of implementations, or the result of the current protocol definitions. There also has to be consideration of the hosts and servers; you want to do as little routing in the hosts and servers as possible, to allow them to perform.

George: I don't think any of the proposals today require host changes; everything can be done in the routers. But changing hosts, especially file servers, could improve overall performance.

Q: The description of the CSR was disturbing because it requires hosts to choose the VC for a flow.
Hiroshi: This wasn't an explicit requirement.

Jim Luciani: The name "tag switching" is inappropriate. He didn't see a requirement for a framework document and an applicability statement in Vijay's presentation; these documents should be the very first work done in the group.

Fred Baker: One requirement for tag switching was making it link-layer independent; he wants this to be a requirement here as well.

Tracy Mallory: He has concerns about the applicability; there are places like firewalls and NATs where you don't want to use this. You have to be very clear about which problems are being solved and where you don't want to do this.

Eric Crawley (the ISSLL chair): Try to avoid the ISSLL model if you can; it is too hard to manage.

Brian Carpenter: Call the group "Switching inside Large Clouds", so that it will go "as smooth as silk".

Vijay: I'm not sure if we should restrict the scope to "large clouds".

Q: There should be a mention of how large the scope of the solution should be. Should this apply to the Internet, or [just] to smaller IP clouds?

Dino: The Internet growth is happening faster than equipment vendors can keep up. The purpose of this work should be to help the vendors keep up with this growth.

Andrew Smith: The scope and applicability are the first things to work on; NOT parallel efforts on the technology.

Bruce Davie: One problem that needs to be solved is that if companies go off and implement stuff on their own, then there will be interoperability problems. That's the primary reason to form a working group.

Keith McCloghrie: There are limits to what CIDR can do to constrain the growth in the routing tables. It would be great if we could further scale things here by having a single path for multiple CIDR table entries.

George: This was covered both by ARIS and Tag Switching. In ARIS, exit identifiers are used. Associating multiple routes with a label imposes an "area code" over the routes that are destined to a particular exit.

George asked for a show of hands of those who would like to see a WG formed. There was overwhelming support for a WG to work on these issues, if only to further interoperability between vendors already working on these solutions.

Joel Halpern (the Routing AD): I want to see a concise problem statement and intended work proposal in the charter. This will be further expanded in the framework document. The name of the group will change. We'll try to charter this as quickly as possible. The real place where the action happens is on the list; work can proceed.

======================================================================
George Swallow
Cisco Systems
(508) 244-8143
250 Apollo Drive
Chelmsford, MA 01824