Editor's note:  These minutes have not been edited.
 
From: George Swallow <swallow@cisco.com>


Tag Switching BOF

The meeting was chaired by Vijay Srinivasan and George Swallow.  The
purpose of the meeting was strictly informative, although it may
result in the formation of a new working group.  The chairs kept the
meeting moving briskly.

Tag Switching Architecture - Yakov Rekhter, Cisco
    <draft-rfced-info-rekhter-00.txt>
    <draft-rosen-tag-stack-00.txt>
    <draft-davie-tag-switching-atm-00.txt>
    <draft-doolan-tdp-spec-00.txt>

Yakov gave an introduction to tag switching.  The design objective
was to address open issues in network layer routing, including richer
functionality, better performance, scalability, integration of cell
and frame-switching technologies, and the ability to evolve routing
systems gracefully and in a timely fashion to meet new and emerging
requirements.

Two components were discussed: forwarding and control.  The former
uses the tag carried in each frame or cell to forward it out another
interface, rewriting the tag.  The latter builds and maintains the
database used by the forwarding function.
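
As an illustration only, the split between the two components can be
sketched as below.  The class and method names are hypothetical and do
not come from any of the drafts; the sketch assumes the forwarding
database is an exact-match table keyed on the incoming tag.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class TagEntry(NamedTuple):
    out_tag: int        # tag written into the packet before it leaves
    out_interface: str  # interface the packet is sent out on


class TagForwarder:
    """Hypothetical forwarding component: exact-match lookup on the tag."""

    def __init__(self) -> None:
        # Database maintained by the control component (assumed layout).
        self._table: Dict[int, TagEntry] = {}

    def bind(self, in_tag: int, out_tag: int, out_interface: str) -> None:
        """Control component installs or updates a tag binding."""
        self._table[in_tag] = TagEntry(out_tag, out_interface)

    def forward(self, in_tag: int,
                payload: bytes) -> Optional[Tuple[str, int, bytes]]:
        """Return (interface, rewritten tag, payload), or None if the
        incoming tag has no binding."""
        entry = self._table.get(in_tag)
        if entry is None:
            return None
        return (entry.out_interface, entry.out_tag, payload)
```

The forwarding path never looks at the network layer header here,
which is one way to read the claim that the algorithm is network-layer
protocol independent.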

The forwarding algorithm is network-layer protocol independent.  The
tag can be part of the L3 header (IPv6), part of the L2 header (ATM
and Frame Relay), or in a shim (IPv4).

A tag can be bound to a set of destinations (a CIDR prefix), a single
destination, or a single TCP port.

The control component creates the tag bindings between tags and
routes. Routers allocate tags, and bind them to routes.  The tags then
have to be distributed, which can be done by piggy-backing on other
protocols (BGP, RSVP, PIM) or via a tag distribution protocol.  Tags
are created as a result of unicast routing updates, PIM Join/Prune
messages, and RSVP Path/Resv messages.  The advantage of running off
of routing and other existing protocols, and piggy-backing when
possible, is that it minimizes additional control traffic, it's
independent of traffic patterns/profiles, and minimizes the impact on
the forwarding performance.

Destination-based routing module: this is the module that corresponds
to current internet routing.  There are three possible maintenance
schemes: upstream, downstream, and downstream on demand (see his
draft for further information).

Tags can be stacked in a packet, to allow interior and exterior
routing to be isolated from each other when necessary.
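
A rough sketch of stacking follows.  It assumes that a border router
pushes an interior tag on entry to a domain and pops it on exit, while
interior routers only swap the top tag; the minutes do not spell this
out, and the function names are hypothetical.

```python
from typing import List


def push_tag(stack: List[int], tag: int) -> List[int]:
    """Return a new stack with `tag` on top (assumed: entering a domain)."""
    return [tag] + stack


def swap_top(stack: List[int], new_tag: int) -> List[int]:
    """Rewrite only the top tag, leaving the tags beneath it untouched."""
    if not stack:
        raise ValueError("cannot swap on an empty tag stack")
    return [new_tag] + stack[1:]


def pop_tag(stack: List[int]) -> List[int]:
    """Return the stack without its top tag (assumed: leaving a domain)."""
    if not stack:
        raise ValueError("cannot pop an empty tag stack")
    return stack[1:]
```

Because interior routers touch only the top of the stack, the exterior
tag underneath passes through the domain unchanged.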

Explicit routing module: This allows destination-based routes to be
overridden by configuration, or as a result of resource reservation.
This may also be applied to support QoS-based routing.

Tag switching with ATM is "controversial".  The VCI field carries the
tag - the VPI can also be used for two levels of tags.

Question: Why does Yakov think that it scales well?  Yakov: It scales
with respect to the number of routes that are being maintained.  This
holds especially for internal routes, which are very stable and
relatively small in number.

Question: Is it useful in a fully-meshed topology?  Yakov: It works
best in a non-meshed topology, such as the current Internet.

Multicast - Dino Farinacci, Cisco
    <draft-farinacci-multicast-tag-part-00.txt>
    <draft-farinacci-multicast-tagsw-00.txt>

The benefits of tag switching to multicast forwarding: fast routing
table lookups (you typically have to do two lookups, one on the source
and the other on the multicast address), a faster shared-tree vs.
source-tree longer-match decision, quick discard on hits against
negative cache entries, no header checksum, and faster RPF checks - you
can throw a packet away quickly if the tag isn't in the table.

Tag allocation is simple on point-to-point links, both for unicast and
multicast cases, since there is only one receiver.

Unicast tag allocation is simple on LANs.  Multicast tag allocation
is not so simple on LANs.  Tag values are local to the LAN, but you
can't overlap them with other TSRs.

The proposal is to partition the tag space among multicast TSRs on a
LAN.  Dino proposed an algorithm for distributing the tag allocation
among the TSRs on the LAN.
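
The minutes do not record the algorithm Dino proposed.  Purely to
illustrate what partitioning a tag space means, the sketch below gives
each TSR on a LAN an equal, non-overlapping block of tags, ordered by
router identifier.  The function name, the equal-split rule, and the
use of address strings as identifiers are all assumptions, not the
proposal itself.

```python
from typing import Dict, Iterable


def partition_tag_space(router_ids: Iterable[str],
                        first_tag: int,
                        last_tag: int) -> Dict[str, range]:
    """Split [first_tag, last_tag] into equal disjoint blocks, one per TSR.

    Hypothetical rule: routers are ordered by identifier (plain string
    order here) and the i-th router receives the i-th block.  Any tags
    left over after the equal split stay unassigned.
    """
    routers = sorted(set(router_ids))
    if not routers:
        return {}
    total = last_tag - first_tag + 1
    block = total // len(routers)
    if block == 0:
        raise ValueError("tag space too small for the number of TSRs")
    ranges: Dict[str, range] = {}
    for index, router in enumerate(routers):
        start = first_tag + index * block
        ranges[router] = range(start, start + block)
    return ranges
```

Since every TSR can compute the same split from the same list of
neighbors, no two TSRs on the LAN would hand out the same tag value.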

Joel Halpern: Dino mentioned the use of PIM Hellos for allocations,
but Joel was asking if IGMP or other protocols besides PIM can be used
for the allocation.  Dino said that this was considered.

Tag distribution issue: should distribution be done upstream or
downstream?  Upstream distribution was considered, because it is
simple and intuitive.  However, this required merging the data and
control components, which slowed down forwarding.  Piggy-backing on
other control messages created race conditions and required tag
reassignment when the upstream neighbor changes.  So, downstream
distribution was chosen.  It is consistent with TDP for unicast
routing, it can carry multicast routes and tag assignments together,
can use the same algorithm for all media types (downstream is more
natural for ATM), randomizes tag assignment among downstream routers,
and avoids tag reassignment when there are RPF changes.

TDP was not used, because tag binding and routing information
advertised separately produced race conditions.  Tag assignments are
carried in PIM Join/Prune messages.  There is a 1-1 mapping between a
multicast route and a tag; they are delivered together in the same
packet (or lost together).  Triggered Join/Prune messages carry the
advertisement.  When the assigner on a LAN is not interested in the
group anymore, reassignment does not occur.  If the assigner crashes,
reassignment is required.  If the LAN partitions, each partition can
use its existing tag, each with different upstream neighbors.  When
the partition heals, reassignment only occurs if there were two tags
for the same routing table entry.  When there are non-TSRs or no group
members on a LAN, tag switching is not used for multicast on that LAN.
Dino's description was for sparse-mode PIM, but it also works with
dense-mode PIM.

Question: How does downstream allocation work when the neighbor
doesn't find the tag allocation acceptable (the receiver doesn't
support this range, due to being from a different vendor)?  Dino
started to discuss the point, but was then interrupted.  George (and
Joel): Technical issues like this should be discussed on the list.
The chairs did not allow any further questions at this point.

ARIS - Nancy Feldman, IBM
    <draft-woundy-aris-ipswitching-00.txt>

ARIS stands for Aggregate Route-Based IP Switching.

ARIS objective: Switch all best-effort traffic, use routing
information, minimize overhead and number of egress identifiers,
prevent routing loops.

A switched path is established from each ingress node to each egress
node.  Egress identifiers are used to bundle many CIDR prefixes that
share a common exit point.

No changes are needed to existing routing protocols.  There is one
switched path per egress identifier.  An egress ISR may have multiple
egress IDs.

Forwarding: The FIB extension associates each route with a
next-hop/egress identifier pair.  After a conventional route table
lookup, the network layer forwards on the switched path (if one
exists); the best switched path is selected across the levels of
aggregation.
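
A minimal sketch of such a FIB extension is given below, assuming a
longest-prefix-match lookup and one switched path per egress
identifier.  The class name, method names, and label values are
hypothetical and are not taken from the ARIS draft.

```python
import ipaddress
from typing import Dict, List, Optional, Tuple


class ArisFib:
    """Hypothetical FIB whose routes carry a (next hop, egress id) pair."""

    def __init__(self) -> None:
        self._routes: List[Tuple[ipaddress.IPv4Network, str, str]] = []
        self._paths: Dict[str, int] = {}  # egress id -> switched path label

    def add_route(self, prefix: str, next_hop: str, egress_id: str) -> None:
        self._routes.append(
            (ipaddress.ip_network(prefix), next_hop, egress_id))

    def add_switched_path(self, egress_id: str, label: int) -> None:
        self._paths[egress_id] = label

    def lookup(self, destination: str) -> Optional[Tuple[str, Optional[int]]]:
        """Return (next hop, switched path label or None), or None when
        no route matches.  A None label means hop-by-hop forwarding."""
        address = ipaddress.ip_address(destination)
        matches = [route for route in self._routes if address in route[0]]
        if not matches:
            return None
        # Conventional longest-prefix match picks the route.
        _, next_hop, egress_id = max(matches, key=lambda r: r[0].prefixlen)
        return next_hop, self._paths.get(egress_id)
```

Many prefixes can point at the same egress identifier, so the number
of switched paths grows with the number of egresses rather than with
the number of routes.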

Nancy gave an example of how switched paths are established.  Each
egress builds a point-to-multipoint tree to all of the ingress nodes;
the ingress nodes then send the packets in the reverse direction on
the tree (which becomes a multipoint-to-point tree).

This aggregation provides scalability (on the order of the number of
egresses), VC conservation, less maintenance overhead, and easy network
management: many prefixes map to a single egress identifier, and
switched paths are merged.  For ATM, merged VPs or VCs (when the
switches support frame merging) can be used.

Loops are prevented when the paths are established by including the
list of routers along the path while the path is being constructed.
Loops are also prevented during topology changes.
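
The path-list check can be sketched as follows.  This assumes each
router appends its own identifier to the list carried in the
establishment message and refuses to extend the path if it is already
listed; the message format and the function name are hypothetical.

```python
from typing import List, Optional


def extend_path(path: List[str], router_id: str) -> Optional[List[str]]:
    """Return the path with `router_id` appended, or None on a loop.

    A router that finds its own identifier already in the list knows
    that extending the switched path would create a loop.
    """
    if router_id in path:
        return None
    return path + [router_id]
```

For example, a path that starts at the egress and passes through r1
can be extended to r2, but an attempt to extend it back to r1 is
rejected.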

Exterior topology changes are instantaneous, because the path to all
exterior routers already exists when exterior routing changes a
particular CIDR prefix to a different exterior router.

ARIS supports multiple route paths to the same egress identifier to
increase performance, but at the cost of increasing the number of
switched paths.

Multicast uses the same mechanisms as unicast.  Both source-specific
trees (DVMRP, PIM) and shared trees (PIM-SM, CBT) are supported.

Benefits of egress establishment: Single point of control,
simplicity, traceroute, multipath, no unnecessary multicast switched
paths, guaranteed loop free paths.

ARIS benefits: scalability, multiple levels of aggregation, loop-free
paths, traceroute, multipath, multicast, soft-state protocol, single
point of control, no routing protocol modification.

Brian Carpenter: What is the relationship to NHRP?  Answer: It's
another issue.  Q: What happens if someone shortcuts through your
cloud?  A: This is ships in the night.  This will be further discussed
on the list.

Q: Is there a difference from SITA on how VCI and VPI allocation is
done?  Answer: Yes.

Q: How is QoS traffic handled?  A: RSVP will be discussed in a
following discussion.

Q: Could this be used as a router-router NHRP paradigm? This is
something not addressed.  A: Once we have a WG, we'll discuss these
sorts of proposals.  Q: Can these be evolved and combined?  A: Sure.

CSR - Hiroshi Esaki, Toshiba
    <draft-esaki-co-cl-ip-forw-atm-00.txt>
    <draft-rfced-info-katsube-00.txt>
    <draft-rfced-info-nagami-00.txt>

URLs of interest: ftp://ftp.wide.toshiba.co.jp/pub/csr, 
                  http://www.toshiba.com


CSR is both flow driven and topology driven for QoS, scalability, and
link-cost.

Design policy: high throughput by using ATM hardware, scalability and
QoS by being both topology and flow driven, flow aggregation by using
address prefix forwarding, multiprotocol capability through L2 and L3
code points, interoperability by presenting a standard interface (ATM
UNI), soft-state policy to allow unreliable node behavior, and
mobility and VLAN support.

Hiroshi compared cut-through (ATM) and hop-by-hop (packet) forwarding.
Cut-through path establishment is triggered by observing traffic to a
particular TCP port.  The FANP (Flow Attribute Notification Protocol)
is used to establish the cut-through path.  He presented data from a
Digital gateway router to show the traffic distributions for different
applications, and which proportion of the traffic should be switched.

Overview of FANP.  It supports multiple network layers (IP, IPX,
etc.), and connection-oriented datalinks (ATM, Frame Relay, etc.).  It
supports a flexible flow description.  It uses upstream-driven
cut-through path establishment.

In Japan, there is a testbed (called WIDE) which is incorporating CSRs.

Hiroshi concluded by saying that requirements for tag switching (IP
switching in general) should include being both flow and topology
driven, and supporting SVCs for ATM.

Ohta-san asked about the relationship to RSVP.  A: This is still under
consideration.

Q: Do you believe you get better performance if the end-hosts have
the mapping information? A: Yes.  Q: The last thing that servers and
clients want to do is table lookups to find the correct path.  You are
increasing router performance at the expense of the hosts. A: Perhaps.


Switching RSVP - Arun Viswanathan, IBM

Objective: switch RSVP flows in IP switching environment, no changes
in current mechanics of RSVP, scalable to future RSVP extensions,
flexible enough to support merging in the future.

Why is RSVP ideal for setting up switched connections with QoS?  RSVP
has the required semantics, messages are processed hop-by-hop, and it
is extensible.

There is a new RSVP object to carry L2 flows.  "One VC per
sender" paradigm.  The egress ISR injects a new object in the RESV
message.

Best-effort data flow to unreserved receivers may stop when the first
receiver makes a reservation; this can be resolved by adding the
default VC to the pt-to-mpt VC, adding the IP control point to the
pt-to-mpt VC, or using a PATH message to set up a VC downstream of
that node.

The TTL decrementing is performed at the egress, based on the hop
count in the PATH message.  Paths are created upstream on demand.
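
As a worked example of the TTL handling, the sketch below subtracts
the hop count learned from the PATH message in a single step at the
egress.  Discarding the packet when the result is not positive is an
assumption made here, and the function name is hypothetical.

```python
from typing import Optional


def ttl_at_egress(ingress_ttl: int, switched_hop_count: int) -> Optional[int]:
    """Apply the whole switched path's TTL decrement at the egress.

    Returns the TTL to place in the outgoing packet, or None when the
    TTL would have expired along the path (assumed: packet discarded).
    """
    remaining = ingress_ttl - switched_hop_count
    return remaining if remaining > 0 else None
```

A packet that enters with a TTL of 64 and crosses a switched path of
three hops leaves the egress with a TTL of 61, the same value it would
have had under hop-by-hop forwarding.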

Path merging is not simple, but a solution is discussed in the draft.

Ohta-san asked several technical questions about LIJ, which were taken
off-line.


Tags Negotiation and RSVP - Fred Baker, Cisco

URL: ftp://ftp-eng.cisco.com/fred/tag-switch/draft-baker-tag-rsvp-00.txt.

Problem space: four kinds of tagged routes, some of which have QoS
implications.  Tags may be negotiated by TDP or by piggy-backing other
protocols.  Proposal: if a tag has already been allocated, use it.
Otherwise, piggyback on RSVP.

Fred gave a quick overview of RSVP and the filter styles: fixed,
shared explicit, and wild-card (this last one is the hardest to solve).

While there are several alternatives to mapping RSVP sessions to
tags, the proposal is a "one tag per session" model, in which
each RSVP session creates a new tag.  This model has the following
advantages:

(1)  If tags are mapped into data link constructs, the model exploits
     the traffic control and scheduling capabilities of the data link,
     with their hardware or firmware mechanisms.

(2)  It reduces the network level processing load by removing the need
     for multiplexing of RSVP sessions onto a connection.
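
The "one tag per session" model, together with the rule of reusing a
tag that has already been allocated, might be sketched as below.  An
RSVP session is identified here by destination address, protocol, and
destination port; the allocator class, its starting tag value, and the
sequential allocation order are assumptions for illustration.

```python
from typing import Dict, Tuple

# (destination address, protocol id, destination port)
Session = Tuple[str, int, int]


class SessionTagAllocator:
    """Hypothetical allocator: each RSVP session gets its own tag."""

    def __init__(self, first_tag: int = 16) -> None:
        self._next_tag = first_tag          # assumed start of the tag range
        self._tags: Dict[Session, int] = {}

    def tag_for(self, session: Session) -> int:
        """Return the session's tag, allocating a new one on first use."""
        if session not in self._tags:
            self._tags[session] = self._next_tag
            self._next_tag += 1
        return self._tags[session]
```

Because no two sessions share a tag, nothing has to multiplex several
sessions onto one connection, which is the second advantage listed
above.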

Fred presented a suggestion made by Bob Braden that a new filter
specification and sender template format be used, that includes the
tag value.  Fred then pointed out that this would require universal
implementation.  Also, the wild-card format has no filter spec in the
RESV message.

Fred suggests using a tag value object in the RSVP messages.  For
multicast, upstream assignment is used.  This object is carried in the
PATH message to communicate the tag and in the RESV message as an
acknowledgement.  In the unicast case, downstream assignment is used.
The object appears in the RESV message to communicate the tag to the
upstream router.

There are some issues in the wild-card format: it needs a tag per
sender in some implementations (CSR), and there is no obvious way to
do this in the RESV message (but it is easy in the PATH message).

The discussion pointed out that the two proposals were largely similar.
Fred and Arun agreed to work together.


Open discussion on working group formation, charter, etc.

Vijay started the discussion by going over proposed next steps.

Where are we now: multiple proposals: ARIS, Tag Switching, CSR/FANP,
IFMP, etc.  All have unique desirable properties.

Need to take a step back and see what are the requirements that are
desirable for these solutions.

Requirements: control flows to run IP and other internetworking layer
protocols, support for unicast as well as multicast communication,
support for higher-layer resource reservation mechanisms such as RSVP,
and backward compatibility issues (traceroute, other management tools,
etc.).

Proposed activities: Draft charter, determine other WGs and bodies to
work with, start work on a common architecture and support for RSVP
and multicast, define specific mechanisms for different layer 2
technologies, following the ISSLL model (ATM both with and without
UNI/NNI signaling, LANs, packet over SONET, Frame Relay, etc.).

Drew Perkins: Why are we here, and what problem are we trying to
solve?  Is it because L3 routing isn't fast enough, or ATM signaling
isn't fast enough, or something else?  What are we trying to do, and
what are the goals?  George: The presenters gave a good number of
reasons for this work (performance, scalability, etc.), but that will
be included in the charter.

Brian Carpenter: What about sideways compatibility with what's going
on in the ION group and other things that are going on these days?
These issues need to be seriously considered, to see how it fits in.
If there is any commonality, it is that "it is a good thing to
co-locate the routers and switches in the topology".  This may be the
answer of what we're doing here.  George: Compatibility with existing
things is an absolute requirement.  Joining together routing and
switching is good for scalability and performance, but the
abstractions provided by NHRP and other overlay models are also
important in applications where topology virtualization is desired.

Ross Callon: There is sufficient interest in this sort of work, but we
need a better working group title than "tag switching". How about IP
switching?  George: This may have been trade-marked by Ipsilon.  The
name of the work group will be discussed.

Ross: We also really need to understand the requirements well, before
we can start to discuss the technical proposals.  Tying this to RSVP
is a good idea, but it shouldn't be tied closely to any particular
routing protocol - it should be routing protocol independent.  It
should also work well in a hierarchy, for scalability.  These and
other requirements are important for the WG to identify.

Q: You must first address if current scalability problems are the
result of implementations, or as the result of the current protocol
definitions.  There also has to be consideration of the hosts and
servers; you want to do as little routing in the hosts and servers as
possible, to allow them to perform.  George: I don't think any of the
proposals today require host changes, everything can be done in the
routers.  But changing hosts, especially file servers, could improve
overall performance.  Q: The description of the CSR was disturbing
because it requires hosts to choose the VC for a flow.  Hiroshi: This
wasn't an explicit requirement.

Jim Luciani: The name "tag switching" is inappropriate.  He didn't see
a requirement for a framework document and an applicability statement
in Vijay's presentation; these documents should be the very first work
done in the group.

Fred Baker: One requirement for tag switching was making it
link-layer independent; he wants this to be a requirement here as
well.

Tracy Mallory: He has concerns about the applicability; there are
places like firewalls and NATs where you don't want to use this.  You
have to be real clear about which problems are being solved and where
you don't want to do this.

Eric Crawley (the ISSLL chair): Try to avoid the ISSLL model if you
can; it is too hard to manage.

Brian Carpenter: Call the group "Switching inside Large Clouds", so
that it will go "as smooth as silk".  Vijay: I'm not sure if we should
restrict the scope to "large clouds".

Q: There should be a mention of how large the scope of the solution
should be.  Should this apply to the Internet, or [just] to smaller IP
clouds?

Dino: The Internet growth is happening faster than equipment vendors
can keep up.  The purpose of this work should be to help the vendors
keep up with this growth.

Andrew Smith: The scope and applicability are the first things to
work on; NOT parallel efforts on the technology.

Bruce Davie: One problem that needs to be solved is that if companies
go off and implement stuff on their own, then there will be
interoperability problems.  That's the primary reason to form a
working group.

Keith McCloghrie: There are limits to what CIDR can do to constrain
the growth in the routing tables.  It would be great if we could
further scale things here by having a single path for multiple CIDR
table entries.  George: This was covered both by ARIS and Tag
Switching.  In ARIS, exit identifiers are used.  Associating multiple
routes with a label imposes an "area code" over the routes that
are destined to a particular exit.

George asked for a show of hands of those who would like to see a WG
formed.  There was overwhelming support for a WG to work on these
issues, if only to further interoperability between vendors already
working on these solutions.

Joel Halpern (the Routing AD): I want to see a concise problem
statement and intended work proposal in the charter.  This will be
further expanded in the framework document.  The name of the group
will change.  We'll try to charter this as quickly as possible.  The
real place where the action happens is on the list; work can proceed.


======================================================================
George Swallow       Cisco Systems                   (508) 244-8143
                     250 Apollo Drive
                     Chelmsford, MA 01824