I am the assigned Gen-ART reviewer for this draft. The General Area Review Team (Gen-ART) reviews all IETF documents being processed by the IESG for the IETF Chair. Please treat these comments just like any other review comments.

For more information, please see the FAQ.

Document: draft-ietf-rtgwg-bgp-pic-12
Reviewer: Theresa Enghardt
Review Date: 2021-01-10
IETF LC End Date: None
IESG Telechat date: Not scheduled for a telechat

Summary: The draft is basically ready for publication as an Informational RFC, but it has some context, clarity, and editorial issues that need to be fixed before publication.

Major issues: None.

Minor issues:

Abstract:

"In the network comprising thousands of iBGP peers exchanging millions of routes, many routes are reachable via more than one next-hop. Given the large scaling targets, it is desirable to restore traffic after failure in a time period that does not depend on the number of BGP prefixes."
This part is missing a logical step in the argumentation between these two sentences. Is the first statement a prerequisite for restoring traffic, and then the question is how to make it scalable? Is the first statement the reason for things not being scalable? Please rephrase to make the relationship between these statements and the overall argumentation clear.

Is "depending on the number of BGP prefixes" an inherent feature of BGP, or are you making any implicit assumptions? If so, please state them.

"In this document we proposed an architecture […]"
What does architecture mean in this context? Without any further qualification, in a networking context, as a reader I assume that "architecture" means "network architecture", i.e., something that involves multiple nodes such as multiple BGP speakers. But it appears that the document is only about the internals of each individual BGP speaker, i.e., how information is organized within the router. So maybe it's "router architecture" or "software architecture" or such?
Please rephrase to make this clear in the abstract.

Please clarify your scope. As the abstract specifically mentions iBGP, is this solution only about iBGP? Or is it about eBGP as well?

Introduction:

The introduction is missing a clear problem statement. Perhaps it's implicitly stated by saying that "convergence speed is limited by the time taken to serially propagate reachability information from the point of failure to the device that must re-converge.", but please be specific. Is this convergence speed that depends on information propagation time considered "too long", and therefore it needs to be reduced? Is it "too long" specifically in certain contexts, e.g., networks of a certain size?

As the document actually appears to focus on speeding up changes within a single node, it's not clear how this relates to propagation time. Does the node-internal speedup also speed up how fast propagated information converges? Why?

As the statement about reachability information being exchanged is the first sentence of the introduction, this makes it seem like it's fundamental to your document. If this is not the case, please consider starting the introduction with a clear problem statement that is actually fundamental to your document, such as "The way that information is currently organized within a BGP speaker [under … circumstances] is inefficient [for … reason] and leads to long convergence times."

In the next sentence, "BGP speakers exchange reachability information about prefixes […]", the relationship to the problem statement is still not clear. Is this reachability information insufficient? Is there already enough information to converge faster, and now your solution allows converging faster? Or something else?

"[…] for labeled address families, namely AFI/SAFI 1/4, 2/4, 1/128, and 2/128 […]"
Please expand these acronyms on first use and provide a reference.

"[…] an edge router assigns local labels to prefixes and associates the local label with each advertised prefix […]"
Does this apply to incoming advertisements, outgoing advertisements, or both? Please make the context clear here.

"[…] such as L3VPN [7], 6PE [8], and Softwire [6] using BGP label unicast technique[3]."
The "such as" is not entirely clear: If these are examples of the technique that the rest of the sentence describes, perhaps "using technologies such as" would be clearer. However, as the entire sentence is already very long, please consider splitting the sentence and making the relationship between the statements clear.

Please expand NLRI on first use and perhaps provide a definition or reference.

How does the proposal in this document relate to the techniques you mention, i.e., L3VPN, 6PE, and Softwire? Does it require them? Is their usage optional for your solution, but helps (and why)? Please make the relationship of your solution to these techniques explicit and state the prerequisites of your solution, if any.

"This document proposes a hierarchical and shared forwarding chain organization […]"
What is your solution an alternative to? How has information previously been organized? How does the concept of a forwarding chain relate to the details you already gave, which were about a BGP speaker exchanging reachability information and applying path selection - where does the forwarding chain come in? As this appears to be a fundamental concept to your solution, please introduce it in the first paragraph.

"incrementally deployed and enabled with zero operator intervention"
Well, deploying and enabling any solution does require operator intervention, e.g., a software update, correct? So perhaps that's zero other operator intervention? Minimal operator intervention? Or not requiring a specific type of operator intervention that would otherwise be needed?
Later in Section 3.1, the draft says "It is noteworthy to mention that the forwarding chain is constructed without any operator intervention at all.", so perhaps it's possible to further qualify what kind of operator intervention would otherwise be necessary, but is not necessary with your solution - e.g., no operator intervention is required to reconfigure routes when a link fails.

1.1 Terminology

Please expand on first usage and consider defining: AFI/SAFI, PE, CE, NLRI, forwarding plane, VPN RD's (probably VPN RDs), LSR, ASBRs, BGP-LU, FIB manager (is this a particular entity? A software component?)
You don't have to define all BGP terms that you use, but please expand them once to make it easier to guess what they stand for or to look them up.

For "Leaf", "IP leaf", "Label leaf": Why is it called leaf? In graph theory, isn't the leaf of a tree the node with no children and only one parent? In your figures, the "IP leaf" appears to have no parent and instead two children. So isn't it more of a root in the tree? Later, you mention the pathlist being "the parent" of the IP leaf, but in Figure 2, you have an arrow from the IP leaf pointing to the Pathlist, so to me that looks like the Pathlist is the child of the IP leaf. Is this a BGP convention? If so, perhaps a sentence stating that would help, and/or a reference.

"OutLabel-List: Each labeled prefix is associated with an OutLabel-List. The OutLabel-List is an array of one or more outgoing labels and/or label actions where each label or label action has 1-to-1 correspondence to a path in the pathlist. Label actions are: push the label, pop the label, swap the incoming label with the label in the Outlabel-Array entry, or don't push anything at all in case of "unlabeled". The prefix may be an IGP or BGP prefix"
What are labels/label actions in this context? Are labels the same labels mentioned in the introduction, i.e., local labels that are assigned to prefixes? Are "outgoing labels" still local?
Maybe here a brief explanation of how labels are defined and how they work would help.

2. Overview:

"A forwarding plane that supports multiple levels of indirection: A forwarding that starts with a destination and ends with an outgoing interface is not a simple flat structure."
What is "A forwarding"? Do you mean a forwarding entry? Is this the same thing as a route? Please consider adding a definition to the terminology.

Is a forwarding plane the same as a forwarding chain (mentioned in the abstract)? If so, please unify your terminology. If not, please define the terms and explain what the differences are.

2.1.2. Availability of more than one BGP next-hops

"The existence of a secondary next-hop is clear for the following reason: a service caring for network availability will require two disjoint network connections hence two BGP next-hops."
By "the existence is clear" you mean "The existence is clearly required" or "It is clear whether a secondary next-hop exists" or something else?

2.2 BGP-PIC Illustration

"We can see that the BGP pathlist consisting of BGP-NH1 and BGP-NH2 is shared by all NLRIs reachable via ePE1 and ePE2."
How can we see that? ePE1 and ePE2 do not show up in Figure 2. I assume they map to something that is shown, but it's not clear what.

3.2. Example: Primary-Backup Path Scenario

Comparing Figure 3 to Figure 2, there are a couple of differences in terminology: Figure 2 has an "IP Leaf" and Figure 3 has an "IP prefix leaf" called VPN-IP1. Are "IP Leaf" and "IP prefix leaf" the same concept? If so, please unify your terminology. Same question for VPN-L11 being "OutLabel-List" (Figure 2) and "Label-leaf" (Figure 3), VPN-L21 being part of an "OutLabel-List" (Figure 2) and "BGP OutLabel Array" (Figure 3), and BGP-NH1 being part of a "Pathlist" (Figure 2) and "BGP Pathlist" (Figure 3).
Figure 3 does not appear to show any Adjacency - why?
Figure 2 does not appear to show any label actions - Why?
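One possible reading of the terminology, which unified figures could make explicit, can be sketched as follows. This is only a minimal sketch based on the OutLabel-List definition quoted under 1.1 Terminology (one label or label action per path in the pathlist, with a pathlist shared between leaves). The class and field names, the choice of "push" as the label action, and the second leaf with its labels (VPN-IP2, VPN-L12, VPN-L22) are assumptions made for illustration and are not taken from the draft.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class LabelAction(Enum):
    # The four label actions named in the quoted OutLabel-List definition.
    PUSH = "push"
    POP = "pop"
    SWAP = "swap"
    UNLABELED = "unlabeled"  # don't push anything at all


@dataclass(frozen=True)
class OutLabel:
    action: LabelAction
    label: Optional[str] = None  # no label value needed for POP or UNLABELED


@dataclass
class Leaf:
    """Hypothetical model of a leaf: a prefix, a pathlist, an OutLabel-List."""
    prefix: str
    pathlist: List[str]         # may be shared by many leaves
    out_labels: List[OutLabel]  # entry i corresponds to pathlist[i]

    def __post_init__(self) -> None:
        # Enforce the 1-to-1 correspondence stated in the definition.
        if len(self.out_labels) != len(self.pathlist):
            raise ValueError("OutLabel-List needs exactly one entry per path")

    def resolve(self, path_index: int) -> Tuple[str, OutLabel]:
        """Return the next-hop and label action for the chosen path."""
        return self.pathlist[path_index], self.out_labels[path_index]


# One pathlist object shared by two leaves (assumption: PUSH for VPN labels).
bgp_pathlist = ["BGP-NH1", "BGP-NH2"]
vpn_ip1 = Leaf("VPN-IP1", bgp_pathlist,
               [OutLabel(LabelAction.PUSH, "VPN-L11"),
                OutLabel(LabelAction.PUSH, "VPN-L21")])
vpn_ip2 = Leaf("VPN-IP2", bgp_pathlist,
               [OutLabel(LabelAction.PUSH, "VPN-L12"),
                OutLabel(LabelAction.PUSH, "VPN-L22")])

print(vpn_ip1.resolve(1))
```

If this reading is correct, a figure that shows both the shared pathlist and the per-leaf label actions in the same notation would answer most of the questions above. If it is not, that is a sign the terminology needs the clarification requested here.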
Furthermore, making the figures more similar stylistically (e.g., having "IP prefix leaf" being always underlined or always in brackets) would help for comparing the two figures.

4. Forwarding Behavior

"apply the label action of the label on the packet"
What does this mean? Does "push" mean that the forwarding engine will add the label to the packet? How will this label be used? Will it be removed from the packet later? Will it be sent in a BGP advertisement? Please make this clearer here, and/or please explain what labels and label actions are earlier, and how they are used.

"the forwarding engine applies a hashing algorithm to choose the path and the hashing at the BGP level yields path 0 while the hashing at the IGP level yields path 1"
This sounds like ECMP, i.e., there are multiple paths and each packet is hashed and then sent through a path based on the hash. But the earlier sections sounded like your solution was more about primary paths and secondary failover paths. Are these two general approaches and your solution works for either? Please make this explicit, possibly early in the document.

5.1. Flattening the Forwarding Chain

"Suppose the platform cannot support the number of hierarchy levels in the forwarding chain. FIB needs to reduce the number of hierarchy levels. […]"
When in the process does this flattening happen? Only when a packet is forwarded, like in the above steps, or does it happen when the chain is first constructed? Does the flattening happen after a specific step in the above process, e.g., step 3, or is it independent? If it happens for each forwarded packet, this seems like a lot of steps. How is the overall efficiency still maintained?

6.1. BGP-PIC core

"When a remote link or node fails, IGP on the ingress PE receives advertisement indicating a topology change so IGP re-converges to either find a new next-hop and/or outgoing interface or remove the path completely from the IGP prefix used to resolve BGP next-hops."
Why IGP, when this document is about BGP? Is this implied by the scope "when a core link or node fails but the BGP next-hop remains reachable"? If so, please make this explicit.

"As soon as the IGP convergence is complete for the BGP next-hop IGP route, all its BGP depending routes benefit from the new path."
What would happen in a scenario where BGP-PIC is not used? Would it take longer until the BGP routes can use the new path, and why?

6.2.2

"the edge node attached to the failed link performs next-hop self"
What does "perform next-hop self" mean? Is there a word missing here, e.g., "lookup"?

"The main observation is that the loss of convergence speed due to the loss of hierarchy depth"
Does convergence depend on the exchange of BGP messages between BGP peers, or is the concept of convergence defined differently here? It seems like here convergence means something related to how information is stored/updated locally on the router, which is not what I would think about when I read "BGP convergence". (Related to the comment at the beginning of the introduction: What is your problem statement, i.e., what is the type of convergence you are talking about and that your solution speeds up?)

8. Security Considerations

Are you sure that there are no security considerations? For example, if there is a bug in the implementation of this technique, could this make BGP prefix hijacking easier given a specific use of BGP labels?

Nits/editorial comments:

Abstract:

"In the network comprising thousands of iBGP peers" -> "In a network comprising thousands of iBGP peers"

Please expand BGP-PIC on first use.

1.1 Terminology

"A prefix P/m (of any AFI/SAFI) that is learnt via an Interior Gateway Protocol, such as OSPF and ISIS, has a path for." - Is this sentence missing a subject for the "has a path for"? If this is "A prefix that an IGP has a path for", then the "is learnt via" does not fit in the sentence.

"one or more prefix" -> "one or more prefixes"

"a IP prefix" -> "an IP prefix"

There's a stray ") in the "Pathlist" item.

"may not necessarily has" -> "may not necessarily have"

"the forwarding engine must visits" -> "the forwarding engine must visit"

Please make all your terminology items consistent, i.e., sentences ending with a full stop or not.

"A pathlist may contain a mix of primary and backup paths" - why is this its own item? Isn't it about the previous item, "Pathlist", and should be part of the same bullet point item?

2.2.1 Hierarchical Hardware FIB

"the number of memory lookup's" -> "the number of memory lookups"

5.1. Flattening the Forwarding Chain

Please unify how you write your terms, e.g., "OutLabel-list" vs. "outlabel-list" (Section 5.1)

Please unify whether you capitalize all words in your headings or just some.