Hi all,

I have reviewed this document as part of the security directorate's ongoing effort to review all IETF documents being processed by the IESG. These comments were written primarily for the benefit of the security area directors. Document editors and WG chairs should treat these comments just like any other last call comments.

From a security perspective, this document is Ready. It specifies a standard scheme that can be used to back off SPF calculations during periods of frequent IGP events, avoiding excessive resource consumption performing calculations that would be rendered redundant (or just be useless) soon. The security considerations correctly note that an attacker that can generate IGP events would be able to delay the IGP convergence time, which is true both for this scheme and all schemes previously in use. (I might use more words to say the same thing if I was writing it, but that probably reflects more on me than the document.)

I do have some questions about the actual proposed FSM, though -- I suspect that I am just making some implicit assumptions that may not be grounded in reality. In particular, I am basically assuming that INITIAL_SPF_DELAY < SHORT_SPF_DELAY < LONG_SPF_DELAY < HOLDDOWN_INTERVAL. (The draft itself only has it as RECOMMENDED for SPF_INITIAL_DELAY <= SPF_SHORT_DELAY <= SPF_LONG_DELAY in Section 6, yes, with the different spellings, but has a MUST for HOLDDOWN_INTERVAL > TIME_TO_LEARN_INTERVAL.) This would potentially affect the state machine for events 1, 6, and 7.

In transition 1, we say to only start the SPF_TIMER if it is not already running, but I do not see a way for it to already be running unless the HOLDDOWN_TIMER value is less than one or more of the SPF_TIMER values.
Similarly, I don't see how transition 7 could ever happen, since an IGP event moves us out of the QUIET state, and I assume that the SPF_TIMER would fire before the HOLDDOWN_TIMER, since the latter is reset on every IGP event and I assume it has the larger value. Transition 6 is a little less clear, but is also similar -- if HOLDDOWN_TIMER is larger than LEARN_TIMER, then LEARN_TIMER must fire before HOLDDOWN_TIMER, and we leave SHORT_WAIT to go to LONG_WAIT before we could consider leaving SHORT_WAIT to go back to QUIET.

So, am I making some flawed assumptions? (Are there examples of situations that clearly demonstrate the flaw?)

Also, to confirm my understanding, suppose a scenario happens where an IGP participant sees an event, then a gap of 100ms, then three IGP events at equal 10ms intervals, with the SPF delays at the example values of 0/50/2000 ms and TIME_TO_LEARN of 1s. The first event triggers an SPF computation immediately, then we go to SHORT_WAIT, the first of the three events kicks off a SPF_TIMER for 50ms, which is reset by the next two events, the SPF timer fires and we recompute the SPF, then TIME_TO_LEARN fires and we go to LONG_WAIT until the HOLDDOWN_TIMER fires. Or maybe the SPF calculation takes more than 200ms, so when the second IGP event fires, we abort the currently in-progress calculation and don't start another one until 50ms after the last event?

I bring this up because of the text in the second paragraph of Section 4 that talks of computing the post-failure routing table in "a single route computation". But if I understand correctly, the *single* computation only happens in the second case here, when the calculation takes some hundreds of milliseconds; otherwise we still have *two* computations (one triggered while we're in QUIET and the second triggered in SHORT_WAIT). So I'm not sure I fully understand the expected scenario.
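
To make the first reading of that scenario concrete, here is a minimal Python sketch of the FSM as I understand it. It is not taken from the draft: the transition rules encode my reading above, the HOLDDOWN_INTERVAL of 3000 ms is an assumed value (the scenario does not give one), and the names `Params`, `simulate`, and `restart_spf_timer` are hypothetical. The flag switches between "later events reset a running SPF_TIMER" (my reading) and "start the SPF_TIMER only if it is not already running". The sketch treats an SPF computation as instantaneous, so it does not model the second case, where an in-progress calculation is aborted.

```python
from dataclasses import dataclass

QUIET, SHORT_WAIT, LONG_WAIT = "QUIET", "SHORT_WAIT", "LONG_WAIT"


@dataclass(frozen=True)
class Params:
    """Timer values in milliseconds."""
    initial_delay: int = 0      # example value from the scenario
    short_delay: int = 50       # example value from the scenario
    long_delay: int = 2000      # example value from the scenario
    time_to_learn: int = 1000   # TIME_TO_LEARN of 1s from the scenario
    holddown: int = 3000        # ASSUMPTION: not stated in the scenario


def simulate(event_times, p=Params(), restart_spf_timer=True):
    """Replay IGP events; return (SPF run times, state transitions)."""
    state = QUIET
    timers = {}          # timer name -> absolute expiry time in ms
    spf_runs = []        # times at which an SPF computation starts
    transitions = []     # (time, new state)
    pending = sorted(event_times)
    i = 0
    while i < len(pending) or timers:
        next_event = pending[i] if i < len(pending) else None
        next_timer = min(timers.items(), key=lambda kv: kv[1]) if timers else None
        # Fire the earliest timer if it expires no later than the next event.
        if next_timer and (next_event is None or next_timer[1] <= next_event):
            name, now = next_timer
            del timers[name]
            if name == "SPF":
                spf_runs.append(now)           # SPF_TIMER expiry, any state
            elif name == "LEARN":
                state = LONG_WAIT              # LEARN_TIMER expiry in SHORT_WAIT
                transitions.append((now, state))
            else:                              # HOLDDOWN_TIMER expiry
                timers.pop("LEARN", None)      # deactivate LEARN_TIMER if running
                state = QUIET
                transitions.append((now, state))
            continue
        # Otherwise handle the next IGP event.
        now = next_event
        i += 1
        if state == QUIET:
            timers.setdefault("SPF", now + p.initial_delay)  # only if not running
            timers["LEARN"] = now + p.time_to_learn
            timers["HOLDDOWN"] = now + p.holddown
            state = SHORT_WAIT
            transitions.append((now, state))
        else:
            delay = p.short_delay if state == SHORT_WAIT else p.long_delay
            timers["HOLDDOWN"] = now + p.holddown            # reset on every event
            if restart_spf_timer or "SPF" not in timers:
                timers["SPF"] = now + delay
    return spf_runs, transitions


if __name__ == "__main__":
    # One event, a 100 ms gap, then three events 10 ms apart.
    runs, transitions = simulate([0, 100, 110, 120])
    print("SPF runs at (ms):", runs)
    print("transitions:", transitions)
```

Under these assumptions the replay gives two SPF computations, at 0 ms and 170 ms, with SHORT_WAIT entered at 0 ms, LONG_WAIT at 1000 ms, and QUIET again at 3120 ms. With `restart_spf_timer=False` the second computation moves to 150 ms, but there are still two of them.
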

I also am probably having some problems with terminology, presumably just my misunderstanding, which hopefully can be set straight easily. In the Introduction, we have a "desire to compute a new Shortest Path First (SPF) as soon as a failure is detected", which is using SPF as if it is a data structure (e.g., the result of an algorithm), whereas my intuition has SPF referring to the algorithm [class] but not its output. In section 3, we talk of "computation of the routing table, by the IGP", which gets me confused about whether "the IGP" represents a network protocol for conveying (e.g.) link state information, an algorithm for SPF computation, or a router that performs SPF computations. In section 6 we talk of "the number of protocols reactions/computations triggered by IGP SPF". Is this just in the sense of "each SPF calculation triggers a bunch of other stuff"? I think this is another case of me being confused about whether "SPF" means an algorithm, a specific computation using that algorithm, etc.

Some other editorial notes:

It's probably better to cite RFC 8174 instead of/in addition to RFC 2119, especially since there is at least a lowercase "may" present.

It's unclear that "temporally close" in "multiple temporally close failures over a short time" really adds any value, in the Introduction.

In section 2, last bullet point on page 3, "SPF_DELAY timers values" probably doesn't need the plural "timers" (so, either "timer" or the possessive "timers'"), though I am mindful of the recent discussion on ietf@ about (non-)American English. The second sentence of the bullet is also a sentence fragment.

SRLG is used without expansion in multiple places, but does not appear on https://www.rfc-editor.org/materials/abbrev.expansion.txt as a "well-known" abbreviation.

In section 6, we find the awkward construction "play it safe and start with safe, i.e., longer timers". Probably we want to say "safe values" as the noun, and maybe consider rewording to avoid the duplicate "safe" and/or the colloquialism "play it safe".

Section 8 says:

   [...]. FIBs are installed after multiple steps such as flooding of the IGP event across the network, SPF wait time, SPF computation, FIB distribution across line cards, and FIB update. This document only addresses the first contribution.

which makes me try to match up "the first contribution" with the flooding, when I assume it's meant to match up with the SPF wait time.

-Benjamin