I have reviewed this document as part of the Operational directorate's ongoing effort to review all IETF documents being processed by the IESG. These comments were written with the intent of improving the operational aspects of the IETF drafts. Comments that are not addressed in last call may be included in AD reviews during the IESG review. Document editors and WG chairs should treat these comments just like any other last call comments. I've pointed out some "issues" that might not be real issues for people involved in PCE but that could be at least clarified for readers of this document. Detailed comments are provided below. Regards, Lionel ******************** 2. Terminology This document uses the following terms defined in [RFC5440]: PCC, PCE, PCEP Peer, PCEP Speaker. This document uses the following terms defined in [RFC4655]: TED. This document uses the following terms defined in [RFC3031]: LSP. This document uses the following terms defined in [I-D.ietf-pce-stateful-pce-app]: Stateful PCE, Passive Stateful PCE, Active Stateful PCE, Delegation, LSP State Database. [LM] I didn't find a clear definition of the LSP state, except through the definition of the LSP State database in ietf-pce-stateful-pce-app, i.e. The second is the LSP State Database (LSP-DB), in which a PCE stores attributes of all active LSPs in the network, such as their paths through the network, bandwidth/resource usage, switching types and LSP constraints. As this document heavily relies on this LSP state concept, it could be useful to (re-)define it in this document or to find a correct reference for it. 3.2. Objectives The objectives for the protocol extensions to support stateful PCE described in this document are as follows: [...] o Allow a PCC to delegate control of its LSPs to an active stateful PCE such that a given LSP is under the control of a single PCE at any given time. A PCC may revoke this delegation at any time during the lifetime of the LSP. If LSP delegation is revoked while the PCEP session is up, the PCC MUST notify the PCE about the revocation. A PCE may return an LSP delegation at any point during the lifetime of the PCEP session. [LM] I'm assuming that the PCE returning the delegation has also to notify the PCC. If so, maybe: "the If LSP delegation is returned by the PCE while the PCEP session is up, the PCE MUST notify the PCE about the revocation." [LM] the bullet point could be then splitted into two bullets, one for PCC, one for PCE. 5.4. Capability Advertisement During PCEP Initialization Phase, PCEP Speakers (PCE or PCC) advertise their support of stateful PCEP extensions. A PCEP Speaker includes the "Stateful PCE Capability" TLV, described in Section 7.1.1, in the OPEN Object to advertise its support for PCEP stateful extensions. The Stateful Capability TLV includes the 'LSP Update' Flag that indicates whether the PCEP Speaker supports LSP parameter updates. The presence of the Stateful PCE Capability TLV in PCC's OPEN Object indicates that the PCC is willing to send LSP State Reports whenever LSP parameters or operational status changes. The presence of the Stateful PCE Capability TLV in PCE's OPEN message indicates that the PCE is interested in receiving LSP State Reports whenever LSP parameters or operational status changes. [LM] is it "willing/interested for" or just a capability indication? It is not the same thing from a procedure point of view. The PCEP extensions for stateful PCEs MUST NOT be used if one or both PCEP Speakers have not included the Stateful PCE Capability TLV in their respective OPEN message. If the PCEP Speaker on the PCC supports the extensions of this draft but did not advertise this capability, then upon receipt of PCUpd message from the PCE, it MUST generate a PCErr with error-type 19 (Invalid Operation), error-value 2 (Attempted LSP Update Request if the stateful PCE capability was not advertised)(see Section 8.5) and it SHOULD terminate the PCEP session. If the PCEP Speaker on the PCE supports the extensions of this draft but did not advertise this capability, then upon receipt of a PCRpt message from the PCC, it MUST generate a PCErr with error- type 19 (Invalid Operation), error-value 5 (Attempted LSP State Report if active stateful PCE capability was not advertised) (see Section 8.5) and it SHOULD terminate the PCEP session. [LM] why the recommendation for closing the session if an explicit error is sent back? The session could remain open i.e. except stateful PCEP extensions,everything else would be fine. If the session termination is the right thing to do, as we are in a clear error case (as the PCEP speaker should not send the report or the update), the SHOULD should probably be a MUST hee. [LM] Is there any reason to use a specific error in such a case? The PCE could just behave as a PCE not supporting the feature. Why is it important for the supporting PCEP speaker to make the difference? The generic behavior described in RFC5440 is not enough? [LM] active/passive mode are not advertized in PCEP. s/if active stateful PCE capability was not advertised/if stateful PCE capability was not advertised From RFC5440: "6.9. Reception of Unknown Messages A PCEP implementation that receives an unrecognized PCEP message MUST send a PCErr message with Error-value=2 (capability not supported). If a PCC/PCE receives unrecognized messages at a rate equal or greater than MAX-UNKNOWN-MESSAGES unknown message requests per minute, the PCC/PCE MUST send a PCEP CLOSE message with close value="Reception of an unacceptable number of unknown PCEP message". A RECOMMENDED value for MAX-UNKNOWN-MESSAGES is 5. The PCC/PCE MUST close the TCP session and MUST NOT send any further PCEP messages on the PCEP session." [LM] If it is the case, the error handling would be the same between a PCEP speaker supporting the feature but not advertizing it and a PCEP speaker not supporting the feature. And it would not be possible possible to use the specific error responses to infer the capability supported by the other node. Note that even if the update capability has not been advertised, a PCE can still accept LSP Status Reports from a PCC and build and maintain an up to date view of the state of the PCC's LSPs. [LM] I don't undersand. Is it not in contradiction with "If the PCEP Speaker on the PCE supports the extensions of this draft but did not advertise this capability, then upon receipt of a PCRpt message from the PCC, it MUST generate a PCErr with error- type 19 (Invalid Operation), error-value 5 (Attempted LSP State Report if active stateful PCE capability was not advertised) (see Section 8.5) and it SHOULD terminate the PCEP session." Or does it mean that there is another way than PCRpt message for the PCC to send LSP status reports to the PCE? [LM] In this section, it is not described how non-supporting PCEP speaker will hanlded the new TLV. It could be said that unrecognized TLVs will be ignored (as per RFC5440) and OPEN messages will be handled as per RFC5440 (if the comment above is accepted). [LM] Would it be useful to discover (using another TLV) whether the PCE is an active/passive stateful PCE, as in IGP-based capabilities discovery mechanism? 5.5. IGP Extensions for Stateful PCE Capabilities Advertisement When PCCs are LSRs participating in the IGP (OSPF or IS-IS), and PCEs are either LSRs or servers also participating in the IGP, an effective mechanism for PCE discovery within an IGP routing domain consists of utilizing IGP advertisements. [...] Note that while active and passive stateful PCE capabilities may be advertised during discovery, PCEP Speakers that wish to use stateful PCEP MUST negotiate stateful PCEP capabilities during PCEP session setup, as specified in the current document. A PCC MAY initiate stateful PCEP capability negotiation at PCEP session setup even if it did not receive any IGP PCE capability advertisements. [LM] As said above, an as stated in section 5.4: "The PCEP extensions for stateful PCEs MUST NOT be used if one or both PCEP Speakers have not included the Stateful PCE Capability TLV in their respective OPEN message." What is the real added-value of this IGP-based mechanism? Only to be able to identify active PCE/passive PCE mode? 5.6. State Synchronization [LM] "State" could be misleading. "LSP State Sync" would be more approriate/accurate I think. The purpose of State Synchronization is to provide a checkpoint-in- time state replica of a PCC's LSP state in a PCE. State Synchronization is performed immediately after the Initialization phase ([RFC5440]). During State Synchronization, a PCC first takes a snapshot of the state of its LSPs state, then sends the snapshot to a PCE in a sequence of LSP State Reports. Each LSP State Report sent during State Synchronization has the SYNC Flag in the LSP Object set to 1. The set of LSPs for which state is synchronized with a PCE is determined by advertised stateful PCEP capabilities and PCC's local configuration (see more details in Section 9.1). [LM] It seems that : "The set of LSPs for which state is synchronized" should be: "The set of LSPs for which state is to be synchronized" [LM] I don't see how the "advertised stateful PCEP capabilities" have an effect on the set of LSP to synchronize. The capabilities adv is only used to discover stateful PCE capabilities and see if the related PCEP extensions can be used. The identification of the LSPs to synchronize seems to be only based on policies configured in the PCC. The end of synchronization marker is a PCRpt message with the SYNC Flag set to 0 for an LSP Object with PLSP-ID equal to the reserved value 0 (see Section 7.3). In this case, the LSP Object SHOULD NOT include the SYMBOLIC-PATH-NAME TLV and SHOULD include the LSP- IDENTIFIERS TLV with the special value of all zeroes. The PCRpt message MUST include an empty ERO as its intended path and SHOULD NOT include the optional RRO object for its actual path. [LM] What is the reason for the use of "SHOULD" here, instead of "MUST"? If the PCC has no state to synchronize, it SHOULD only send the end of synchronization marker. [LM] Is it when there is no active LSP? If it is, it could be useful to clarify it. And it is maybe not so related to the text just before. It could be clarified in a separate paragraph. A PCE SHOULD NOT send PCUpd messages to a PCC before State Synchronization is complete. A PCC SHOULD NOT send PCReq messages to a PCE before State Synchronization is complete. This is to allow the PCE to get the best possible view of the network before it starts computing new paths. [LM] it is obviously assumed that this requirement is true for PCC/PCE explicitly advertizing support of stateful PCE capability. If the PCC encounters a problem which prevents it from completing the state transfer, it MUST send a PCErr message with error-type 20 (LSP State Synchronization Error) and error-value 5 (indicating an internal PCC error) to the PCE and terminate the session. [LM] s/state transfer/LSP state synchornization 5.7.1. Delegating an LSP A PCC delegates an LSP to a PCE by setting the Delegate flag in LSP State Report to 1. If the PCE does not accept the LSP Delegation, it MUST immediately respond with an empty LSP Update Request which has the Delegate flag set to 0. If the PCE accepts the LSP Delegation, it MUST set the Delegate flag to 1 when it sends an LSP Update Request for the delegated LSP (note that this may occur at a later time). The PCE MAY also immediately acknowledge a delegation by sending an empty LSP Update Request which has the Delegate flag set to 1. [LM] if the delegation is not aceepted by the PCE, I assume that the PCC should not set the Delegate flag in subsequent LSP state report. If it is, this could be clarified somewhere in this section. [..] Note that for an LSP to remain delegated to a PCE, the PCC MUST set the Delegate flag to 1 on each LSP Status Report sent to the PCE. [LM] s/each LSP Status Report/each LSP State Report for this LSP. 5.7.2.1. Explicit Revocation When a PCC decides that a PCE is no longer permitted to modify an LSP, it revokes that LSP's delegation to the PCE. A PCC may revoke an LSP delegation at any time during the LSP's life time. A PCC revoking an LSP delegation MAY immediately clear the LSP state provided by the PCE, but to avoid traffic loss, it SHOULD do so in a make-before-break fashion. [LM] After PCE delegation revocation, is there really a requirement for LSP state clean-up? The revocation is used to remove the authorization of LSP state update to the PCE but I don't see why the PCC would clear LSP state after revocation. The LSP state can be updated using operator-defined default parameters, right? If the PCC has received but not yet acted on PCUpd messages from the PCE for the LSP whose delegation is being revoked, then it SHOULD ignore these PCUpd messages when processing the message queue. All effects of all messages for which processing started before the revocation took place MUST be allowed to complete and the result MUST be given the same treatment as any LSP that had been previously delegated to the PCE (e.g. the state MAY be immediately cleared). [LM] If I correctly understood the text above, after revocation of the PCE delegation, - any PCUpd should be rejected and not ignored - the LSP state(s) that was delegated to the PCE cannot be changed until all the messages in the message queue have been processed. If I'm correct, the text above could be clarified. Any further PCUpd messages from the PCE are handled according to the PCUpd procedures described in this document. [LM] Not understood. Further PCUpd "will result in the PCC sending a PCErr message" as said after the figure. 5.7.2.2. Revocation on Redelegation Timeout When a PCC's PCEP session with a PCE terminates unexpectedly, the PCC MUST wait the time interval specified in Redelegation Timeout Interval before revoking LSP delegations to that PCE and attempting to redelegate LSPs to an alternate PCE. If a PCEP session with the original PCE can be reestablished before the Redelegation Timeout Interval timer expires, LSP delegations to the PCE remain intact. [LM] In case of PCEP session interruption, is there a requirement for the PCE to update delegated LSP states in order to ensure the synchronization between states in the PCE and in the PCC? Likewise, when a PCC's PCEP session with a PCE terminates unexpectedly, the PCC MUST wait for the State Timeout Interval before flushing any LSP state associated with that PCE. [LM] it should be clarified that the "flushing" applies only if there is no other PCE. Otherwise, see section 5.7.4 Note that the State Timeout Interval timer may expire before the PCC has redelegated the LSPs to another PCE, for example if a PCC is not connected to any active stateful PCE or if no connected active stateful PCE accepts the delegation. [LM] Not sure to understand. Is it "if a PCC is not connected to any OTHER active stateful PCE or if no OTHER connected active stateful PCE accepts the delegation? In this case, the PCC MUST flush any LSP state set by the PCE upon expiration of the State Timeout Interval and revert to operator-defined default parameters or behaviors. This operation SHOULD be done in a make-before-break fashion. [LM] I think it is important to make the distinction between PCC with only one PCE and PCC with N PCE. The "Note that" could be put in a separate section. The State Timeout Interval MUST be greater than or equal to the Redelegation Timeout Interval and MAY be set to infinity (meaning that until the PCC specifically takes action to change the parameters set by the PCE, they will remain intact). [LM] reading "the State Timeout Interval timer may expire before the PCC has redelegated the LSPs to another PCE" could be understood as STI timer < RTI timer. And here, it is stated STI timer >= RTI timer. Depending on the clarification on the previous comment, this text could be also clarified (or not) 5.7.3. Returning a Delegation In order to keep a delegation, a PCE MUST set the Delegate flag to 1 on each LSP Update Request sent to the PCC. A PCE that no longer wishes to update an LSP's parameters SHOULD return the LSP delegation back to the PCC by sending an empty LSP Update Request which has the Delegate flag set to 0. If a PCC receives an LSP Update Request with the Delegate flag set to 0 (whether the LSP Update Request is empty or not), it MUST treat this as a delegation return. +-+-+ +-+-+ |PCC| |PCE| +-+-+ +-+-+ | | |---PCRpt, Delegate=1--->| LSP delegated | . | | . | | . | |<--PCUpd, Delegate=0----| Delegation returned | | |---PCRpt, Delegate=0--->| No delegation for LSP | | Figure 6: Returning a Delegation If a PCC cannot delegate an LSP to a PCE (for example, if a PCC is not connected to any active stateful PCE or if no connected active stateful PCE accepts the delegation), the LSP delegation on the PCC will time out within a configurable Redelegation Timeout Interval and the PCC MUST flush any LSP state set by a PCE at the expiration of the State Timeout Interval. [LM] same comment: is it "for example, if a PCC is not connected to any OTHER active stateful PCE or if no OTHER connected active stateful PCE accepts the delegation"? [LM] "the PCC MUST flush any LSP state set" should be completed by "and revert to operator-defined default parameters or behaviors", right? 5.7.5. Redelegation on PCE Failure [...] If the PCE crashes but recovers within the Redelegation Timeout, both the delegation state and the LSP state are kept intact. [LM] "Recover" here seems to be associated with the PCEP session (as linked to the Redelegation Timeout), not with the LSP states maintained by the PCE. [LM] How can the PCC be ensured that the PCE has not been impacted by the stop/restart and that LSP states are intact? Is there any need for a sanity check? If the PCE crashes but does not recover within the Redelegation Timeout, the delegation state is returned to the PCC. If the PCC can redelegate the LSPs to another PCE, and that PCE accepts the delegations, there will be no change in LSP state. If the PCC cannot redelegate the LSPs to another PCE, then upon expiration of the State Timeout Interval, the state set by the PCE is flushed, which may cause change in the LSP state. [LM] the last sentence is difficult to understand. How to understand "flush the state MAY cause change in the LSP state"? It would like talking about two different states in the PCC. [LM] about "flushed", should we add "and revert to operator-defined default parameters or behaviors"? [...] If there is a standby PCE, the Redelegation Timeout may be set to 0 through policy on the PCC, causing the LSPs to be redelegated immediately to the PCC, which can delegate them immediately to the standby PCE. Assuming the State Timeout Interval is greater than the Redelegation Timeout, and assuming the standby PCE takes similar decisions as the failed PCE, the LSP state will be kept intact. [LM] the first "assuming" is not an assumption but a requirement, according to "The State Timeout Interval MUST be greater than or equal to the Redelegation Timeout Interval" 5.8.1. Passive Stateful PCE Path Computation Request/Response +-+-+ +-+-+ |PCC| |PCE| +-+-+ +-+-+ | | 1) Path computation |----- PCReq message --->| request sent to | |2) Path computation PCE | | request received, | | path computed | | |<---- PCRep message ----|3) Computed paths | (Positive reply) | sent to the PCC | (Negative reply) | 4) LSP Status change| | event | | | | 5) LSP Status Report|----- PCRpt message --->| sent to all | . | stateful PCEs | . | | . | 6) Repeat for each |----- PCRpt message --->| LSP status change| | | | Figure 7: Passive Stateful PCE Path Computation Request/Response [LM] in the figure, please use "state" instead of "status" 7.1. OPEN Object This document defines one new optional TLVs for use in the OPEN Object. s/one new optional TLVs/one new optional TLV 7.2. SRP Object The SRP (Stateful PCE Request Parameters) object MUST be carried within PCUpd messages and MAY be carried within PCRpt and PCErr messages. The SRP object is used to correlate between update requests sent by the PCE and the error reports and state reports sent by the PCC. [LM] Should we add the following text at the end of the section? "Optional TLVs may be included within the SRP object body. The specification of such TLVs is outside the scope of this document." 7.3. LSP Object [...] TLVs that may be included in the LSP Object are described in the following sections. [LM] I think that it is possible to add extra optional TLV, not defined in this document. This should be clarified if I'm correct.