This is an rtg-dir review of draft-ietf-cats-usecases-requirements-06

Summary:

This document provides the problem statement and typical scenarios for CATS, which show the need to consider more factors when steering traffic to the appropriate computing resource in order to better meet customer expectations. This document also describes CATS requirements.

The document presents good examples and requirements. My comments and suggestions follow.

Suggestions/Comments:

1- It would be relevant to add a definition of a CATS System.


2- Section 5.1:
"R1: MUST provide a discovery and resolving method for the mapping of a service identifier to a specific address.
 R2: MUST provide a method to determine the availability of a service instance."


2.1- The way R1 and R2 are currently written does not explicitly guarantee that discovery and availability assessment are dynamic. 


2.2- R1 does not say whether this mapping is done once at boot, periodically updated, or dynamically on request. A static mapping (e.g., DNS or config file) would satisfy this requirement, which contradicts the CATS goal of reacting to changing conditions.
What about something like: "R1: MUST provide a dynamic discovery and resolution method for mapping a service identifier to one or more current service instance addresses, based on real-time system state."?


2.3- In R2, "Determine availability" is vague; it could be a one-time health check. It does not guarantee continuous or reactive monitoring, which is essential for dynamic steering. What about something like: "R2: MUST provide a method to dynamically assess the availability and readiness of service instances, based on up-to-date status metrics (e.g., health, load, reachability)."?


3- Section 5.2: "R3: MUST agree on using metrics.. and their representation among service instances..."


3.1- It is not clear who must agree: the operators? the implementations? at design time or runtime?


3.2- What do you think about adding a requirement regarding freshness indicators and staleness handling for metrics? In my view, when a system "understands" a metric, semantic interoperability goes beyond just knowing what the metric represents and how it is encoded. It also includes understanding how recent or valid the data is.
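
To make the freshness point concrete, here is a minimal Python sketch. It is not taken from the draft: it assumes a hypothetical metric record that carries an observation timestamp and a validity period (the names MetricSample, observed_at and max_age_s are illustrative only), and a consumer that skips any sample past its validity period.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricSample:
    """Hypothetical metric record; field names are illustrative, not from the draft."""
    name: str
    value: float
    observed_at: float  # seconds, on a clock shared by producer and consumer (assumption)
    max_age_s: float    # validity period advertised with the sample (assumption)


def is_fresh(sample: MetricSample, now: float) -> bool:
    """Return True while the sample is still within its validity period."""
    return (now - sample.observed_at) <= sample.max_age_s


def usable_samples(samples, now):
    """Keep only fresh samples; stale ones are ignored rather than trusted."""
    return [s for s in samples if is_fresh(s, now)]


samples = [
    MetricSample("cpu_load", 0.42, observed_at=100.0, max_age_s=5.0),
    MetricSample("cpu_load", 0.90, observed_at=90.0, max_age_s=5.0),
]
fresh = usable_samples(samples, now=103.0)
print([s.value for s in fresh])  # only the sample observed at t=100 remains
```

Whether staleness is signalled by a timestamp, a lifetime, or a sequence number is a design choice for the solution documents; the sketch only illustrates why some such indicator is needed.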


3.3- "R8: There MUST set up metric information that can be understood by CATS components...." 

3.3.1- The requirement lacks a clear subject. It is unclear who is responsible for setting up the metric information, is it the service instance, the ingress node, a controller, or the operator?

3.3.2- The phrase "set up metric information" is vague. It is not evident what this setup entails: does it refer to configuring metric schemas, encoding data, or runtime registration? Also, this requirement is not verifiable; that is, how would someone test or confirm that the requirement has been met in practice?

3.3.3- What about something like: "All metric information used in CATS MUST be produced and encoded in a format that is understood by participating CATS components. For metrics that CATS components do not understand or support, CATS components will ignore them."?
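
As an illustration of the "ignore what is not understood" behavior in the suggested R8 wording, here is a minimal Python sketch. The metric names and the SUPPORTED_METRICS set are hypothetical and do not come from the draft.

```python
# Metric types this hypothetical CATS component understands (names are made up).
SUPPORTED_METRICS = {"cpu_load", "queue_depth"}


def filter_understood(advertised: dict) -> dict:
    """Keep metrics the component understands; silently ignore the rest."""
    return {k: v for k, v in advertised.items() if k in SUPPORTED_METRICS}


advertised = {"cpu_load": 0.35, "gpu_mem_free": 12.0, "queue_depth": 7}
understood = filter_understood(advertised)
print(understood)  # gpu_mem_free is dropped, the other two are kept
```

A requirement phrased this way is also testable: feed a component an unknown metric and check that its decision is unchanged.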


3.4- "R9: The computing metrics in CATS MUST be simple, that is distributing metrics and selecting path based on these metrics will not cause routing loops and route oscillation."

"simple" is subjective and vague. What about replacing "simple" with the actual intent of the requirement (loop- and oscillation-freedom)? What about something like: "R9: The computation and use of metrics in CATS MUST be designed to avoid introducing routing loops or path oscillations when metrics are distributed and used for path selection."?


3.5- What about adding a requirement related to negotiation or discovery of metric types or capabilities? Perhaps something like: "R#: CATS components SHOULD support a mechanism to advertise or negotiate supported metric types and encodings to ensure compatibility across implementations"?
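
One way to picture such a negotiation is as an intersection of advertised capabilities. The Python sketch below is only an assumption about how this could look; the metric types, encodings, and the negotiate() helper are hypothetical and not defined in the draft.

```python
def negotiate(local: dict, peer: dict) -> dict:
    """Return metric types both sides support, with the encodings they share.

    Each argument maps a metric type to the set of encodings supported for it.
    """
    common = {}
    for metric in local.keys() & peer.keys():
        shared = local[metric] & peer[metric]
        if shared:  # a metric with no common encoding cannot be exchanged
            common[metric] = shared
    return common


# Hypothetical capability advertisements from two CATS components.
ingress = {"cpu_load": {"uint8", "float32"}, "queue_depth": {"uint16"}}
site = {"cpu_load": {"float32"}, "gpu_mem_free": {"uint32"}}
agreed = negotiate(ingress, site)
print(agreed)  # only cpu_load with the float32 encoding is common to both
```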


4- Section 5.3: 


4.1- The draft states: "It has to be determined at what interval or based on what events such information needs to be distributed."
It is unclear who is responsible for making this determination.


4.2- The draft states: "thanks to the comprehensive load...". The meaning of this phrase is not very clear to me.


4.3- The draft states: "While existing routing protocols may serve as a baseline for signaling metrics, other means to convey the metrics can equally be considered and even be realized."
It may be helpful to briefly mention some examples of alternative dissemination mechanisms, and to clarify the scenarios where such alternatives may be more appropriate than routing protocols. Likewise, it would be useful to include examples of situations where routing protocols are suitable for metric dissemination.


4.4- In "R11: MUST declare the entity that collect metrics.", what about rephrasing to "R11: MUST specify which entity is responsible for collecting metrics."?


4.5- In "R14: MUST be clear of the update frequency of CATS metrics and its corresponding distribution method." what about rephrasing to something like: "R14: MUST specify the update frequency of CATS metrics and its corresponding distribution method."?


4.6- The draft states: "Sometimes, a metric that is chosen is not accurate for service instance selection, in such case, a desirable system..."
It would be helpful to include a reference or guidance on how metric accuracy is defined in this context, and how it can be measured or evaluated.


5- Section 5.4:

5.1- The draft states: "The decision logic of the instance selection are subject to the normal packet level communication..." What does "normal packet level communication" mean?


5.2- The draft states: "...the access point might change and successively lead to the same result of the change of service instance..."
The phrase "successively lead to the same result of the change of" is hard to follow; kindly rephrase it.


5.3- The draft states: "If execution changes from one (e.g., virtualized) service instance to another, state/context needs transfer to another."
"needs transfer to another" --> "needs to be transferred to the new instance" ?


5.4- In "R16: Instance affinity MUST be maintained when the transaction is stateful" 
State may persist not only within a single transaction but across a session involving multiple transactions. Therefore, the requirement should possibly refer to both stateful sessions and transactions. Then, what about something like: "R16: CATS systems MUST maintain instance affinity for stateful sessions or transactions"?


5.5- In "R17: Instance affinity MUST be maintained for service requests or transactions that belong to the same flow."
The term "flow" is ambiguous in this context. It is unclear whether it refers to a transport-layer flow or an application-layer flow, such as a session or user interaction. Kindly add a definition in draft-ietf-cats-framework, and reference it here, if applicable.


5.6- In "R18: MUST avoid keeping fine runtime-state granularity in network nodes for providing instance affinity. For example, as mentioned above, maintaining per-flow states for a specific APP."
What does "fine" granularity mean here? How fine is too fine?


5.7- In "R20: SHOULD support the UE and service instance mobility."
It is unclear whether "support" refers to session continuity, seamless handover, detection only, or some other behavior. What about "R20: SHOULD support service continuity in the presence of UE or service instance mobility."?


6- Section 5.5:


6.1- The draft states: "Exposing the information of computing resources to the network may lead to the leakage of computing domain and application privacy."
The phrase "the information of computing resources" is a bit awkward, and it is unclear what "computing domain privacy" refers to. Kindly add a definition in draft-ietf-cats-framework, and reference it here, if applicable.


6.2- The draft states: "In order to prevent it, it need to consider the methods to process the sensitive information related to computing domain."
"it need to consider" --> "it is necessary to consider"; "to computing domain" --> "to the computing domain"?


6.3- The draft states: "At the same time, when anonymity is achieved, it is also necessary to consider whether the computing information exposed in the network can help make full use of traffic steering"
"help make full use of" is ambiguous. What about: "At the same time, when anonymity is achieved, it is important to ensure that the exposed computing information remains sufficient to enable effective traffic steering."?


7- Section 6:

7.1- The draft states: "Some security issues need to be considered..." --> "Security issues need to be considered..."?


7.2- In "R22: service data MUST be protected from interception.", "service data" is undefined, is it application-layer data, control plane data, or metrics?


7.3- In "R23: the nature of user's activities SHOULD be hidden as much as possible." 
"hidden as much as possible" is vague and not measurable. What about something like "R23: The nature of a user's activities SHOULD be protected to preserve user privacy, including minimizing the exposure of identifying or behavioral patterns."?


7.4- In "R24: secure advertisements are REQUIRED to prevent rogue nodes from participating in the network"
Secure how? Authenticated? Encrypted?


7.5- In "R25: When making service decisions, the security status of computing resources SHOULD be taken into consideration."
What does "security status" mean? Is it trust level? Threat score?


8- Nits:


8.1- Define PE in Figure 1.


8.2- "specificly" --> "specifically"


8.3- "Qualty of Service" --> "Quality of Service"


8.4- "anonymous methods" --> "anonymization methods"


8.5- "data of services maybe stolen," --> "...data of services may be stolen,..."?


8.6- "round trip network delay(network), which should be bounded to 20-1.5-5.5-7.9 = 5.1ms." --> "Round-trip network delay: the remaining latency budget is 5.1 ms, calculated as 20 - 1.5 - 5.5 - 7.9 = 5.1 ms."?


8.7- "Each instance provides equivalent service functionality to their respective clients." --> "Each instance provides equivalent service functionality to its respective clients."
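
For completeness, the latency budget arithmetic quoted in the nits above can be checked with a few lines of Python. This is only a sanity check of the figures as quoted; the variable names are mine, and the three deductions are left unlabeled because the quoted sentence does not name them. Decimal is used to avoid floating-point rounding noise.

```python
from decimal import Decimal

total_budget_ms = Decimal("20")
# The three deductions as quoted from the draft sentence; the quoted text
# does not name them, so they are left unlabeled here.
deductions_ms = [Decimal("1.5"), Decimal("5.5"), Decimal("7.9")]

network_budget_ms = total_budget_ms - sum(deductions_ms)
print(network_budget_ms)  # remaining round-trip network budget in ms
```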


Thanks for this document,

Ines.