I am the assigned Gen-ART reviewer for this draft. The General Area Review Team (Gen-ART) reviews all IETF documents being processed by the IESG for the IETF Chair. Please treat these comments just like any other last call comments.

For more information, please see the FAQ at .

Document: draft-ietf-dots-rfc8782-bis-05
Reviewer: Dale R. Worley
Review Date: 2021-03-22
IETF LC End Date: 2021-03-22
IESG Telechat date: unknown

Summary:

This draft is on the right track but has open issues, described in the review.

I've provided a long list of minor editorial issues, and a short list of technical issues. I suspect that the technical issues have been resolved in the practices of the community and that their apparent status as problems stems from not getting the wording properly aligned with practice.

Major issues:

The condition of two DOTS mitigation requests overlapping depends on addresses (and alternatives to them) but as defined in section 4.4.1, does NOT depend on port numbers. However, other parts of the text seem to presume that port numbers are involved in testing for overlapping. The correct choice needs to be established and the text made consistent.

Does the requesting of a mitigation only withdraw overlapping mitigations that were requested using the same signal channel, or is the effect global?

If a mitigation request with trigger-mitigation = false is activated by ending of a signal channel, does reestablishing the channel withdraw it? (Naively I thought it would, but that isn't stated.) If so, how are the former and the current signal channels correlated, given that cuid collisions can prevent them from using the same common identifiers? Indeed, the text does not make it clear how a mitigation that is triggered by the ending of a signal channel can be withdrawn, other than by the expiration of its timer.

Minor issues:

The 4.09 response is used to report cuid conflicts, but also various other conflicts.
Given that cuid conflicts require specific processing, and can happen when other conflicts could also be reported, it seems to me that for cuid conflicts the response MUST include conflict-information.

In section 4.4.1 there is a discussion of a configuration where a client communicates through two different gateways to one same server using a different certificate to communicate with each gateway. The text discusses a configuration where we want the two transaction streams to be treated as one by the client and server. It seems to me that this is an unusual situation which can only succeed if both the client and server have specific configuration for it. As a consequence, the situation doesn't need to be discussed in this document.

Conversely, the default result of this topology is that the client and server treat the two transaction streams separately (and perhaps neither of them is aware of the overall topology). It seems like this case should work correctly without any special considerations, and so does not need to be documented specifically, either.

The overall framework for signal channel configuration is not clear. By default, I assume that the client sets the channel configuration, constrained by the limits on parameter values imposed by the server, and that these values apply to communication in both directions (when applicable). The text in 4.5 and 4.5.1 is consistent with this model. However text in 4.5.2 talks about "agents" changing configuration values, which implies it's possible for the server to set channel configuration. There is discussion in section 4.5.3 of a server sending "a validity time with a configuration it sends", which makes no sense if only the client can change the configuration -- the configuration won't change until the client changes it. Also "the update of the configuration data if a change occurs at the DOTS server side". The model needs to be established, and the text aligned with it.
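
The default model assumed in the last paragraph (the client proposes values, the server only constrains them) could be sketched as follows. This is a minimal illustration of that reading, not of the draft: the parameter names are taken from section 4.5.1 as quoted later in this review, while the numeric limits, the `Limits` type and the `negotiate` helper are hypothetical. Overlaying accepted values on the current configuration is also an assumption, and is one of the two readings questioned under 4.5.1 below.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Limits:
    """Server-imposed bounds for one parameter (hypothetical values)."""
    min_value: float
    max_value: float


# Hypothetical server limits; the bounds are illustrative assumptions only.
SERVER_LIMITS: Dict[str, Limits] = {
    "heartbeat-interval": Limits(15, 240),
    "missing-hb-allowed": Limits(3, 9),
    "max-retransmit": Limits(2, 15),
}


def negotiate(current: Dict[str, float],
              proposed: Dict[str, float]) -> Tuple[bool, Dict[str, float]]:
    """Apply a client proposal only if every value is within server limits.

    Returns (accepted, resulting_config). On rejection the configuration is
    returned unchanged: in this model the server never picks values itself.
    """
    for name, value in proposed.items():
        limits = SERVER_LIMITS.get(name)
        if limits is None or not (limits.min_value <= value <= limits.max_value):
            return False, dict(current)
    # Assumption: accepted values are overlaid on the existing configuration.
    return True, {**current, **proposed}


accepted, config = negotiate({"heartbeat-interval": 30}, {"heartbeat-interval": 60})
print(accepted, config)
```

Under this model a "validity time" attached by the server would have nothing to protect, which is the inconsistency noted above.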

Nits/editorial comments:

Global editorial issues:

There is a lot of special terminology, and it would help if definitions were gathered in section 2. Additionally, this would help reveal where the text uses undefined synonyms of defined terms, several cases of which I have spotted.

There are issues involving "Observe". One is at the start of section 4.4, where the text refers to "subscribe", but that is not the term used in CoAP, indeed CoAP deliberately avoids that term. Also, unless one is familiar with CoAP, one thinks GET has no side-effects, and thus cannot possibly establish a subscription. There are related issues in sections 4.4.2.1 and 4.4.2.2 that left me wondering for which GET requests Observe was mandatory and/or permitted and what values (0 and/or 1) were permitted. I think it would help to start 4.4.2.1 with an overview discussion of the permitted/required uses of Observe in DOTS GET requests.

It would help to have adjectives for a mitigation request with trigger-mitigation = false, and for a mitigation request with trigger-mitigation = true.

It seems that "deactivating" a mitigation request is used as an undefined synonym of "withdrawing" it, but (on my first two reads), I thought it meant "delete". At this point, I suspect that the words hide complexity which has not been made explicit: the client "requests" a mitigation with trigger-mitigation = false, but the loss of the channel "activates" it. Worse, "activation" causes the actions that are described as being caused by "requesting" a mitigation with trigger-mitigation = true. A description of the states, the transitions between them, and the verbs to describe them should be given, perhaps in section 2.

Section 4.4.1 is 16 pages long and really should be cut into a number of subsections.

Section 4.4.1 contains two parallel but different definitions/discussions of conflict-information.
Not being in a position to print the document, I can't quite make out what is going on, but I suspect some reorganization of the section is in order to replace the two partial definitions with one complete one. (This might be connected with the entries in section 9 and/or section 10.3.) The two parallel definitions are partially excerpted below, and both have the problem that the contextual text says that the response will include "enough information for a DOTS client to recognize ...", but the definition of conflict-information states that it is optional:

-----
   The response includes enough information for a DOTS client to recognize the source of the conflict as described below in the 'conflict-information' subtree with only the relevant nodes listed:

   conflict-information: Indicates that a mitigation request is conflicting with another mitigation request. This optional attribute has the following structure:
-----
   For both 2.01 (Created) and 4.09 (Conflict) responses, the response includes enough information for a DOTS client to recognize the source of the conflict as described below:

   conflict-information: Indicates that a mitigation request is conflicting with another mitigation request(s) from other DOTS client(s). This optional attribute has the following structure:
-----

Detailed editorial issues:

(Note that some of these are summarized in a clearer way above.)

1. Introduction

The example of Figure 1 is introduced by this paragraph:

   An example of a network diagram that illustrates a deployment of DOTS agents is shown in Figure 1. In this example, a DOTS server is operating on the access network. A DOTS client is located on the LAN (Local Area Network), while a DOTS gateway is embedded in the CPE (Customer Premises Equipment).

But the example also includes a DOTS gateway, and would have been clearer to me if the statement introducing DOTS gateways was made before the start of the example rather than after it:

   The DOTS client can communicate directly with a DOTS server or indirectly via a DOTS gateway.

3. Design Overview

   support for asynchronous Non-confirmable messaging

It might be worth noting here or in section 2 that "Non-confirmable" (and "Confirmable") are CoAP technical terms.

   Absent such mutual agreement, the DOTS signal channel MUST run over port number 4646 as defined in Section 10.1, for both UDP and TCP.

It might be worth stating whether this port number is for both the client and the server to use, or whether 4646 is just the listening port for servers.

   Also, the DOTS server may rely on the signal channel session loss to trigger mitigation for preconfigured mitigation requests (if any).

This doesn't carry quite the right idea. What is really going on is that the DOTS client may configure mitigation requests that will be automatically acted upon by the server if the signal channel session is lost. This is a required facility of the server, but it may be relied upon by the client.

   DOTS signaling can happen with DTLS over UDP and TLS over TCP.

s/can happen/can use/ or perhaps "can happen over".

   In deployments where multiple DOTS clients are enabled in a network (owned and operated by the same entity) ...

I think you want something like "In deployments with multiple DOTS clients in a single network and administrative domain ...".

   o Port Control Protocol (PCP) [RFC6887] or Session Traversal Utilities for NAT (STUN) [RFC8489] may be used to retrieve the external addresses/prefixes and/or port numbers.

Would be clearer if it is "may be used by the client to retrieve ...", as the preceding paragraph is about the translator and here we are talking about the client without explicitly mentioning it.

4.4. DOTS Mitigation Methods

   GET: DOTS clients may use the GET method to subscribe to DOTS server status messages or to retrieve the list of its mitigations maintained by a DOTS server (Section 4.4.2).

Unless one is aware of the "Observe" option of CoAP, using GET to establish a subscription seems impossible, as it is a side-effect. The reader could be warned by wording like:

   GET: DOTS clients may use the GET method to retrieve the list of its mitigations maintained by a DOTS server (Section 4.4.2), or (using the CoAP Observe option [RFC7641]) to subscribe to DOTS server status messages.

--

   Mitigation requests MUST NOT be delayed because of checks on probing rate (Section 4.7 of [RFC7252]).

How does this sentence connect with the preceding sentences of the paragraph? Also, what does "probing" refer to? I suspect you mean that mitigation requests can be Non-confirmable and would by default fall under the rules of the preceding sentences, but you don't want that. So the sentence could be clarified as "However, mitigation requests MUST NOT be delayed by these limitations."

4.4.1. Request Mitigation

   with the trailing "=" removed from the encoding

Should be 'the trailing two "="', 'the trailing "="s', or similar, since the base64 encoding of a string of 16 bytes will always end in two "=".

   DOTS servers MUST return 4.09 (Conflict) error code to a DOTS peer to notify that the 'cuid' is already in use by another DOTS client.

The error code 4.09 has other defined uses in the signal channel. Given the special and "global" action needed based on this error code, there must be an unambiguous way for the client to identify cuid collision. Unfortunately, there is no "session initiation handshake" message for which a 4.09 response would be unambiguous. It seems like the best choice is to look for conflict-information in the response, since it has a conflict-cause value "CUID Collision". But conflict-information is optional.
I recommend making conflict-information mandatory in this situation. However, see my comments at the end of the section regarding the lack of clarity whether conflict-information is mandatory or optional.

   If the 'mid' value has reached 3/4 of (2^(32) - 1) (i.e., 3221225471) and no attack is detected, the DOTS client MUST reset 'mid' to 0 to handle 'mid' rollover.

It sounds like, but does not say explicitly, that mid rollover automatically invalidates any active high-mid mitigation request, and thus, if the client wants to maintain any existing requests, it must recreate them (necessarily with small mid values). This needs to be clarified.

   The default value of the parameter is 'true' (that is, the mitigation starts immediately). If 'trigger-mitigation' is not present in a request, this is equivalent to receiving a request with 'trigger-mitigation' set to 'true'.

The second sentence is completely redundant, but I suspect that a practical need for it has been discovered.

   ... or the 'cuid' was generated from a rogue DOTS client.

Probably s/from/by/. But it seems that there is a valid situation where duplicate cuids are plausible, when two DOTS clients are using the same certificate to peer with a server because that certificate is what the server administrator provided to peer with the server. I don't know if that is worth mentioning here, though.

   If a DOTS client is provisioned, for example, with distinct certificates as a function of the peer server-domain DOTS gateway, distinct 'cdid' values may be supplied by a server-domain DOTS gateway. The ultimate DOTS server MUST treat those 'cdid' values as equivalent.

I'm having a hard time following this, probably because I am not familiar with the language used to describe these situations.
I think it means

   If a DOTS client is provisioned, for example, with distinct certificates to use to peer with distinct server-domain DOTS gateways that peer to the same DOTS server, distinct 'cdid' values may be supplied by the gateways to the server. The ultimate DOTS server MUST treat those 'cdid' values as equivalent.

The final normative statement is clear, but it isn't clear to me how the server can implement that, unless it is provisioned with the knowledge that the two certificates are used by the same client.

More subtly, if the server must treat them as equivalent, dependencies between transactions in one transaction stream apply to the union of the transaction streams through the two gateways. E.g. the rule that mid is nearly-monotonic and the consequences thereof. Handling this correctly requires that the client knows that transactions through the two gateways will be handled equivalently by one same server, and that seems to require that the client also be configured with particular knowledge.

It seems to me that there are actually two cases:

(1) a "dumb" case where the client happens to access the same server through two gateways, but neither the client nor the server knows that. In that case, the signal channel protocol "just works" normally.

(2) a "smart" case where both the client and server must know that access through the two gateways is considered equivalent (but the gateways do not need to know). In that case, as long as both the client and server agree on this equivalence, the signal channel protocol also "just works".

It's not clear that it is necessary to document here the "smart" case, as the needed adjustments are logically determined by the intended use case. If it is not needed, the quoted paragraph is probably best omitted, because trying to implement it generally would tend to cause the "dumb" case to fail.
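
The difference between the two cases can be pictured as a difference in how a server keys its per-client state. The sketch below is only an illustration of that idea under stated assumptions: the `stream_key` helper and the `equivalent_cdids` table are hypothetical (the table stands for the out-of-band provisioning the "smart" case would need), and the client is assumed to present the same 'cuid' on both paths.

```python
from typing import Dict, Optional, Tuple


def stream_key(cuid: str, cdid: str,
               equivalent_cdids: Optional[Dict[str, str]] = None) -> Tuple[str, str]:
    """Return the key under which a server would file per-client state.

    equivalent_cdids maps each 'cdid' to a canonical one. Omitting it gives
    the "dumb" case, where each gateway's stream is tracked separately.
    """
    if equivalent_cdids is None:
        return (cuid, cdid)
    # "Smart" case: cdids known to be equivalent collapse to one key, so
    # rules such as the near-monotonic 'mid' apply to the combined stream.
    return (cuid, equivalent_cdids.get(cdid, cdid))


# "Dumb" case: two independent transaction streams.
print(stream_key("client-1", "gw-a") == stream_key("client-1", "gw-b"))

# "Smart" case: the server has been told the two cdids are equivalent.
table = {"gw-a": "gw-a", "gw-b": "gw-a"}
print(stream_key("client-1", "gw-a", table) == stream_key("client-1", "gw-b", table))
```

Without such a table on the server, and matching knowledge on the client, the quoted MUST has nothing to act on, which is the point made above.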

   If the mitigation request contains the 'alias-name' and other parameters identifying the target resources (such as 'target-prefix', 'target-port-range', 'target-fqdn', or 'target-uri'), the DOTS server appends the parameter values in 'alias-name' with the corresponding parameter values in 'target-prefix', 'target-port-range', 'target-fqdn', or 'target-uri'.

This sentence is not connected with any other processing -- what use is the concatenated value put to? Also, the processing described will NOT be done if alias-name is not present, suggesting that in some way it is optional. Also, the phrase "the parameter values in 'alias-name'" is undefined, as alias-name is an opaque string value. I suspect that some aspect of the processing has not been described. Perhaps the meaning is that an alias is always configured as a set of values for the other parameters, and that if a request contains both an alias name and other parameters, the effective request is formed by merging the two sets of parameter values. Though if that is meant, some provision must be made for the situation where the alias gives a value for a parameter that is contradicted by an explicit parameter in the request.

   If the DOTS server does not find the 'mid' parameter value conveyed in the PUT request in its configuration data [it may interpret it in a certain way]

It's not clear what is going on here, as "mid=..." is a mandatory part of the Uri-Path, and any such request must be rejected.

   A DOTS server could reject mitigation requests when it is near capacity or needs to rate-limit a particular client, for example.

This should be a separate paragraph, as it applies more broadly than the conditions of the first sentence of the paragraph. Also, it probably merits s/could/MAY/.

   Two mitigation requests from a DOTS client have overlapping scopes if there is a common IP address, IP prefix, FQDN, URI, or alias.

Probably worth stating explicitly that a common port number is NOT a factor in determining overlapping scopes.

   If the DOTS server receives a mitigation request that overlaps with an active mitigation request, but both have distinct 'trigger-mitigation' types, the DOTS server SHOULD deactivate (absent explicit policy/configuration otherwise) the mitigation request with 'trigger-mitigation' set to 'false'.

I'm pretty sure I don't know what this means. What does "deactivate" mean? The first time I read it, I thought it meant "delete". The second time, I suspected it meant the opposite action of "activate", which is what happens to a trigger-mitigation = false mitigation when the signal channel is lost. The third time, I was wondering why the reestablishment of the signal channel didn't automatically cause the trigger-mitigation = false mitigation to be deactivated.

   conflict-scope: Characterizes the exact conflict scope. It may include a list of IP addresses, a list of prefixes, a list of port numbers, a list of target protocols, a list of FQDNs, a list of URIs, a list of aliases, or references to conflicting ACLs (by an 'acl-name', typically [RFC8783]).

Note this text includes a "list of port numbers", but port numbers are not a factor in conflicts. Also, is it really intended that this parameter is, effectively, only human-readable, since there is no particular way to specify what type of datum the value contains?

4.4.2. Retrieve Information Related to a Mitigation

   +-----------+----------------------------------------------------+
   | 4         | Attack has exceeded the mitigation provider        |
   |           | capability.                                        |
   +-----------+----------------------------------------------------+

"mitigation provider" is used in a few places but it appears that the intended term is "mitigator".

   +-----------+----------------------------------------------------+
   | 6         | Attack mitigation is now terminated.               |
   +-----------+----------------------------------------------------+

It seems like code 6 includes codes 5 and 7. Is this ambiguity intended? I suspect the text that is actually wanted is "DOTS client has withdrawn the mitigation request and the attack mitigation is now terminated." There is a parallel issue in section 10.6.2.

4.4.2.1. DOTS Servers Sending Mitigation Status

   DOTS implementations MUST use the Observe Option for both 'mitigate' and 'config' (Section 4.2).

It's not clear what "MUST use the Observe Option" means. Does it mean that clients MUST use it in GET requests for 'mitigate' and 'config'? If so, is the client allowed to use "Observe: 1", despite that this section only discusses the "Observe: 0" case? Or does it just mean that servers must implement it, and thus respond correctly if a client sends it?

4.4.2.2. DOTS Clients Polling for Mitigation Status

   In such case, the DOTS client recalls the mitigation request by issuing a DELETE request for this mitigation request (Section 4.4.4).

The term "recall" is used in a few places but it seems like the correct term is "withdraw" (section 4.4.4).

4.4.3. Efficacy Update from DOTS Clients

In what way is an "efficacy update" different from an "update"? Can "efficacy" be removed without loss, or is it a term of art for updates to mitigation requests sent during attacks? It appears that an update is an "efficacy update" if and only if "attack-status" is present. This should be stated at the beginning of the section, as otherwise it's a mystery what distinguishes "efficacy updates".

4.4.4. Withdraw a Mitigation

   Once the request is validated, the DOTS server immediately acknowledges a DOTS client's request to withdraw the DOTS signal using 2.02 (Deleted) Response Code with no response payload.

s/DOTS signal/DOTS mitigation request/

4.5. DOTS Signal Channel Session Configuration

   d.
   Acceptable signal loss ratio: Maximum retransmissions, retransmission timeout value, and other message transmission parameters for Confirmable messages over the DOTS signal channel.

What are the names of these parameters in the signal-config structure?

   As such, the transmission-related parameters ('missing-hb-allowed' and acceptable signal loss ratio) are negotiated only for DOTS over unreliable transports.

It seems this could be said more clearly by listing the permitted fields: "only the 'heartbeat-interval' parameter [or whatever] is negotiated for DOTS over reliable transports".

4.5.1. Discover Configuration Parameters

   At least one of the attributes 'heartbeat-interval', 'missing-hb-allowed', 'probing-rate', 'max-retransmit', 'ack-timeout', and 'ack-random-factor' MUST be present in the PUT request. Note that 'heartbeat-interval', 'missing-hb-allowed', 'probing-rate', 'max-retransmit', 'ack-timeout', and 'ack-random-factor', if present, do not need to be provided for both 'mitigating-config', and 'idle-config' in a PUT request.

Must both the mitigating and idle configuration sections be present in the PUT? Does the requirement "At least one..." apply to both sections together or each section alone? If e.g. missing-hb-allowed is present in one section but not the other, the wording gives a vague suggestion that the same value is implicitly provided for the other section. Is this true?

   The PUT request with a higher numeric 'sid' value overrides the DOTS signal channel session configuration data installed by a PUT request with a lower numeric 'sid' value. To avoid maintaining a long list of 'sid' requests from a DOTS client, the lower numeric 'sid' MUST be automatically deleted and no longer available at the DOTS server.

Does this mean that the PUT with the higher sid installs what values it provides on top of the current configuration, or does it mean that the previous PUT's effect is entirely removed, that is, parameters not given in the higher-sid PUT take their default values? Note that the latter is resistant to problems from lost PUT requests but the former is not.

   o If the DOTS server finds the 'sid' parameter value conveyed in the PUT request in its configuration data and if the DOTS server has accepted the updated configuration parameters, 2.04 (Changed) MUST be returned in the response.

Given the earlier statement "'sid' values MUST increase monotonically (when a new PUT is generated by a DOTS client to convey the configuration parameters for the signal channel).", if a server receives a PUT with the same sid as a previous PUT then the client is misbehaving and the server should send an error response.

   A DOTS client may issue a GET message with a 'sid' Uri-Path parameter to retrieve the negotiated configuration.

Does this sid value matter, or is only its presence important? Also, you probably want to expand this to "a GET message for 'config' with a 'sid' Uri-Path parameter ...".

4.5.3. Configuration Freshness and Notifications

The underlying processing is not made clear. Roughly, it seems that the idea is the server has the right to change the configuration unilaterally at any time, but if the client does a GET of the configuration, the server is required to commit that it won't change the configuration given in the response within Max-Age Option seconds. Or is this talking about a mechanism where the server can, at its initiative, tell the client how the client should behave? That would be completely different from section 4.5.2, where the client tells the server how to behave.

4.5.4. Delete DOTS Signal Channel Session Configuration

   Upon bootstrapping or reboot, a DOTS client MAY send a DELETE request to set the configuration parameters to default values. Such a request does not include any 'sid'.

I would take it as assumed that when the (D)TLS connection is established, that is, when the DOTS signal channel session is initiated, it has the default configuration parameters. Thus the DELETE described here is guaranteed to have no effect. But perhaps the intention is that the signal channel is conceptualized as persisting longer than the (D)TLS connection, and (perhaps) associated with the cuid/cdid value. If so, that should be stated clearly.

4.6. Redirected Signaling

   If a DOTS server wants to redirect a DOTS client to an alternative DOTS server for a signal session, then the Response Code 5.03 (Service Unavailable) will be returned in the response to the DOTS client.

What is "the response"? It seems that this is only sensible if the session is just being established, but there doesn't seem to be a specific session-initiation message. If you really mean that the server can redirect the session in response to any request, it would be helpful to state that directly. Also, you need to specify whether the connection to the alternate server is a new session (with independent state) or whether it is expected to be a continuation of the existing session (carrying the same state).

4.7. Heartbeat Mechanism

   For example, if a DOTS client receives a 2.04 response for its heartbeat messages but no server-initiated heartbeat messages, the DOTS client sets 'peer-hb-status' to 'false'. The DOTS server then will ...

There is a lot of detail left out here, as there are messages and events involved that are not mentioned explicitly. I think what is meant is "For example, if a DOTS client receives a 2.04 response for its heartbeat messages but no server-initiated heartbeat messages, the DOTS client sets 'peer-hb-status' to 'false' in its next heartbeat message. Upon receiving that message, the DOTS server then will ..."

It might be useful to explicitly state that the bodies of responses to heartbeat requests are empty.

6. YANG/JSON Mapping Parameters to CBOR

It might help the implementors to tell whether this is the same as section 6 of RFC 8782 or not.

10.1. DOTS Signal Channel UDP and TCP Port Number

   IANA has assigned the port number 4646 (the ASCII decimal value for ".." (DOTS)) ...

Ow!
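
Three small numeric facts relied on in the comments above can be checked mechanically: that the base64 encoding of 16 bytes ends in two "=" (section 4.4.1), that 3/4 of (2^32 - 1) rounds down to 3221225471 (the 'mid' rollover threshold), and that the decimal ASCII codes of ".." concatenate to 4646 (section 10.1). The helper names below are hypothetical and exist only for this check; the code uses nothing beyond the Python standard library.

```python
import base64


def base64_padding(n_bytes: int) -> int:
    """Number of trailing '=' in the base64 encoding of n_bytes bytes."""
    encoded = base64.b64encode(bytes(n_bytes)).decode("ascii")
    return len(encoded) - len(encoded.rstrip("="))


def mid_rollover_threshold() -> int:
    """3/4 of (2^32 - 1), rounded down to an integer."""
    return (3 * (2**32 - 1)) // 4


def ascii_decimal(text: str) -> str:
    """Concatenate the decimal ASCII code of each character in text."""
    return "".join(str(ord(ch)) for ch in text)


print(base64_padding(16))        # 2
print(mid_rollover_threshold())  # 3221225471
print(ascii_decimal(".."))       # 4646
```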