# secdir review of draft-ietf-emu-eap-arpa-06
CC kaduk

I have reviewed this document as part of the security directorate's
ongoing effort to review all IETF documents being processed by the
IESG. These comments were written primarily for the benefit of the
security area directors. Document editors and WG chairs should treat
these comments just like any other last call comments.

The summary of the review is almost ready -- the general idea of defining
predefined identifiers under eap.arpa to signal a type of provisioning EAP
access request is sound and, in hindsight, long overdue.  Many of my most
significant comments will be probing at the boundaries of what we expect
future implementations/documents to do, and the statements we make about
existing implementations and deployments.

That said, if I were still on the IESG I would ballot DISCUSS due to a few
specific points that might impact the current and/or future interoperability
of the protocol, and some internal-consistency issues.

I also made a github PR (https://github.com/FreeRADIUS/eap-arpa/pull/2) with
some editorial suggestions for things I noticed while reviewing that
(probably) do not need discussion here.

## Discuss

### (Non-)Permanence of domain registrations

Section 6.6 (and others) describes self-assignment of identifiers under the
"v." subdomain, with an organization being able to use a FQDN they have
registered as the domain prefix.  But such domain registrations are not
permanent, and implementations using such names in software may persist after
the registration has lapsed.  I think we should have some text in the document
discussing this mismatch in timescales, which might entail guidance to domain
owners to ensure they keep the domain registered or some guidance to
implementors/users that such self-registrations may become stale if the domain
ownership changes (or some other solution, of course).  (For example, the
claim in §3.2 that such self-assigned identifiers "cannot conflict with other
identifiers" is not true if the domain name used to construct the identifiers
gets reassigned.)

### authenticate the server or not

It looks like there's some internal inconsistency in what we expect to happen
for using EAP-TLS for provisioning w.r.t. server authentication.  In toplevel
§5 we say that EAP-TLS has the advantage of authenticating the EAP server, but
in §5.1 we say that the device "SHOULD ignore" the server certificate (but
that the device likely has web CAs present and could use those to authenticate
the EAP server).  Is there some subtlety I'm missing that makes these cases
different?  If not, it seems like we need to have a consistent message on what
EAP-TLS for provisioning is supposed to provide (and if there is a subtle
distinction, we should call it out clearly).

If we do end up keeping the statement that peers could use web CAs to
authenticate the EAP server, I would strongly recommend providing some
commentary about when it would or would not be a good idea to actually do so,
or what factors would come into play in deciding whether or not to do so.

### Does TLS-PSK need to be handled separately from regular EAP-TLS?

The final paragraph of §5.1 mentions that TLS-PSK can technically be used with
EAP-TLS for provisioning purposes, but in all the TLS stacks I know of, using
TLS-PSK is effectively a distinct operation from doing a certificate-based
handshake, and I would not generally expect either peers or servers to be
prepared to handle both for the same TLS connection (i.e., letting the other
endpoint pick which to use).  To me, that suggests that interoperability would
benefit from defining a distinct provisioning NAI to indicate that TLS-PSK
should be used with EAP-TLS, leaving portal@tls.eap.arpa for certificate-based
(server) authentication.  Do we have reason to believe that the current
specification will be interoperable in the face of peers/servers that do and
do not want to attempt TLS-PSK "authentication"?

I would probably also say something to clarify that the (lowercase!) raw ASCII
byte string of the NAI name is used directly as the PSK, without other
processing, but that's just at a comment level.
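
To illustrate the kind of clarification I have in mind, here is a minimal Python sketch. It reflects my reading, not text from the draft. It assumes that both the PSK identity and the PSK itself are the NAI's lowercase ASCII bytes, used verbatim. `psk_from_nai` is a hypothetical name.

```python
# Hypothetical sketch: derive TLS-PSK inputs from a provisioning NAI.
# Assumption (to be confirmed by the draft): the lowercase ASCII bytes
# of the NAI are used directly, with no hashing, KDF, or other processing.

def psk_from_nai(nai: str) -> tuple[bytes, bytes]:
    """Return (psk_identity, psk) for TLS-PSK with a provisioning NAI."""
    # Reject rather than silently lowercase: any case-folding here would
    # itself be "other processing" that both endpoints must agree on.
    if nai != nai.lower():
        raise ValueError("provisioning NAIs are expected to be lowercase")
    raw = nai.encode("ascii")  # raises UnicodeEncodeError on non-ASCII input
    # Assumption: the NAI is also used verbatim as the PSK identity.
    return raw, raw


identity, psk = psk_from_nai("portal@tls.eap.arpa")
```

Rejecting mixed-case input, rather than normalizing it, is deliberate. The point of the clarification is that neither side should transform the string.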

### NAIs for TLS-based EAP methods

The rules for the registry seem to say that there must be a 1:1 correspondence
(or at least N:1) between provisioning NAI and EAP method.  So I'm really
confused about why we have any discussion of TTLS and PEAP (in §5.2) but say to
use the same NAI (portal@tls.eap.arpa) as for EAP-TLS.  Why do we not need to
define distinct NAIs to provide the semantics indicated here?

If the intent is to explicitly not define such NAIs to align with our
recommendation to use EAP-TLS in preference to other TLS-based EAP methods,
then I think we need a clear disclaimer that portal@tls.eap.arpa MUST NOT be
used for those methods.

## Comments

### division of responsibility between this doc and provisioning methods

In §5 we have some discussion about how our predefined provisioning NAIs will
interact with existing EAP types, including a statement that where TLS-based
methods have inner identity/authentication, those credentials "MUST be the
provisioning identifier", among other requirements.  I'm not sure I understand
why we need to tie our hands so strongly in this document, when any given
provisioning identifier is going to be specific to a single EAP method (per
§§6.2 and 3.4.1).  Why is it necessary for the core protocol framework
specifically to impose this requirement, vs the individual provisioning
methods doing so (with guidance from the framework as a useful default)?

I do see that the registration procedure is merely "expert review" so there
may not always be a document that would be able to hold such a requirement.
But it seems like we could say "unless otherwise specified, assume that the
password is the provisioning identifier" and leave room for future evolution.

### TEAP

There's a lot of mention of TEAP in this doc, including using teap.eap.arpa as
an example NAI realm, and discussion of using in-EAP provisioning via TEAP.
But this document does not actually specify/register any teap.eap.arpa NAIs.
Why not?  I see that there's an rfc7170bis in progress, but the current -21
does not contain the string "arpa".

### terminology

My reading of RFC 3748 is that we should prefer "peer" over the IEEE 802.1X
"supplicant", but "supplicant" appears a few times.  Please check whether or
not those uses should be changed to "peer" for consistency.

### table formatting

The prose in Section 6.2.1 suggests that there is a "table" but neither the
TXT nor the HTML version renders as such for me; I think either the
doc-generation toolchain or the prose needs an update to become consistent.
The HTML version in particular does not even have a line separator between
what is two different lines in the TXT version (which I infer is intended to
be a row separator in the table).

### PIE is tasty but perhaps out of scope

While I appreciate the levity in "[t]he choice of "Provisioning Identifiers
for EAP" (PIE) was considered and rejected", it feels more suited to an I-D
than a final RFC; please consider dropping that sentence.

### Concepts vs protocol

It seems to me that there are large swathes of Section 3 that have grown past
just describing "concepts" to going into substantial detail on protocol
operation, making the arrival of Section 4 as an "overview" a bit of a
surprise.  It is even more of a surprise to see that the "overview" is just a
review of existing functionality and a rationale for the class of approach
taken, while saying essentially nothing about how the actual protocol works.
While it is perhaps a bit late to propose a drastic reshuffling of
content, perhaps retitling the two sections would still be useful.

### not routing .arpa for AAA

Section 3 notes:

> The realm is one which should not be automatically proxied by any Authentication, Authorization, and Accounting (AAA) proxy framework as defined in [RFC7542], Section 3.

I think it would be helpful to be more clear about what we mean by "should
not", here -- are we making an interpretation of the requirements already
present in RFC 7542, an interpretation of the preexisting rules around use of
.arpa (I did not try to pull the sources for where that is specified), a
statement based on knowledge of current implementation behavior, or something
else?  I do see there is a bit more discussion in §3.7, but there's no forward
reference from here so this text should either gain such a reference or do
more to stand on its own.

Similarly, when we say:

> The realm is also one which will not return results for [RFC7585] dynamic discovery.

I assume that the basis for this claim is that there are no S-NAPTR records
in the .arpa zone at all.  It would help the reader to include our reasoning
in this instance as well.

### Enumerating implementations

In Section 3 we say:

> We note that this specification is fully compatible with all existing EAP implementations, so it is fail-safe.

which is making a statement about "all existing EAP implementations".  While I
have pretty high confidence in the statement, it remains impossible for us to
prove the absence of some private EAP implementation that is incompatible with
this specification.  So we probably want to hedge a bit about "known
implementations" or point to a list of them or something like that.

(We could in theory also go into more detail on what exactly we mean about
"compatible" in terms of existing servers behaving as expected in the face of
updated peers, and existing peers not doing anything that would trigger the
new functionality in upgraded servers, but I don't actually think that would
add real value in this case and so do not recommend it.)

### Coordinating method type names and subdomain names

In §3.2

> Where it is not possible to make a direct mapping between the EAP Method Type name (e.g. "TEAP" for the Tunneled EAP method), and a subdomain (e.g. "teap.eap.arpa"), the name used in the realm registry SHOULD be similar enough to allow the average reader to understand which EAP Method Type is being used.

There's a (probably theoretical) risk of an EAP Method Type that's not a
valid domain name being translated to a name, call it foo, and then some
future EAP Method Type being created that's named "foo" as well, so the
preferred mapping is no longer possible.  We could probably avert that by
updating the Method Types registry to have a note to not register such
conflicting names, though I'm not entirely convinced that it's worth the
effort to do so, since new method types are pretty rare and Joe (the DE) would
probably flag it anyway.

### Anonymous not recommended

Can we say something in §3.3 about why we say that a username of "anonymous"
is "NOT RECOMMENDED"?

### Direct configuration of NAI

In §3.4.1 we say:

> EAP peers MUST NOT allow these NAIs to be configured directly by end users. Instead the user (or some other process) chooses a provisioning method, and the peer then chooses a predefined NAI which matches that provisioning method.

I agree with the goal here, but are there or could there be existing
situations where implementations already allow the user to directly enter the
NAI (along with the associated credentials)?  If so, we probably want some
discussion about what might happen if a user (maliciously?) enters a
predefined NAI in such a way, along with guidance that implementations that do
allow this behavior need to check for eap.arpa entries and reject them.
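
Here is a minimal Python sketch of such a check. It assumes the peer has the user-entered NAI as a plain string. `is_eap_arpa_nai` and `validate_user_nai` are hypothetical names and do not come from the draft or any implementation. The sketch treats the realm as everything after the last "@" and compares it case-insensitively, because realms are DNS names.

```python
def is_eap_arpa_nai(nai: str) -> bool:
    """True if the NAI's realm is eap.arpa or any subdomain of it."""
    _, sep, realm = nai.rpartition("@")
    if not sep:
        return False  # no "@", so no realm to check
    # Realms are DNS names: compare case-insensitively, ignore a trailing dot.
    realm = realm.rstrip(".").lower()
    return realm == "eap.arpa" or realm.endswith(".eap.arpa")


def validate_user_nai(nai: str) -> str:
    """Reject user-configured NAIs that fall under eap.arpa."""
    if is_eap_arpa_nai(nai):
        raise ValueError("eap.arpa NAIs are reserved for provisioning")
    return nai
```

Matching on the ".eap.arpa" suffix, rather than on a list of known identifiers, also covers entries added to the registry after the peer software ships.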

### re-authentication process

In §3.4.1:

> When all goes well, running EAP with the provisioning NAI results in new authentication credentials being provisioned. The peer then drops its network connection, and re-authenticates using the newly provisioned credentials.

Do we expect any user involvement in this drop+reauthenticate scenario?  Is
the user supposed to have access to/knowledge of credentials that are
provisioned?

### Allow for server upgrades

In §3.4.1:

> There are a number of ways in which provisioning can fail. One way is when the server does not implement the provisioning method. EAP peers therefore MUST track which provisioning methods have been tried, and not repeat the same method to the same EAP server when receiving a an EAP Nak. EAP peers MUST rate limit attempts at provisioning, in order to avoid overloading the server.

We may want to say something about the not repeating being bound to some
large-ish but not-infinite timeframe, to allow for another attempt much later
to succeed if the server has been upgraded in the interim.  (We also don't
want requirements on peers to have unbounded local storage requirements!)

(We could also give some guidance on what good rate limiting might look like,
even if that takes the form of factors to consider rather than specific
values.  Note that rate limiting also comes up in §3.4.2.)
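
The following minimal Python sketch of peer-side bookkeeping shows that bounded storage and an eventual retry are cheap to achieve. Several parts of it are hypothetical placeholders and not recommendations: the class name, keying records on a `server_id` (whatever the peer uses to identify the EAP server or network, such as an SSID), and the default values (a one-week Nak memory, 256 records, and 60 seconds between attempts).

```python
import time
from collections import OrderedDict


class ProvisioningAttemptTracker:
    """Hypothetical peer-side record of provisioning attempts.

    - Remembers which (server, provisioning NAI) pairs were Nak'd, but only
      for `nak_ttl` seconds, so an upgraded server is eventually retried.
    - Keeps at most `max_entries` Nak records, evicting the oldest first.
    - Enforces a minimum interval between any two provisioning attempts.
    """

    def __init__(self, nak_ttl=7 * 24 * 3600, max_entries=256,
                 min_interval=60.0, clock=time.monotonic):
        self.nak_ttl = nak_ttl
        self.max_entries = max_entries
        self.min_interval = min_interval
        self.clock = clock
        self._naks = OrderedDict()  # (server_id, nai) -> time of the Nak
        self._last_attempt = None

    def record_attempt(self):
        self._last_attempt = self.clock()

    def record_nak(self, server_id, nai):
        key = (server_id, nai)
        self._naks.pop(key, None)  # re-insert so it becomes the newest entry
        self._naks[key] = self.clock()
        while len(self._naks) > self.max_entries:
            self._naks.popitem(last=False)  # drop the oldest record

    def may_attempt(self, server_id, nai):
        now = self.clock()
        if (self._last_attempt is not None
                and now - self._last_attempt < self.min_interval):
            return False  # rate limit: too soon after the previous attempt
        nak_time = self._naks.get((server_id, nai))
        if nak_time is not None and now - nak_time < self.nak_ttl:
            return False  # this server recently refused this method
        return True
```

A peer would call `may_attempt()` before starting, `record_attempt()` when it starts, and `record_nak()` on receiving an EAP Nak. The injectable `clock` is only there to make the behavior easy to test.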

### Large amounts of data and PQC

In §3.4.2:

> A limited network SHOULD also limit the amount of data being transferred by devices being provisioned, and SHOULD limit the network services which are available to those devices. The provisioning process generally does not need to download large amounts of data, and similarly does not need access to a large number of services.

Do you have a sense for what people might take "large amounts of data" to
mean?  As we start transitioning to post-quantum cryptography with its larger
key sizes, it would be unfortunate if the total data limit for provisioning
was too small to admit transfer of credentials using PQC algorithms (but I'm
not sure if we actually need to say something, if the limits in practice will
be fine without us doing so).
(There is some related discussion in §5.1 that might want a section reference
back to any new content added here.)

### EST, ACME, and CMP

Section 3.6.2 uses EST and ACME as examples of provisioning protocols, but
ACME was a bit surprising for me to see there, since it is most often used for
TLS server certificates and where the entity getting a certificate has a DNS
name for it, which does not seem like it would generally be the case for an
EAP peer.  I would find something like CMP (RFC 4210) more analogous to EST as
a good example to use here.

### More on AAA Routability

In addition to saying that administrators "will not have statically configured
AAA proxy routes for this domain [at the time of this writing]" do we want to
say anything about "there is generally no reason for administrators to add
such proxy routes, and if they do it would be in service of using this
specification"?

### TEAP details

The final paragraph of §4.1 discusses TEAP, but manages to leave enough unsaid
that it's hard to discern what point we're trying to make by mentioning it.
For example: (1) are we trying to contrast the "server unauthenticated
provisioning mode" (presumably for the outer tunnel) with the "inner TLS
exchange requires that both end [sic] authenticate each other" apparent
requirement that the server can in fact authenticate itself, or to highlight
that the peer still needs some credentials for that inner tunnel? (2) The
final sentence seems to contrast "ways to provision a certificate" with a need
to have preexisting credentials, but the apparent conclusion that the ways to
provision a certificate are not very useful is left unsaid.

I would recommend fleshing out this discussion a bit more to make the message
more readily apparent.

### Rationale

We have §4.2.1 to give a rationale for provisioning inside EAP, but no
corresponding section with a rationale for provisioning inside a captive
portal, yet we do not specifically recommend provisioning inside EAP.  This
leaves me unsure what the purpose of the section is, if we're going to spend
time justifying something that's just one option to choose from with no other
special status.  (I can infer that using a captive portal facilitates reuse
of existing provisioning protocols and/or deployments, but the document
doesn't tell me that.)

### EAP-TLS clarifications

The final sentence of toplevel §5 provides some commentary about what EAP-TLS
allows, but I find myself unclear both about why this information is being
added and what scenarios are being described.  My current theory is that it's
saying that an EAP peer can use EAP-TLS-based provisioning via captive portal
with only a small amount of pre-provisioned or factory-provisioned information
(the CAs that are locally configured), and we're mentioning this to support
our argument that using EAP-TLS for provisioning (whether with in-EAP
provisioning or captive-portal provisioning) provides advantages and is
generally recommended.  Is that correct?

### reference for EAP-TLS

In §6.2.1 we seem to only list RFC 9190 and this document as references for
EAP-TLS, skipping RFC 5216.  Is that what we want?

### guidance to the experts

Generally we treat "SHOULD NOT" as "MUST NOT, with exceptions".  If NAIs in
the registry SHOULD NOT contain more than one subdomain, what kind of
exceptions might make sense?

Relatedly, I think the guidance should say that NAIs with any "v." subdomain,
leading or otherwise, MUST NOT be registered, in order to preserve the purpose
of that prefix.
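
For concreteness, the checks I am suggesting could be expressed as the following Python sketch. `registry_check` is a hypothetical name. The sketch assumes that every label between the "@" and "eap.arpa" counts as a subdomain for the purposes of the "more than one subdomain" rule.

```python
def registry_check(nai: str) -> list[str]:
    """Return objections to registering `nai`; an empty list means none found."""
    _, sep, realm = nai.rpartition("@")
    realm = realm.lower()
    suffix = ".eap.arpa"
    if not sep or not realm.endswith(suffix):
        return ["realm is not a subdomain of eap.arpa"]
    labels = realm[: -len(suffix)].split(".")
    problems = []
    if "v" in labels:
        # "v." is reserved for self-assigned identifiers (suggested MUST NOT).
        problems.append('contains a "v" label')
    if len(labels) > 1:
        # More than one subdomain under eap.arpa (SHOULD NOT; needs a reason).
        problems.append("more than one subdomain")
    return problems
```

Under this reading, a self-assigned identifier such as one under "example.com.v.eap.arpa" trips both checks. The "v" check should be a hard rejection, while the subdomain-count check only prompts the expert to ask for a justification.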

Do we need to specifically include in this section the content from §6.4 that
the Method Type must provide MSK and EMSK?

### expanded method types

The registry guidance (§6.4) says that the "Method Type" column must either be
from the EAP Method types registry or "be an Expanded Type".  How would we
expect an expanded type that is not in the Method Types registry to appear in
our registry?

### specifying designated experts

AFAIK, "For registration requests where a Designated Expert should be
consulted, the responsible IESG area director should appoint the Designated
Expert" is implied by the use of the "Expert Review" registration procedure
and can safely be omitted from this document.

## Nits

### EAP working group

Section 6.5 describes a process including "publish a notice of the decision to
the EAP WG mailing list or its successor" which seems stale even as it is
written (EMU is "EAP Method Update", not "EAP").  Is this really the statement
we want to make?

### pre-defined

Both "pre-defined" and "predefined" (no hyphen) appear, 3 and 6 times,
respectively.  I removed the three hyphens in my PR but both versions appear
in recent-ish RFCs, so the best we can achieve is self-consistency and I don't
care which way we achieve that.

### IAB coordination

In §3.1's "NOTE: the "arpa" domain is controlled by the IAB. Allocation of
"eap.arpa" requires agreement from the IAB." we should probably leave an
RFC-Editor note to change the text to reflect an approval, assuming one is
granted.

### method vs type

In §4.2 we talk about "provisioning done within the EAP type" (as with
EAP-NOOB).  Is it better to talk about it being within the method than the
type, with the method being a better match for the actual operations
performed, as contrasted to the type being an identifier for the method?

### section references for peer unauthenticated access

In §5.1 we reference Section 2.1.1 of RFC 5216 for mention of "peer
unauthenticated access" but give no section reference for RFC 9190, which
seems like a lack of parallelism.

### naming

If we're going to call the registry "EAP Provisioning Identifiers", we might
want to try to bias toward using that term in the running prose; I see a lot
of "predefined NAI"/"predefined identifier" and related phrasing that doesn't
quite line up with the registry's terminology.

### privacy and more privacy

What's the expected difference between §7 and §8.3 (both of which are titled
"Privacy Considerations")?

## Notes

This review is formatted in the "IETF Comments" Markdown format, see
https://github.com/mnot/ietf-comments.