I have reviewed this document as part of the security directorate's ongoing effort to review all IETF documents being processed by the IESG. These comments were written primarily for the benefit of the security area directors. Document editors and WG chairs should treat these comments just like any other last call comments.

This document concerns international character sets. You might intuitively think that international character sets would have few if any security considerations, but you would be wrong. Many security mechanisms depend on the ability to recognize that two identifiers refer to the same entity, and inconsistent handling of international character sets can result in two different pieces of code disagreeing as to whether two identifiers match; this has led to a number of serious security problems.

This document defines 18 categories of characters within the Unicode character set, with the intention that systems that want to accept subsets of Unicode characters in their identifiers specify profiles referencing this document, and it defines two initial classes (IdentifierClass and FreeformClass) that could be used directly by many protocol specifications.

While I see no problems with this document, it does seem like a missed opportunity to specify some things that are very important in the secure use of international character sets. The most important of these is a rule for determining whether two strings should be considered equivalent. It is very common, in both IETF protocols and in operating system object naming, to adopt a preserve-case / ignore-case model. That means that if an identifier is entered in mixed case, the mixed case is preserved as the identifier, but if someone tries to find an object using an identifier that is identical except for the case of its characters, it will find the object. Further, where uniqueness of identifiers is enforced (e.g., user names or file names), a request to create a second identifier that differs from an existing one only in the case of its characters will fail.

These scenarios require that it be well defined whether two characters differ only in case, and while that is an easy check to make in ASCII, with its 26 letters that have upper- and lowercase versions, the story is much more complex for some international character sets. Worse, the case mapping of even ASCII characters can change based on the “culture”. The most famous example is the Turkish dotless lowercase ‘ı’ and dotted uppercase ‘İ’, which caused security bugs because mapping “FILE” to lowercase in a Turkish locale did not result in the string “file”. There are also cases where two different lowercase characters map to the same uppercase character. It is a scary world out there.

To be used safely from a security standpoint, there must be a standardized way to compare two strings for equivalence that all programs will agree on. Programs will still have bugs, but when two programs interpret equivalence differently, it is important to be able to determine objectively which one is wrong. The ideal way to do this is to have a canonical form of any string, such that two strings are equivalent if and only if their canonical forms are identical.

Section 10.4, "Local Character Set Issues", acknowledges this problem, but offers no solution.

In Section 10.6, "Security of Passwords", this document recommends that password comparisons not ignore case (and I agree). But for passwords in particular, it is vital that they be translated to a canonical form, because they are frequently hashed and the hashes must test as identical. One rarely has the luxury of comparing passwords character by character and deciding whether the characters are “close enough”.
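To make the password point concrete, here is a minimal sketch (my own illustration, not anything specified by the document under review) of canonicalizing before hashing. The choice of NFC normalization and the helper name are assumptions made purely for the example; the point is only that two byte-different encodings of the same visible password hash identically once a canonical form is applied, while case is preserved as the document recommends.

    # Minimal sketch: canonicalize before hashing so that precomposed and
    # decomposed encodings of the same password produce the same hash.
    # (A real system would use a salted, slow KDF rather than bare SHA-256.)
    import hashlib
    import unicodedata

    def hash_password(pw: str) -> str:
        # Hypothetical canonicalization step: NFC normalization, case preserved.
        canonical = unicodedata.normalize("NFC", pw)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    composed = "caf\u00e9"      # 'café' as a single precomposed code point
    decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent

    print(composed == decomposed)                                # False
    print(hash_password(composed) == hash_password(decomposed))  # True

Without the canonicalization step, the two hashes would differ even though the user typed the "same" password both times.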
Section 10.5, "Visually Similar Characters", discusses another hard problem: characters that are entirely distinct but are visually similar enough to mislead users. This problem occurs even without leaving ASCII, in the form of the digit ‘0’ vs. the uppercase letter ‘O’, and the triple of the digit ‘1’, the lowercase letter ‘l’, and the uppercase letter ‘I’. In some fonts, various of these are indistinguishable. International character sets introduce even more such collisions. To the extent that we expect users to look at URLs like https://www.fideIity.com and recognize that something is out of place, we have a problem. It is probably best addressed by having tables of “looks similar” characters and disallowing the issuance of identifiers that look visually similar to existing ones in DNS registries and other places where this problem arises. Having a document that lists the doppelganger character equivalents would be a useful first step towards deploying such restrictions.

I suppose it is too much to expect this document to address either of these issues, but I couldn’t resist suggesting it.

--Charlie