This document defines a mechanism for reporting the status of received mail to the logical originator of that mail. This helps the originator understand how the mail that they ostensibly produce, which might be distributed and delegated, is being received. This is a useful feedback system for mail providers.

I want to note that a system like this seems specifically designed to reinforce existing inequities in the mail system. Only large actors will be able to deploy the resources and sophistication needed to make use of this sort of feedback. That's unavoidable, and the system is nonetheless a useful thing to have, but it is worth acknowledging that in building this we contribute further to the centralization of an already-centralized system. I don't know whether this warrants mention in the draft, but I wanted to acknowledge it here.

# High-level stuff

I found this document pretty hard to process. Part of that is my lack of familiarity with mail and its attendant security apparatus (which seems designed to defy rational analysis; see also my earlier point about centralization). However, I found that the document lacked enough overview and context-setting to be comprehensible.

Take Section 2.1, which launches right into dense paragraphs of "A begat B; B begat C", like the parts of the Christian Bible that people like to pretend don't exist. Buried in that text are incredibly useful pieces of context, like the fact that data is broken down by IP address (presumably as observed by the mail receiver).

The list of fields contains wonderful statements like: Mandatory fields are "domain", "p", "sp". I searched and was unable to find what "p" might contain, other than a comment in the XML schema, which appears in an appendix. If the point of this document is to define fields, then it would be best if it contained definitions.
Maybe "sp" or "adkim" are obvious to someone versed in the minutiae, but without citations and references, the information in the draft is far less useful than it could be. Having the bulk of the specification in comments in code in an appendix is not ideal.

It is not clear to me whether the information included in a report is aggregated (i.e., counts or similar metrics) or simply a collection of per-message details gathered into a single report. That's a problem. The title says aggregated and there is mention of counts, but the language in a couple of places strongly implies otherwise. I shouldn't have to guess that much. Maybe this could be addressed with a section describing the basic layout of the reports. The example at the end helped a bunch, but I'm inferring a lot from that example when it could be spelled out explicitly.

# Mid-sized stuff

S2.5 says "the Mail Receiver MAY send a short report indicating that a report is available but could not be sent". How?

S2.5.1 says "The aggregate data MUST be an XML file that SHOULD be subjected to GZIP [RFC1952] compression." Is there a mechanism by which the Domain Owner can indicate different compression modes? That is, is there agility for this?

S2.5.1 says "The aggregate data MUST be present using the media type "application/gzip" if compressed (see [RFC6713]), and "text/xml" otherwise." This has two problems:

1. The text/xml form should be a new media type that describes the format that this document defines. Then you have a hope of evolving the format in a non-compatible way at some point in the future.
2. The gzip form does not signal the type of the inner document, making any format change impossible when compression is involved.

It's not clear to me that the strict rules regarding the construction of filenames and subjects are justified, especially when the report contains the same information. Can you design a single system for carrying the necessary information?
(I get that you might want to use something like Subject to ensure routing to the right subsystem, but maybe limit what you specify to what that purpose requires.)

S2.5 (general): Why is it the responsibility of the transport mechanism to detect duplicates? Can a unique identifier be added to the content of the report?

S3 defines a validation process that involves querying DNS at "._report._dmarc.". This will fail when the resulting name is too long, which is pretty easy for an attacker to arrange. That's an unrecoverable error, but the procedure says nothing about how to handle it. Does that make certain reporting architectures impossible for some providers?

I'm not enthusiastic about the privacy considerations. Whose privacy is affected by leakage (S6.3)?

The schema uses xs:string for string types, which means that whitespace is significant. I generally advise people to use xs:token instead so that content can be authored safely, though given the automated nature of this format, this is unlikely to be a significant factor.

The schema defines a number of enums, which might be problematic if you ever need to extend the value space. DKIMResultType and SPFResultType are prime examples of types that might need to be extended. In these cases, I generally recommend xs:token as well, pointing at a registry for the valid values.

The schema definition for TestingType is a boolean, but it doesn't use xs:boolean. Why? Same for DMARCResultType.

Why do the dates in the format not use the XML Schema xs:dateTime type?

Why is there a specific container at the top level, but not for each record? I would have thought that extending in the same way for each would be better.

# Small stuff

S1 mentions terminology that might be better moved to S1.1.

S2.2 says "There MAY be optional sections for extensions within the document." This is not a "MAY"; it is either an "is" or an "is not" (I'm guessing "is").
A few lines in the appendix are too long for the RFC format (I see one at 75 characters).

In the acknowledgments, this looks like a serious error: "Kvå (U+00E5)l". Are you using ?
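On the line-length point, it may be easier to find every offending line mechanically than by eye. The following is a minimal sketch in Python, assuming the 72-character per-line limit for plain-text RFCs. The function name `long_lines` and the sample input are hypothetical and exist only for illustration:

```python
from typing import Iterable, List, Tuple

# Assumed maximum line length for the plain-text RFC format.
MAX_LINE = 72


def long_lines(lines: Iterable[str], limit: int = MAX_LINE) -> List[Tuple[int, int]]:
    """Return (1-based line number, length) for each line longer than limit.

    Length is counted in characters (code points), excluding any
    trailing CR/LF line terminator.
    """
    found = []
    for number, line in enumerate(lines, start=1):
        length = len(line.rstrip("\r\n"))
        if length > limit:
            found.append((number, length))
    return found


if __name__ == "__main__":
    # Hypothetical sample: one short line and one 75-character line.
    sample = ["short line\n", "x" * 75 + "\n"]
    for number, length in long_lines(sample):
        print(f"line {number}: {length} characters")
```

Pointing this at the rendered text of the draft (for example, `long_lines(open(path))`, where `path` is wherever the rendered text lives) should list every line over the limit, not just the one I happened to notice.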