This draft defines a metric for assessing network quality on a simple linear 0-100 scale. It does this relative to application requirements for latency and loss at a fixed bandwidth. The resulting limitations are all acknowledged. Overall, the document is fairly easy to understand and the basic idea seems useful, for all its limitations. But I don't think it delivers on the promises it makes. There are a few ways to address that, but making more realistic promises is the most feasible.

I have a few comments. I'll let the WG decide whether these rise to the level of issues or not.

I did find the structure of the document frustrating (more below). It also diverges from a lot of IETF work in that it comes across as something of a sales pitch, rather than the more customary dispassionate assessment of the advantages, disadvantages, and trade-offs of different choices in the design space. (To be fair, I read some of RPM and that is a little bit sensationalist at times.)

Structurally, this draft buries the lede in the most excruciating way imaginable. By the time I got around to reading the actual metric (in Section 7!) I almost laughed at how simple it was: the minimum of a set of linear interpolations (see the sketch below). The lengthy and elaborate build-up made me think it was going to be a lot more sophisticated than it ended up being. Please consider being a little more up front about methodology; I found the current structure patronizing (as in, you can't handle the reveal until you have all the context).

A lot of factors are discarded by the metric. This is fine, but it does limit applicability. Probably the most obvious one is the lack of consideration for bandwidth/throughput as a factor that affects quality of experience. My experience with real-time video suggests that while low-bandwidth video can be acceptable as long as latency and loss are controlled, the "optimal" experience is invariably at a higher throughput than the minimum tolerable.

There is a lot of hand-waving about measurement methodology. Relative to RPM, which has concrete values for measurement configurations and (mostly) well-defined methodology, the reproducibility of a metric here depends on recording and replicating measurement methodology across multiple dimensions. If the goal was to produce a single metric, I can't see how this achieves that. Maybe the WG could do the work to pick one. The 0-100 range is nice, but if you change anything, Tuesday's metric and Thursday's metric become incomparable.

This is compounded by the need to have application-specific profiles to measure against. Even setting aside measurement methodology, you are talking about presenting people with multiple metrics. For instance, you might imagine a "gaming" target, a "streaming video" target, a "videoconference" target, a "web browsing" target, and more. A nice 0-100 number for each seems great, but it's a long way from one number.

And what happens when lived experience for a given application drifts away from the benchmark that has been established? If applications change (video codecs shift to needing more bandwidth, web pages bloat with even more JS, etc.), how do you evolve the metric? The 0-100 range is nice, but if you change anything, Tuesday's metric and Thursday's metric become incomparable [in a different way to last time].
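To check my reading of Section 7, here is a minimal sketch of what I understand the metric to be: for each dimension in an application profile, linearly interpolate the measurement between a "perfect" threshold and an "unusable" threshold, clamp to 0-100, and take the minimum across dimensions. The profile and the threshold values below are invented for illustration; they are not taken from the draft.

    def interpolate(measured, perfect, unusable):
        # 100 at or below the "perfect" threshold, 0 at or beyond the
        # "unusable" threshold, linear in between.
        if measured <= perfect:
            return 100.0
        if measured >= unusable:
            return 0.0
        return 100.0 * (unusable - measured) / (unusable - perfect)

    def qoo(measurements, profile):
        # profile maps each dimension to (perfect, unusable) thresholds;
        # the overall score is the worst of the per-dimension scores.
        return min(interpolate(measurements[d], *profile[d]) for d in profile)

    # Hypothetical "videoconference" profile: p95 latency in ms, loss in percent.
    profile = {"latency_p95_ms": (100.0, 400.0), "loss_pct": (0.0, 2.0)}
    print(qoo({"latency_p95_ms": 250.0, "loss_pct": 0.5}, profile))  # -> 50.0

If that is a fair summary, it reinforces my point about the build-up: the formula itself could be stated in a paragraph.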
The discussion of different metrics in Section 5 could be more precise. Many of these metrics are composable, but the goal of this document is to find *linearly* composable metrics. After all, as the draft itself says, a distribution can be composed through convolution (well, assuming IID, I guess, which is a big assumption in this setting). Similarly, I think that the list of caveats on QoO (Section 9) makes it a candidate for a "Yes for some applications" along with the other metrics.

Whether linear composition is a worthy goal is probably debatable, though the draft does make something of a reasonable case for addition or for splitting the metric across network segments. If that was indeed what was delivered. Curiously, composition of the metric is never actually demonstrated in the draft, which leaves me wondering whether this metric really does compose simply.

Say we have two hops with QoO metrics of 40 and 50. It's not clear that this naturally translates to an overall metric of 40 (the minimum of the two). If the numbers are the result of loss on both hops, then I don't think you can compose them at all. If loss happens to be correlated, as it might be, the same packets that would be lost on one hop would also likely be lost on the other; taking the higher loss rate might make sense, allowing you to conclude that the lower metric follows. However, if the loss is IID on each hop, combining hops gets you a higher overall rate of loss (1-(1-p1)(1-p2)) and a much lower metric, potentially to the point that the metric becomes 0. Then there is latency, which would need to be divided between hops to get a per-hop metric; no guidance is provided about how to manage that division.
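As a quick check of that arithmetic, here is a small worked example, reusing the same invented "unusable at 2% loss" threshold as the sketch above and assuming loss on each hop is independent:

    def loss_score(p_pct, perfect=0.0, unusable=2.0):
        # Same linear interpolation as before, clamped to [0, 100].
        return max(0.0, min(100.0, 100.0 * (unusable - p_pct) / (unusable - perfect)))

    p1, p2 = 1.2, 1.0                                    # per-hop loss rates, in percent
    print(round(loss_score(p1)), round(loss_score(p2)))  # 40 50 -- per-hop scores
    end_to_end = 100 * (1 - (1 - p1 / 100) * (1 - p2 / 100))
    print(round(end_to_end, 2))                          # 2.19 -- percent loss end to end
    print(loss_score(end_to_end))                        # 0.0 -- not min(40, 50)

So taking the minimum of per-hop scores would report 40 for a path that, under these assumptions, has already crossed the "unusable" threshold.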