Hi,

I have been selected as the Operational Directorate (opsdir) reviewer for this Internet-Draft. 

The Operational Directorate reviews all operational and management-related Internet-Drafts to ensure that they align with operational best practices and adequately cover operational considerations.

A complete set of _"Guidelines for Considering Operations and Management in IETF Specifications"_ can be found at https://datatracker.ietf.org/doc/draft-ietf-opsawg-rfc5706bis/.

While these comments are primarily for the Operations and Management Area Directors (Ops ADs), the authors should consider them alongside other feedback received.


- Document: draft-elkhatabi-verifiable-telemetry-ledgers-05

- Reviewer: Joe Clarke

- Review Date: May 27, 2026

- Intended Status: Informational 

 
---

## Summary 

**Has Issues**

## General Operational Comments: Alignment with RFC 5706bis

I recommend that the authors add a short, consolidated "Operational Considerations" section that gathers and cross-references the operationally relevant text currently scattered throughout the draft.

Per RFC 5706bis, an Informational document with this much deployment surface would benefit from such an explicit section. I'd suggest a new section (e.g., Section 11) that covers:

* Health/fault management: what an operator should monitor (replay-state persistence, AEAD-failure rate, anchoring backlog, calendar reachability, day-artifact write success, OTS upgrade lag).
* Configuration management: enumeration of deployment-tunable parameters (acceptance window, connectivity window, retention period, calendars used, anchoring mode warn/strict, peer-quorum threshold) with recommended defaults where applicable.
* Performance/scaling: expected artifact size relative to record count, OTS submission rate, and effect of multi-calendar submission.
* Verifying correct operation: how an operator (as opposed to an auditor) can sanity-check that the pipeline is healthy.

Additionally, I did find a few "minor" issues that should be shored up to make this ledger more consumable and deployable.

First, in your introduction, you state: "The gateway is expected to maintain UTC time for ingest_time assignment and day-artifact rollover. The device is not required to keep UTC wall-clock time for this profile." But nowhere do I see guidance that the gateway should use an authoritative time source. Consider shoring this up with text such as the following (perhaps in the new Operational Considerations section):

NEW:

The gateway is expected to maintain UTC time for ingest_time assignment and day-artifact rollover. Deployments SHOULD use an authenticated time-synchronization mechanism (for example, authenticated NTP or equivalent) and SHOULD monitor clock health, since gateway clock errors propagate directly into committed ingest_time values and UTC day-boundary assignment. Behavior on detected clock step, regression, or loss of synchronization is a deployment policy matter and SHOULD be documented by the operator. The device is not required to keep UTC wall-clock time for this profile.

Related to this, I see "naked" references to OTS and RFC 3161 in this document. Outside of the abstract, those should be proper xrefs to their respective references.

Section 4.2 states "Gateways SHOULD persist replay state across restart." Given that Section 10 describes loss of replay state as enabling key-reuse and unsafe acceptance scenarios, I'd argue this is closer to a MUST in practice, or, at a minimum, the SHOULD ought to be paired with an explicit statement of the consequence. What about:

OLD:

Gateways SHOULD persist replay state across restart. If replay state is lost, gateways SHOULD record a continuity break event and SHOULD NOT silently re-accept counters that could already have been committed.

NEW:

Gateways SHOULD persist replay state across restart. Loss of replay state can lead to nonce reuse and unsafe acceptance under the same AEAD key (see Section 10). If replay state is lost, gateways MUST record a continuity-break event, MUST NOT silently re-accept counters that could already have been committed, and MUST require explicit resynchronization or AEAD-key rotation before further frames from affected devices are accepted.

In Section 7, the Class A description contains:

A Class A verifier can recompute canonical-record digests from disclosed record artifacts, the **ADR-003** Merkle root, ...

"ADR-003" doesn't appear anywhere else in this draft. Perhaps this is a holdover from deployment- or project-specific documentation?

OLD:

A Class A verifier can recompute canonical-record digests from disclosed record artifacts, the ADR-003 Merkle root, block/day consistency, the day artifact digest, manifest artifact digests, and enabled anchoring or publication proofs when those artifacts are present.

NEW:

A Class A verifier can recompute canonical-record digests from disclosed record artifacts, the batch Merkle root as defined in Section 4.5, block/day consistency, the day artifact digest, manifest artifact digests, and enabled anchoring or publication proofs when those artifacts are present.

Likewise, Appendix E enumerates reasons including **postcard_pod_id_mismatch** and **postcard_fc_mismatch**. The term "postcard" is not introduced anywhere in the document body. Either define (or reference) it, or rename these reasons to something more general.

Throughout this draft, "trackone" is used in the media type names. Why is "trackone" in the name? It appears to be a project name or codename. I think these names should be more generic if this document is intended as an industry specification.

---