Independent Submission                                       K. Cardillo
Internet-Draft                                               Independent
Intended status: Informational                              12 June 2026
Expires: 14 December 2026


      AI.TXT: A Declaration File for AI Usage Preferences, Licensing,
                              and Policy
                    draft-car-ai-txt-wellknown-00

Abstract

   This document requests registration of two Well-Known URIs under the "/.well-known/" path: "ai.txt" and "ai.json". These URIs locate a structured, machine-readable file in which a site operator can declare AI usage preferences (training, scraping, indexing, caching), licensing terms, attribution requirements, and per-agent rules.

   "ai.txt" is positioned as a structured attachment surface for AI usage preferences, in addition to the robots.txt and HTTP-header carriage proposed by the IETF AIPREF working group. As the AIPREF vocabulary stabilizes, "ai.txt" can carry those preferences in a typed, single-file form alongside the broader licensing, attribution, and policy declarations defined in this document.

   This format is complementary to "robots.txt" [ROBOTS]. Where "robots.txt" can block crawling entirely, "ai.txt" can express finer-grained policies such as "you may crawl but not train on this content", a distinction that "robots.txt" alone cannot express.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 14 December 2026.
Cardillo                Expires 14 December 2026                [Page 1]

Internet-Draft                   ai-txt                        June 2026

Copyright Notice

   Copyright (c) 2026 IETF Trust and the persons identified as the document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
     1.1.  Relationship to Existing Standards
     1.2.  Relationship to AIPREF
     1.3.  Related Work
     1.4.  Requirements Language
   2.  The "ai.txt" Well-Known URI
     2.1.  Location
     2.2.  Format
     2.3.  Site Fields
     2.4.  Content Policy Fields
     2.5.  Training Path Fields
     2.6.  Licensing Fields
     2.7.  Agent Blocks
     2.8.  Content Requirement Fields
     2.9.  Compliance Fields
   3.  The "ai.json" Well-Known URI
     3.1.  Location
     3.2.  Format
   4.  Agent Behavior
     4.1.  Discovery
     4.2.  Compliance
   5.  Security Considerations
   6.  IANA Considerations
     6.1.  Well-Known URI Registration: "ai.txt"
     6.2.  Well-Known URI Registration: "ai.json"
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   Appendix A.  Example: News Site
   Appendix B.  Acknowledgments
   Author's Address

1.  Introduction

   AI systems increasingly interact with website content in ways that go beyond traditional crawling: training language models on web content, indexing content for retrieval-augmented generation, caching content for future reference, and scraping data for analysis. Website operators currently have no standard, machine-readable mechanism to communicate their policies regarding these AI-specific uses.

   "robots.txt" [ROBOTS] can block crawling entirely, but it cannot express nuanced policies. A newspaper may wish to allow crawling (for search indexing) while prohibiting training (for model development). A blog may wish to allow training under a specific license. A corporation may wish to allow some AI agents while blocking others.

   "ai.txt" addresses this gap.
   It is a policy declaration file, served at a well-known location, that communicates to AI systems:

   *  Whether content may be used for AI model training
   *  Whether content may be scraped, indexed, or cached
   *  Under what license terms AI training is permitted
   *  Which AI agents are permitted and under what conditions
   *  What attribution and disclosure requirements apply
   *  What compliance and audit expectations exist

1.1.  Relationship to Existing Standards

   "ai.txt" is complementary to, and does not replace, existing standards:

   robots.txt [ROBOTS]:  Declares crawling restrictions. "ai.txt" adds training, licensing, and per-agent policy declarations that "robots.txt" cannot express. Both files may coexist.

   agents.txt:  Declares AI agent capabilities (endpoints, protocols, authentication). "ai.txt" declares policy. A site may use both: "agents.txt" to declare what agents can DO, and "ai.txt" to declare what is ALLOWED.

   security.txt [RFC9116]:  Declares contacts for security vulnerability disclosure. It follows a similar well-known file pattern but addresses a different domain.

1.2.  Relationship to AIPREF

   The IETF AIPREF working group is developing a vocabulary [AIPREF-VOCAB] for expressing AI usage preferences and an attachment specification [AIPREF-ATTACH] for carrying those preferences via robots.txt directives and HTTP response headers. "ai.txt" complements that work; it does not replace it.

   AIPREF defines the vocabulary (the set of preference terms and their semantics) and two carriage mechanisms (robots.txt and HTTP headers). "ai.txt" is a third carriage mechanism, a single structured and typed file, that provides three properties not addressed by robots.txt attachment or per-response headers:

   *  Carriage of preferences for an entire site, independent of any individual response or robots.txt path block.
   *  A single audit surface, one file at one URL, that can be fetched once and cached for site-wide preference resolution.
   *  A place to declare preferences alongside related declarations (licensing, attribution, per-agent rate limits) that fall outside AIPREF's scope.

   When the AIPREF vocabulary stabilizes, "ai.txt" implementations SHOULD use AIPREF preference names where they apply. Implementations SHOULD treat the preferences carried in "ai.txt" as equivalent in authority to the same preferences carried via the AIPREF robots.txt or HTTP-header mechanisms. Where multiple carriers disagree for the same site and resource, conflict resolution is out of scope for this document and may be addressed by future AIPREF output.

1.3.  Related Work

   The following efforts overlap with or are adjacent to this document.

   Spawning ai.txt (2023) [SPAWNING-AITXT]:  An earlier file served at "/ai.txt", defined by Spawning Inc. for text-and-data-mining opt-outs and scoped narrowly to TDM permission per file pattern. The format defined in this document covers a superset of that scope (training, scraping, indexing, caching, per-agent rules, licensing, and attribution) and is served under "/.well-known/" rather than at the site root. The present document acknowledges Spawning's prior use of the name and positions itself as a successor declaration surface rather than a competing one.

   W3C TDM Reservation Protocol [TDMREP]:  Defines a "/.well-known/tdmrep.json" file for declaring text and data mining reservations under EU Directive 2019/790. It is adjacent in domain (machine-readable opt-outs) but narrower in scope (TDM reservation only). "ai.txt" can reference or coexist with "tdmrep.json"; sites with TDM-only requirements MAY use "tdmrep.json" alone.

   Cloudflare Content Signals Policy [CF-CONTENT-SIGNALS]:  A robots.txt extension deployed at scale (millions of domains) that adds AI-specific signals (search, ai-input, ai-train) to robots.txt alongside User-agent, Allow, and Disallow records. Like the AIPREF attachment mechanism, it carries preferences inside robots.txt. "ai.txt" carries the same class of preferences, plus licensing, attribution, and per-agent metadata, in a separate file. Sites MAY publish both; their semantics SHOULD agree.

   agents.txt:  A companion well-known file that declares what AI agents CAN do on a site (sanctioned endpoints, protocols, authentication). Where "ai.txt" expresses usage preferences and policy, "agents.txt" expresses positive capability. They are designed to coexist.

1.4.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

2.  The "ai.txt" Well-Known URI

2.1.  Location

   The "ai.txt" file MUST be served at the path "/.well-known/ai.txt" of the site's origin [RFC8615], for example:

      https://example.com/.well-known/ai.txt

   The file MUST be served over HTTPS in production deployments. HTTP is permitted only in development or testing environments.

   The file MUST be served with Content-Type "text/plain; charset=utf-8" [RFC9110].

2.2.  Format

   The "ai.txt" file uses a block-based key-value format inspired by "robots.txt". Each line contains a key, a colon, and a value. Lines beginning with "#" are comments. Indented lines (two or more spaces, or one or more tabs) belong to the preceding block; the only block type defined in this document is the Agent block (Section 2.7).

   A minimal "ai.txt" file:

      # ai.txt
      Spec-Version: 1.0
      Site-Name: My Blog
      Site-URL: https://myblog.com
      Training: deny

2.3.  Site Fields

   Site-Name (REQUIRED):  Human-readable name of the site or service.
   Site-URL (REQUIRED):  Canonical HTTPS URL of the site.
   Spec-Version (OPTIONAL):  Version of the "ai.txt" specification the file conforms to (e.g., "1.0"). This is a regular field, not a comment.
   Generated-At (OPTIONAL):  ISO 8601 timestamp of when the file was generated. This is a regular field, not a comment.
   Description (OPTIONAL):  Brief description of the site.
   Contact (OPTIONAL):  Contact email address for AI policy inquiries.
   Policy-URL (OPTIONAL):  URL of a human-readable AI policy page.

2.4.  Content Policy Fields

   These fields declare site-wide defaults. Each accepts "allow" or "deny". The value "conditional" is valid only for the Training field, where it activates the per-path rules defined in Section 2.5; implementations encountering "conditional" on any other field SHOULD treat it as "deny".

   Training (OPTIONAL, default "deny"):  Whether AI systems may use content for model training.
   Scraping (OPTIONAL, default "allow"):  Whether AI agents may scrape or read content.
   Indexing (OPTIONAL, default "allow"):  Whether AI systems may index content for retrieval.
   Caching (OPTIONAL, default "allow"):  Whether AI systems may cache content.

2.5.  Training Path Fields

   When Training is "conditional", these fields specify per-path rules:

   Training-Allow (OPTIONAL):  Glob pattern for paths where training is permitted.
   Training-Deny (OPTIONAL):  Glob pattern for paths where training is denied.

   Multiple Training-Allow and Training-Deny lines MAY appear. More specific patterns take precedence.

2.6.  Licensing Fields

   Training-License (OPTIONAL):  SPDX license identifier [SPDX] for AI training use (e.g., "CC-BY-4.0").
   Training-Fee (OPTIONAL):  URL of a commercial licensing or pricing page.

2.7.  Agent Blocks

   Agent blocks declare per-agent policy overrides. The wildcard "*" sets the default for all agents.

      Agent: *
        Rate-Limit: 60/minute
      Agent: ClaudeBot
        Training: allow
        Rate-Limit: 200/minute
      Agent: GPTBot
        Training: deny
        Scraping: deny

   Agent identifiers SHOULD be matched case-insensitively against the agent's product token, that is, the name the agent uses to identify itself in robots.txt groups [ROBOTS] (for example, "GPTBot"). The first token of the User-Agent header is not a reliable identifier, because many AI crawlers begin that header with a generic "Mozilla/5.0" token.
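   One reading of Sections 2.4 through 2.7 resolves an agent's effective policy in layers: built-in defaults, then site-wide fields, then the "*" block, then the agent's own block. The Python sketch below implements that reading. It is non-normative and rests on assumptions this document does not fix: Agent-block fields are indented under an unindented "Agent:" line, and the next unindented line closes the block; a named block is layered over the "*" block rather than replacing it; "more specific" is read as the longest matching glob pattern, by analogy with the longest-match rule of [ROBOTS], but with ties and unmatched paths resolving to "deny" (whereas [ROBOTS] prefers allow on ties); and unrecognized values are treated as "deny". The function names and the agent token "ExampleBot" are hypothetical.

```python
# Non-normative sketch. Assumptions not fixed by the draft: indented fields
# belong to the last unindented "Agent:" line; a named block layers over "*";
# the longest matching glob is "more specific"; ties/unmatched paths deny.
from fnmatch import fnmatchcase

POLICY_FIELDS = ("Training", "Scraping", "Indexing", "Caching")
DEFAULTS = {"Training": "deny", "Scraping": "allow",
            "Indexing": "allow", "Caching": "allow"}
MULTI_VALUED = {"Training-Allow", "Training-Deny"}
WINDOWS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


def parse_ai_txt(text):
    """Return (site_fields, agent_blocks); agent names are lowercased."""
    site, agents, current = {}, {}, None
    for raw in text.splitlines():
        if not raw.strip() or raw.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # not a key-value line; ignored in this sketch
        key, value = key.strip(), value.strip()
        indented = raw.startswith(("  ", "\t"))
        if key == "Agent" and not indented:
            current = agents.setdefault(value.lower(), {})
            continue
        if indented and current is not None:
            target = current
        else:
            target, current = site, None  # an unindented line closes the block
        if key in MULTI_VALUED:
            target.setdefault(key, []).append(value)
        else:
            target[key] = value
    return site, agents


def training_for_path(site, path):
    """Apply Training-Allow/Training-Deny; the longest matching glob wins."""
    best_len, verdict = -1, "deny"  # no matching rule: fall back to "deny"
    for key, outcome in (("Training-Allow", "allow"), ("Training-Deny", "deny")):
        for pattern in site.get(key, []):
            if not fnmatchcase(path, pattern):
                continue
            if len(pattern) > best_len or (
                    len(pattern) == best_len and outcome == "deny"):
                best_len, verdict = len(pattern), outcome
    return verdict


def resolve(site, agents, agent_token, path="/"):
    """Effective policy for one agent token (case-insensitive) and path."""
    layers = (DEFAULTS, site, agents.get("*", {}),
              agents.get(agent_token.lower(), {}))
    effective, rate_limit = {}, None
    for layer in layers:  # later layers override earlier ones
        for field in POLICY_FIELDS:
            if field in layer:
                effective[field] = layer[field].lower()
        rate_limit = layer.get("Rate-Limit", rate_limit)
    for field in POLICY_FIELDS:
        value = effective[field]
        if field == "Training" and value == "conditional":
            value = training_for_path(site, path)
        elif value not in ("allow", "deny"):
            value = "deny"  # "conditional" elsewhere (Section 2.4) or unknown
        effective[field] = value
    effective["Rate-Limit"] = rate_limit
    return effective


def parse_rate_limit(value):
    """'60/minute' -> (60, 60): request count and window length in seconds."""
    count, _, window = value.partition("/")
    return int(count), WINDOWS[window.strip().lower()]


# Trimmed from the News Site example in Appendix A.
EXAMPLE = """\
Site-Name: News Daily
Training: conditional
Training-Allow: /articles/free/*
Training-Deny: /articles/premium/*
Agent: *
  Rate-Limit: 30/minute
Agent: ClaudeBot
  Training: allow
  Rate-Limit: 120/minute
Agent: GPTBot
  Training: deny
Attribution: required
"""

if __name__ == "__main__":
    site_fields, agent_blocks = parse_ai_txt(EXAMPLE)
    # "ExampleBot" is a hypothetical agent with no block of its own.
    print(resolve(site_fields, agent_blocks, "ExampleBot", "/articles/free/a"))
    print(resolve(site_fields, agent_blocks, "GPTBot", "/articles/free/a"))
```

   Glob matching here uses Python's fnmatch.fnmatchcase, in which "*" also matches "/", so "/articles/free/*" covers nested paths; that is a choice of this sketch, not a rule stated in this document.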
   Fields within an Agent block:

   *  Training, Scraping, Indexing, Caching: override the site-wide policy.
   *  Rate-Limit: advisory rate limit in "N/window" format, where N is a number of requests and window is one of "second", "minute", "hour", or "day" (e.g., "60/minute").

2.8.  Content Requirement Fields

   Attribution (OPTIONAL):  Whether AI outputs that use content from this site must attribute the source. One of: "required", "recommended", "none".
   AI-Disclosure (OPTIONAL):  Whether AI-generated content derived from this site must be disclosed as AI-generated. One of: "required", "recommended", "none".

2.9.  Compliance Fields

   Audit (OPTIONAL):  Whether AI agents must provide audit receipts. One of: "required", "optional", "none".
   Audit-Format (OPTIONAL):  Expected audit format identifier (e.g., "rer-artifact/0.1").

3.  The "ai.json" Well-Known URI

3.1.  Location

   The JSON companion file MUST be served at the path "/.well-known/ai.json" of the site's origin [RFC8615], for example:

      https://example.com/.well-known/ai.json

   The file MUST be served with Content-Type "application/json; charset=utf-8".

3.2.  Format

   The JSON format carries the same information as "ai.txt" in a typed JSON structure suitable for direct consumption by programmatic clients. The "ai.txt" file MAY reference the JSON file via:

      AI-JSON: https://example.com/.well-known/ai.json

   A minimal "ai.json" document:

      {
        "specVersion": "1.0",
        "site": {
          "name": "My Blog",
          "url": "https://myblog.com"
        },
        "policies": {
          "training": "deny",
          "scraping": "allow",
          "indexing": "allow",
          "caching": "allow"
        },
        "agents": {
          "*": {}
        }
      }

   Field semantics are identical to those defined in Section 2 for the text format, with one structural difference: in the JSON form, the "specVersion" member, the "policies" member (with all four of its "training", "scraping", "indexing", and "caching" members), and the "agents" member are REQUIRED. Defaults that the text format applies implicitly MUST be stated explicitly in JSON documents.

4.  Agent Behavior

4.1.  Discovery

   AI agents and crawlers SHOULD fetch "/.well-known/ai.txt", "/.well-known/ai.json", or both before interacting with an unfamiliar site. Agents SHOULD prefer the JSON format when both are available. Agents SHOULD cache the policy for the duration declared by the HTTP Cache-Control header, treating any declared lifetime shorter than 60 seconds as 60 seconds.

4.2.  Compliance

   "ai.txt" is advisory: it declares the site owner's policy. Compliance is expected in good faith but is not enforced by the file itself.

   Agents SHOULD respect Training declarations by not using content for model training when Training is "deny". Agents SHOULD respect rate-limit declarations.

   Servers MUST enforce rate limits and access control independently of the declarations in "ai.txt".

5.  Security Considerations

   Policy declarations MUST NOT include credentials, tokens, or secrets of any kind.

   "ai.txt" is advisory; servers MUST enforce policies independently.

   Agents MUST validate that referenced URLs use HTTPS before following them.

   Site owners SHOULD review their "ai.txt" periodically to ensure it accurately reflects current policy.

6.  IANA Considerations

6.1.  Well-Known URI Registration: "ai.txt"

   This document requests registration of the following Well-Known URI in the "Well-Known URIs" registry established by [RFC8615]:

   URI suffix:  ai.txt
   Change controller:  Kayla Cardillo
   Specification document(s):  This document.
   Related information:  Text-format AI policy declaration file. Allows website operators to declare their AI content policy: training permissions, licensing terms, per-agent rules, and compliance requirements.

6.2.  Well-Known URI Registration: "ai.json"

   This document requests registration of the following Well-Known URI in the "Well-Known URIs" registry established by [RFC8615]:

   URI suffix:  ai.json
   Change controller:  Kayla Cardillo
   Specification document(s):  This document.
   Related information:  JSON-format AI policy declaration file.
      Companion format to "ai.txt".

7.  References

7.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, <https://www.rfc-editor.org/info/rfc8174>.

   [RFC8615]  Nottingham, M., "Well-Known Uniform Resource Identifiers (URIs)", RFC 8615, DOI 10.17487/RFC8615, May 2019, <https://www.rfc-editor.org/info/rfc8615>.

   [RFC9110]  Fielding, R., Ed., Nottingham, M., Ed., and J. Reschke, Ed., "HTTP Semantics", STD 97, RFC 9110, DOI 10.17487/RFC9110, June 2022, <https://www.rfc-editor.org/info/rfc9110>.

7.2.  Informative References

   [AIPREF-ATTACH]  "Attaching AI Usage Preferences to Content", Work in Progress, Internet-Draft, draft-ietf-aipref-attach, 2026.

   [AIPREF-VOCAB]  "A Vocabulary for Expressing AI Usage Preferences", Work in Progress, Internet-Draft, draft-ietf-aipref-vocab, 2026.

   [CF-CONTENT-SIGNALS]  "Cloudflare Content Signals Policy", 2025.

   [RFC9116]  Foudil, E. and Y. Shafranovich, "A File Format to Aid in Security Vulnerability Disclosure", RFC 9116, DOI 10.17487/RFC9116, April 2022, <https://www.rfc-editor.org/info/rfc9116>.

   [ROBOTS]  Koster, M., Illyes, G., Zeller, H., and L. Sassman, "Robots Exclusion Protocol", RFC 9309, DOI 10.17487/RFC9309, September 2022, <https://www.rfc-editor.org/info/rfc9309>.

   [SPAWNING-AITXT]  "ai.txt -- Generate ai.txt files for your website", 2023.

   [SPDX]  "SPDX License List", 2024.

   [TDMREP]  "TDM Reservation Protocol", 2022.

Appendix A.  Example: News Site

      # ai.txt - AI Policy Declaration
      Spec-Version: 1.0
      Site-Name: News Daily
      Site-URL: https://newsdaily.com
      Contact: ai@newsdaily.com
      Policy-URL: https://newsdaily.com/ai-policy
      Training: conditional
      Scraping: allow
      Indexing: allow
      Caching: allow
      Training-Allow: /articles/free/*
      Training-Deny: /articles/premium/*
      Training-License: CC-BY-4.0
      Training-Fee: https://newsdaily.com/ai-licensing
      Agent: *
        Rate-Limit: 30/minute
      Agent: ClaudeBot
        Training: allow
        Rate-Limit: 120/minute
      Agent: GPTBot
        Training: deny
      Attribution: required
      AI-Disclosure: required

Appendix B.  Acknowledgments

   The "ai.txt" format draws on the design of "robots.txt" [ROBOTS] and "security.txt" [RFC9116] for structural inspiration. The SPDX license identifiers referenced in Training-License are maintained by the Linux Foundation [SPDX].

Author's Address

   Kayla Cardillo
   Independent
   Email: contactkaylacard@gmail.com