Structure documents for AI consumption, not just human reading
Human-formatted documents obstruct AI consumption; plain-text formats such as Markdown let AI work with the underlying knowledge efficiently.
Most institutional documents were written for human readers. They use complex formatting, embedded images, headers and footers, branded covers, proprietary file formats, track-changes markers, comments, and styling that signals status or context at a glance. A person reading the document uses all of that. An AI reading the document has to strip it away before the substance becomes usable, and that stripping is noisy, lossy, and expensive.
The working rule is that documents intended to be AI-accessible should be structured for AI consumption, not for human reading alone. In practice, that means plain-text formats with minimal, explicit structure — Markdown being the current default, because it is human-readable, AI-readable, version-controllable, searchable, and trivially convertible into presentation formats when human reading is needed.
What “structure for AI” actually means
Four characteristics separate AI-friendly documents from human-first ones.
Plain text as the substrate. No proprietary formats, no rich-text encoding, no layout-as-meaning. The semantic content and the formatting are separable, and the formatting does not carry information the AI cannot recover from the text alone.
Explicit structure. Headings that mean what they say. Lists that are lists. Tables that are tables. Metadata at the top of the file, not inferred from position on the page. The goal is that the document is unambiguously parseable without visual interpretation.
Small, focused files rather than monoliths. An AI working with a 200-page master document struggles to locate the relevant passage; the same content split into focused documents with clear titles is far more useful. This is a point about retrieval as much as about format.
Consistent terminology and structure across the corpus. The same concept referred to by different names in different documents creates friction for retrieval and reasoning; so do identical headings that mean different things in different contexts. Consistency is cheap to produce at authoring time and expensive to retrofit, which means the standards for terminology, document structure and metadata pay off disproportionately once AI is operating over the whole body of material.
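As an illustration of the second and third characteristics, the sketch below splits one long Markdown document into focused sections and renders each as a standalone file with its metadata stated at the top. It is a minimal sketch under stated assumptions, not a prescribed tool: the names `Section`, `split_by_heading` and `to_file_text`, the choice of second-level headings as the split point, and the `title` and `source` metadata fields are all hypothetical.

```python
from dataclasses import dataclass

FENCE = "`" * 3  # Markdown code-fence marker


@dataclass
class Section:
    title: str
    body: str


def split_by_heading(markdown: str, level: int = 2) -> list[Section]:
    """Split a Markdown document into one Section per heading of `level`.

    Assumption: text before the first matching heading is dropped, and
    deeper headings stay inside the section that contains them.
    """
    marker = "#" * level + " "
    sections: list[Section] = []
    title = None
    lines: list[str] = []
    in_fence = False
    for line in markdown.splitlines():
        if line.startswith(FENCE):
            in_fence = not in_fence  # do not treat '#' inside code as a heading
        if not in_fence and line.startswith(marker):
            if title is not None:
                sections.append(Section(title, "\n".join(lines).strip()))
            title, lines = line[len(marker):].strip(), []
        elif title is not None:
            lines.append(line)
    if title is not None:
        sections.append(Section(title, "\n".join(lines).strip()))
    return sections


def to_file_text(section: Section, source: str) -> str:
    """Render a section as a standalone file with explicit metadata on top.

    Simplification: values are written as-is, so titles containing
    characters with special meaning in YAML would need quoting.
    """
    header = f"---\ntitle: {section.title}\nsource: {source}\n---\n\n"
    return header + f"# {section.title}\n\n{section.body}\n"
```

Each resulting file carries its own title and provenance, so a retrieval step can select it without reading the parent document.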
The outbound case
The same heuristic applies to client deliverables, not only to internal knowledge. A growing share of inbound material at mid-tier firms is first read by a client’s AI before any human on their side engages — see The first reader is an AI — and when that is the consumption pattern, the deliverable has to survive it. A twenty-page memo with the critical caveats buried in footnotes will be summarised badly. A deck that carries its meaning in visual design rather than in explicit text will lose its substance when it is abstracted. A recommendation spread across paragraphs an AI will compress may arrive at the human reader as a different recommendation than the one that was sent.
The practical response is the same one the internal-knowledge case calls for — cleaner document structure, explicit headings, content that does not rely on visual design to carry meaning — applied now to work the firm is sending out rather than only to work it holds internally. For consequential deliverables, a machine-readable version alongside the formatted one is worth the small additional effort, because it puts the firm’s own framing into the client’s AI context rather than leaving the AI to reconstruct it from the formatted version.
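One way to picture a machine-readable companion version is sketched below. It assumes the deliverable can be described as a title, a recommendation, a list of caveats and a set of named analysis sections; the `Deliverable` type and `to_markdown` function are hypothetical names for illustration, not an established format.

```python
from dataclasses import dataclass, field


@dataclass
class Deliverable:
    title: str
    recommendation: str
    caveats: list[str] = field(default_factory=list)
    analysis: dict[str, str] = field(default_factory=dict)  # heading -> body text


def to_markdown(doc: Deliverable) -> str:
    """Render the deliverable as plain Markdown with explicit headings.

    The recommendation and its caveats come first, in the body text,
    so a summariser meets them before the supporting analysis.
    """
    parts = [f"# {doc.title}", "", "## Recommendation", "", doc.recommendation, ""]
    parts += ["## Caveats", ""]
    if doc.caveats:
        parts += [f"- {caveat}" for caveat in doc.caveats]
    else:
        parts.append("None stated.")
    parts.append("")
    for heading, body in doc.analysis.items():
        parts += [f"## {heading}", "", body, ""]
    return "\n".join(parts)
```

Placing the caveats directly under the recommendation, rather than in footnotes, is the design choice that matters here: it keeps the two together when the text is compressed.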
How to apply the heuristic
Not every document in an organisation needs to be converted. The high-leverage targets are the documents that will repeatedly sit in AI context: frequently referenced policies, current client context, methodology and practice standards, worked examples, and decision history. Marketing collateral and historical archives can stay as they are.
The broader implication is that format is a knowledge management variable, not a cosmetic one. The same content in two different forms produces different AI utility. Firms investing in knowledge management without thinking about format are leaving most of the gain on the table (see Useful AI is a context problem).