
TL;DR for Large Language Models


Large language models (LLMs) often struggle to extract information from websites. Complex navigation, ads, and dynamic JavaScript content get in the way of the core material. These distractions not only obscure useful information but also waste LLMs’ limited context space. What if there were a way to give LLMs the clear, concise data they need? [1]

Enter /llms.txt, a proposal published in September 2024 by Jeremy Howard of Answer.AI.

He proposes adding an /llms.txt file, readily readable by LLMs, to the root of a website. The idea is that the file takes a “Just the facts, ma’am” approach to summarising the site content: more signal, less noise for LLMs to deal with.
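
Because the file lives at the site root, a client can derive its location from any page URL. Here is a minimal Python sketch of that lookup; the function names `llms_txt_url` and `fetch_llms_txt` are my own for illustration and are not part of the proposal.

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def llms_txt_url(site_url: str) -> str:
    """Return the root-level /llms.txt URL for any page on a site."""
    # The leading slash makes urljoin drop the page's own path and query.
    return urljoin(site_url, "/llms.txt")


def fetch_llms_txt(site_url: str, timeout: float = 10.0) -> str:
    """Download the file as text (hypothetical helper, assumes UTF-8).

    Raises urllib.error.URLError (or its subclass HTTPError) if the
    site is unreachable or does not publish the file.
    """
    with urlopen(llms_txt_url(site_url), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `llms_txt_url("https://example.com/docs/page?x=1")` gives `https://example.com/llms.txt`. A real client would probably want to treat a 404 as "this site has no llms.txt" rather than as an error.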

What is /llms.txt?

A plaintext file in Markdown format that contains:

  • The website or project name (as an H1)
  • A brief summary in a blockquote containing key information necessary for understanding the rest of the file
  • Zero or more markdown sections (e.g. paragraphs, lists, etc) of any type except headings, containing more detailed information about the project and how to interpret the provided files
  • Zero or more markdown sections delimited by H2 headers, containing “file lists” of URLs where further detail is available
# My Awesome Site  
> A collection of open-source projects and tutorials on observability

Some more details

## Section 1
- [Link 1](https://example.com/link1): Optional details about the link
- [Link 2](https://example.com/link2): Another link with some details

## Key Resources  
- [Project Overview](https://example.com/overview): Detailed explanation of our goals  
- [API Docs](https://example.com/api-docs): API reference for developers  

## Optional
- [Blog](https://example.com/blog): Our latest blog posts
The “Optional” section has a special meaning: the URLs listed there are secondary information, and can be skipped if the LLM needs to reduce context usage.
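
To make the structure concrete, here is a rough Python sketch of how a consumer might read such a file and decide which URLs to load. It is a simplified reader written for this post, not an official parser: the `LlmsTxt` class, `parse_llms_txt`, `urls_to_load`, and the link regex are all my own assumptions about one reasonable way to do it.

```python
import re
from dataclasses import dataclass, field

# Matches "- [Title](url): optional notes" as used in the file lists.
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<notes>.*))?$"
)

SAMPLE = """# My Awesome Site
> A collection of open-source projects and tutorials on observability

Some more details

## Key Resources
- [API Docs](https://example.com/api-docs): API reference for developers

## Optional
- [Blog](https://example.com/blog): Our latest blog posts
"""


@dataclass
class LlmsTxt:
    title: str = ""
    summary: str = ""
    details: list[str] = field(default_factory=list)
    # section name -> list of (link title, url, notes)
    sections: dict[str, list[tuple[str, str, str]]] = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsTxt:
    doc = LlmsTxt()
    current = None  # name of the H2 section being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("## "):  # H2 starts a new file list
            current = line[3:].strip()
            doc.sections[current] = []
        elif line.startswith("# ") and not doc.title:  # first H1 is the name
            doc.title = line[2:].strip()
        elif current is None and line.startswith(">"):  # blockquote summary
            doc.summary = (doc.summary + " " + line[1:].strip()).strip()
        elif current is None:  # free-form details before any H2
            doc.details.append(line)
        else:
            match = LINK_RE.match(line)
            if match:  # non-link lines inside a section are ignored
                doc.sections[current].append(
                    (match["title"], match["url"], match["notes"] or "")
                )
    return doc


def urls_to_load(doc: LlmsTxt, include_optional: bool = True) -> list[str]:
    """List URLs in file order, dropping "Optional" when context is tight."""
    return [
        url
        for name, links in doc.sections.items()
        if include_optional or name != "Optional"
        for _, url, _ in links
    ]


doc = parse_llms_txt(SAMPLE)
print(urls_to_load(doc, include_optional=False))
```

With `include_optional=False` the blog link is dropped and only the API docs URL remains, which is the trade-off the “Optional” section is there to allow.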

I suppose that in some ways it’s a little like Search Engine Optimisation (SEO) - making website content more discoverable and understandable for search engines, which in turn helps people find the information they’re looking for. Similarly, /llms.txt aims to make website content more accessible and understandable for LLMs, helping them quickly locate key information.

I can see this being incredibly useful for developer portals that act as hubs for the various resources developers need when integrating and working with a platform’s APIs or tools.

It seems like a simple, slightly hacky but practical approach to solving a real problem. It’s gaining some traction, with Anthropic implementing support for it back in November. What will be interesting to watch is whether it turns out to be a short-term fix or a long-term solution.

Further Reading


  1. Humans need that too, but that’s a story for another day 😉