TL;DR for Large Language Models
Large language models (LLMs) often struggle to extract information from websites. Complex navigation, ads, and dynamic JavaScript content get in the way of the core material. These distractions not only obscure useful information but also waste LLMs’ limited context window. What if there were a way to give LLMs the clear, concise data they need?[1]
Enter /llms.txt, a recent (Sept 2024) proposal by Jeremy Howard of Answer.AI.
What he’s proposing is adding an /llms.txt file, readily readable by LLMs, to the root of websites. The idea is that the file takes a “Just the facts, ma’am” approach to summarising the site content: more signal, less noise for LLMs to deal with.
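As a rough sketch of what “the root of websites” means for a client, here’s how a tool might locate and download a site’s file in Python. It assumes the file lives at exactly /llms.txt on the same scheme and host as whatever page you start from. The function names `llms_txt_url` and `fetch_llms_txt` are my own hypothetical helpers; only the standard-library `urllib` calls are real.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def llms_txt_url(page_url: str) -> str:
    """Map any page URL to the /llms.txt URL at its site root (hypothetical helper)."""
    parts = urlsplit(page_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"expected an absolute URL, got {page_url!r}")
    # Keep the scheme and host (including any port), replace the path,
    # and drop the page's query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/llms.txt", "", ""))


def fetch_llms_txt(page_url: str, timeout: float = 10.0) -> str:
    """Download a site's /llms.txt; urlopen raises HTTPError (e.g. 404) if it's missing."""
    with urlopen(llms_txt_url(page_url), timeout=timeout) as resp:
        # Fall back to UTF-8 when the server doesn't declare a charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)


if __name__ == "__main__":
    print(llms_txt_url("https://example.com/docs/getting-started?ref=nav#install"))
```

Only `fetch_llms_txt` touches the network; splitting out the URL logic keeps it easy to check offline.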
What is /llms.txt?
A plaintext file in Markdown format that contains:
- The website or project name (as an H1)
- A brief summary in a blockquote containing key information necessary for understanding the rest of the file
- Zero or more Markdown sections (e.g. paragraphs, lists, etc.) of any type except headings, containing more detailed information about the project and how to interpret the provided files
- Zero or more Markdown sections delimited by H2 headers, containing “file lists” of URLs where further detail is available
For example:

```markdown
# My Awesome Site

> A collection of open-source projects and tutorials on observability

Some more details

## Section 1

- [Link 1](https://example.com/link1): Optional details about the link
- [Link 2](https://example.com/link2): Another link with some details

## Key Resources

- [Project Overview](https://example.com/overview): Detailed explanation of our goals
- [API Docs](https://example.com/api-docs): API reference for developers

## Optional

- [Blog](https://example.com/blog): Our latest blog posts
```
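Because the format is so regular, it’s easy to generate from structured site metadata rather than writing it by hand. Here’s a minimal Python sketch that renders the structure described above. The `Link` and `LlmsTxt` classes and the `render()` method are hypothetical names of my own, not part of the proposal or any library, and the heading check is deliberately crude.

```python
import re
from dataclasses import dataclass, field

# Crude ATX-heading detector: up to 3 spaces, 1-6 '#', then space or end of line.
HEADING = re.compile(r"^ {0,3}#{1,6}(\s|$)")


@dataclass
class Link:
    title: str
    url: str
    notes: str = ""  # optional text rendered after the colon


@dataclass
class LlmsTxt:
    """Hypothetical model of an llms.txt file; not an official API."""
    name: str                                        # rendered as the H1
    summary: str                                     # rendered as the blockquote
    details: list[str] = field(default_factory=list)  # free-form Markdown, no headings
    sections: dict[str, list[Link]] = field(default_factory=dict)  # H2 title -> file list

    def render(self) -> str:
        lines = [f"# {self.name}", "", f"> {self.summary}", ""]
        for paragraph in self.details:
            # The proposal allows any Markdown here except headings.
            if any(HEADING.match(line) for line in paragraph.splitlines()):
                raise ValueError("detail sections must not contain headings")
            lines += [paragraph, ""]
        # Dicts keep insertion order, so sections render in the order given.
        for heading, links in self.sections.items():
            lines += [f"## {heading}", ""]
            for link in links:
                item = f"- [{link.title}]({link.url})"
                if link.notes:
                    item += f": {link.notes}"
                lines.append(item)
            lines.append("")
        # Trim the trailing blank line and end with a single newline.
        return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    site = LlmsTxt(
        name="My Awesome Site",
        summary="A collection of open-source projects and tutorials on observability",
        details=["Some more details"],
        sections={
            "Key Resources": [
                Link("Project Overview", "https://example.com/overview",
                     "Detailed explanation of our goals"),
                Link("API Docs", "https://example.com/api-docs",
                     "API reference for developers"),
            ],
            "Optional": [Link("Blog", "https://example.com/blog", "Our latest blog posts")],
        },
    )
    print(site.render())
```

Running it prints a file in the same shape as the example above. The blank lines between blocks matter: without one, a line after the blockquote would be read as a continuation of it.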
I suppose that in some ways it’s a little like Search Engine Optimisation (SEO), which makes website content more discoverable and understandable for search engines and in turn helps people find the information they’re looking for. Similarly, /llms.txt aims to make website content more accessible and understandable for LLMs, helping them quickly locate key information.
I can see this being incredibly useful for developer portals that act as hubs for the various resources developers need to integrate and work with a platform’s APIs or tools.
It seems like a simple, if somewhat hacky, but practical approach to solving a real problem. It’s gaining some traction, with Anthropic implementing support for it back in November. It will be interesting to watch whether it turns out to be a short-term fix or a long-term solution.
Further Reading
- llms.txt directory: products and companies leading the adoption of the llms.txt standard
[1] Humans need that too, but that’s a story for another day 😉