| Developer | odysseynewmedia |
|---|---|
| Last updated | June 18, 2026, 16:35 |
| PHP version: | 5.8 or higher |
| WordPress version: | 7.0 |
| License: | GPLv2 or later |
| License URI: | License information |
The llms.txt file is generated instantly. No configuration is needed.
For Power Users: Manage every aspect of your AI strategy. Track bot traffic with built-in analytics, generate JSONL datasets for fine-tuning, and clean up your content with CSS selectors.
Concepts Explained: Why do you need this?
1. What is llms.txt?
Think of llms.txt as a "Sitemap for AI". While humans use HTML pages and Search Engines use XML sitemaps, AI agents look for an llms.txt file in your root directory. This file gives them a clean, prioritised list of links to crawl, ensuring they train on your best content and ignore the junk.
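To make the idea concrete, here is a minimal Python sketch of how a prioritised link index of this kind could be assembled. It is not the plugin's code: the function name build_llms_txt, the sample site, and the overall layout (a title line, a one-line summary, then sections of "- [Title](URL)" links, with an "Optional" section last) are assumptions based on the common llms.txt convention.

```python
def build_llms_txt(site_name, summary, sections):
    """Assemble an llms.txt-style index.

    `sections` maps a heading to a list of (title, url) pairs, already in
    priority order. All names here are illustrative, not the plugin's API.
    """
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url in links:
            lines.append(f"- [{title}]({url})")
        lines.append("")
    # Collapse the trailing blank line into a single final newline.
    return "\n".join(lines).rstrip() + "\n"


example = build_llms_txt(
    "Example Site",
    "Guides and product documentation.",
    {
        "Posts": [("Getting Started", "https://example.com/getting-started/")],
        "Optional": [("About", "https://example.com/about/")],
    },
)
print(example, end="")
```

Because the sections are written in the order given, whatever is listed first is what a crawler reads first.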
2. What is llms-full.txt (Markdown)?
This is an optional advanced feature (RAG-Ready). Instead of just providing links, llms-full.txt provides your actual website content converted into clean, lightweight Markdown format.
* Why it's useful: It allows AI agents to ingest your website's knowledge immediately without needing to visit and scrape every single HTML page. This reduces server load and ensures the AI gets accurate data for "Retrieval Augmented Generation" (RAG).
* ⚠️ WARNING regarding Virtual Mode Limits: When using Virtual Mode to generate this file, the item limit for the llms-full.txt file is securely capped at 50 by default. Manually increasing this limit beyond 50 in the 'Tools' settings will drastically increase server load and risks causing immediate 500/503 server crashes. Use this feature at your own risk. If you require more than 50 items in your llms-full.txt file, we recommend using Physical Mode instead.
3. What is llms.jsonl (Fine-Tuning)?
This file formats your content into prompt-completion pairs (JSON Lines). This is the standard format used to fine-tune models like GPT-4 or Llama 3 on your specific data.
New Features in 6.0:
* .sidebar, .comments) to strip unwanted elements from your Markdown and JSONL files.
* Allow or Disallow rules for individual AI crawlers (GPTBot, Google-Extended, etc.).
* robots.txt.

* odyssey-llms folder to the /wp-content/plugins/ directory.
* llms.txt file has been generated. A new "Odyssey LLMS" menu item will appear in your admin sidebar.

The settings page is located in its own top-level menu in your WordPress admin sidebar, labelled Odyssey LLMS.
In the "Content Intelligence" tab, look for the "Content Cleaning (CSS)" field. Enter CSS selectors for elements you want to remove, separated by commas (e.g., .footer, .nav, #sidebar).
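As a rough illustration of what selector-based cleaning does, the Python sketch below drops every element that matches one of the comma-separated selectors, together with everything inside it, and keeps the remaining text. It is an assumption-laden sketch rather than the plugin's implementation: it only understands the simple forms .class, #id, and a bare tag name, it expects well-formed HTML with explicit closing tags, and the names Cleaner and clean_text are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def parse_selectors(field):
    """Split a comma-separated field such as '.footer, .nav, #sidebar'."""
    return [s.strip() for s in field.split(",") if s.strip()]


def matches(selector, tag, attrs):
    """Match simple selectors only: .class, #id, or a bare tag name."""
    if selector.startswith("."):
        return selector[1:] in (attrs.get("class") or "").split()
    if selector.startswith("#"):
        return attrs.get("id") == selector[1:]
    return selector.lower() == tag


class Cleaner(HTMLParser):
    """Collect text while skipping elements that match a selector."""

    def __init__(self, selectors):
        super().__init__()
        self.selectors = selectors
        self.skip_depth = 0  # greater than 0 while inside a removed element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth += 1
        elif any(matches(s, tag, dict(attrs)) for s in self.selectors):
            self.skip_depth = 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def clean_text(html, selector_field):
    cleaner = Cleaner(parse_selectors(selector_field))
    cleaner.feed(html)
    cleaner.close()
    return "\n".join(cleaner.chunks)


sample_html = (
    '<div id="sidebar"><p>Ads</p></div>'
    '<article><h1>Title</h1><p>Body text.</p>'
    '<div class="nav main"><a href="/">Home</a></div></article>'
    '<div class="footer">Copyright</div>'
)
print(clean_text(sample_html, ".footer, .nav, #sidebar"))
```

With the sample input, only the heading and body paragraph survive; the sidebar, navigation, and footer blocks are removed along with their children.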
Go to the main settings tab and select "Virtual File" as your serving method. This allows WordPress to intercept bot requests and log them to your dashboard.
No. The Robots.txt editor includes intelligent conflict detection. It will automatically fetch and import any virtual rules created by other SEO plugins so you don't lose them.
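The idea of keeping imported rules and adding per-crawler rules on top can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function render_robots_rules, the "allow"/"disallow" policy strings, and the sample URLs are hypothetical, and the sketch does not attempt the conflict detection described above.

```python
def render_robots_rules(crawler_rules, existing_rules="", sitemap_url=None):
    """Render per-crawler robots.txt groups after any existing rules.

    `crawler_rules` maps a user-agent token to "allow" or "disallow".
    Names and structure are illustrative, not the plugin's API.
    """
    blocks = []
    if existing_rules.strip():
        blocks.append(existing_rules.strip())  # imported rules are kept first
    for agent, policy in crawler_rules.items():
        directive = "Allow" if policy == "allow" else "Disallow"
        blocks.append(f"User-agent: {agent}\n{directive}: /")
    if sitemap_url:
        blocks.append(f"Sitemap: {sitemap_url}")
    return "\n\n".join(blocks) + "\n"


robots = render_robots_rules(
    {"GPTBot": "disallow", "Google-Extended": "allow"},
    existing_rules="User-agent: *\nDisallow: /wp-admin/",
    sitemap_url="https://example.com/sitemap.xml",
)
print(robots, end="")
```

Each crawler gets its own User-agent group, so changing one crawler's policy never touches the rules imported from elsewhere.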
⚠️ WARNING: While you can manually increase the post/page limit for the llms-full.txt file in the settings, we strongly warn against setting it too high (especially in Virtual Mode). Doing so will drastically increase server load and risks causing immediate 500/503 server crashes due to the heavy processing required. If you require a large number of items, we recommend using Physical Mode.
The JSONL Prompt Template controls the "question" side of each entry in your llms.jsonl fine-tuning dataset. When you want to train a custom AI model (such as GPT-4, Llama 3, or any fine-tuneable LLM) on your website's content, that process requires structured "Question and Answer" pairs - known as prompt and completion.
How it works:
The plugin generates llms.jsonl where every line is a JSON object in this format:
{"prompt": "What is My Post Title?", "completion": "The full text of that post..."}
The template field lets you define the prompt structure for every post on your site. Use {{title}} as a dynamic placeholder. The plugin will automatically replace it with each post's actual title at generation time.
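The substitution step can be sketched in Python as follows. This is a minimal illustration, not the plugin's code: the functions render_prompt and build_jsonl and the shape of the posts list are assumptions; only the {{title}} placeholder and the prompt/completion keys come from the description above.

```python
import json


def render_prompt(template, title):
    """Replace the {{title}} placeholder with the post title."""
    return template.replace("{{title}}", title)


def build_jsonl(posts, template="What is {{title}}?"):
    """Return one JSON object per line with 'prompt' and 'completion' keys."""
    lines = []
    for post in posts:
        record = {
            "prompt": render_prompt(template, post["title"]),
            "completion": post["content"],
        }
        # ensure_ascii=False keeps non-English titles readable in the file.
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines) + "\n"


posts = [
    {"title": "My Post Title", "content": "The full text of that post..."},
]
jsonl = build_jsonl(posts)
print(jsonl, end="")
# {"prompt": "What is My Post Title?", "completion": "The full text of that post..."}
```

Serialising each record with a JSON library, rather than concatenating strings, keeps quotes and newlines inside post content correctly escaped so that every line stays valid JSON.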
Examples by use case:
* What is {{title}}?
* Tell me about the {{title}} service provided by our company.
* What are the features and specifications of {{title}}?
* How do I {{title}}?

Physical File (Recommended for most sites): The plugin writes a real static llms.txt file to your server's root directory. This is served directly by your web server (Apache/Nginx) with maximum speed, and does not require WordPress to load for every bot request. The downside is that bot visits cannot be tracked for Analytics.
Virtual File (Enable Analytics): The file is served dynamically by WordPress via a rewrite rule. This allows the plugin to intercept every request and log it to the Analytics dashboard. It also enables Rate Limiting to throttle abusive bots. The trade-off is a small performance overhead since WordPress must load for each request.
Both modes support llms.txt, llms-full.txt (Markdown), and llms.jsonl generation.
When you select specific taxonomies (e.g. Categories, Tags, WooCommerce Product Categories) in the Content Source tab, the plugin will fetch all non-empty terms for those taxonomies and include their archive page URLs in the generated llms.txt. This is useful for giving AI crawlers a complete picture of your site's topic structure, not just individual posts.
When this option is enabled, the plugin fetches all WordPress users who have at least one published post and appends their author archive URLs (e.g. yoursite.com/author/name/) to the generated llms.txt. This is useful for sites where author credibility and profiles are part of the content strategy.
When WooCommerce Products is enabled in the Content Source settings and WooCommerce is active, the plugin automatically enriches product entries with structured metadata pulled directly from WooCommerce:
This metadata is added to llms-full.txt (Markdown) and llms.jsonl, giving AI models accurate, up-to-date product information without needing to scrape individual product pages.
The Content Order setting in the Content Source tab controls the order in which posts appear across all generated files (llms.txt, llms-full.txt, llms.jsonl). Since AI agents typically give more weight to content they encounter earlier in a file, ordering by your most recent or most engaged-with content ensures they train on your best material first.
Available options:
When Virtual File mode is active, the plugin tracks the number of requests made by each IP address within a rolling time window. If a bot or visitor exceeds the configured request limit, they are temporarily blocked and receive a 429 Too Many Requests response with a Retry-After header. The block duration is configurable in the Security tab and enforces a minimum of 1 hour to prevent configuration errors from causing permanent lockouts.
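The behaviour described here can be approximated with a small in-memory limiter, sketched in Python below. It is a sketch under assumptions, not the plugin's implementation: the class name RateLimiter, the in-memory storage, and the choice to reset an IP's history when it is blocked are all hypothetical. Only the rolling window, the 429 response with Retry-After, and the one-hour minimum block come from the description above.

```python
import math
import time
from collections import defaultdict, deque


class RateLimiter:
    """Per-IP rolling-window limiter with a temporary block."""

    MIN_BLOCK_SECONDS = 3600  # the one-hour floor described in the text

    def __init__(self, max_requests, window_seconds, block_seconds):
        self.max_requests = max_requests
        self.window = window_seconds
        self.block = int(max(block_seconds, self.MIN_BLOCK_SECONDS))
        self.hits = defaultdict(deque)  # ip -> timestamps inside the window
        self.blocked_until = {}         # ip -> time at which the block ends

    def check(self, ip, now=None):
        """Return (status, headers) for one request from `ip`."""
        now = time.time() if now is None else now
        until = self.blocked_until.get(ip)
        if until is not None:
            if now < until:
                return 429, {"Retry-After": str(math.ceil(until - now))}
            del self.blocked_until[ip]  # the block has expired

        window = self.hits[ip]
        while window and window[0] <= now - self.window:
            window.popleft()  # forget hits that fell out of the window
        window.append(now)

        if len(window) > self.max_requests:
            self.blocked_until[ip] = now + self.block
            window.clear()
            return 429, {"Retry-After": str(self.block)}
        return 200, {}


limiter = RateLimiter(max_requests=2, window_seconds=60, block_seconds=10)
for second in (0, 1, 2):
    print(limiter.check("203.0.113.5", now=second))
```

Passing now explicitly makes the limiter easy to test without waiting; in the example the configured 10-second block is raised to the one-hour floor, so the third request is answered with 429 and a Retry-After of 3600.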
* - [Title](URL)) to avoid Agentic Browsing errors.
* ## Optional header.
* llms-full.txt and llms.jsonl to prevent fatal errors on restricted server environments.
* robots.txt loopback timeout to prevent settings page hangs on restricted hosting environments.
* regenerate command when plugin options have not yet been saved.
* filter_date option to a predefined list of allowed values.
* WP_CLI::error() in the CLI regenerate command.
* DOCTYPE, <html>, <body>) into the plain-text output, which caused stray heading markers to appear at the start of generated Markdown files.
* mb_convert_encoding() HTML-ENTITIES conversion from the Markdown cleaner; UTF-8 is now handled correctly via an explicit charset declaration, eliminating a potential source of character corruption.
* llms-full.txt and llms.jsonl endpoints not terminating correctly after sending the Content-Type: text/plain header, which could allow WordPress to continue template resolution.
* esc_html() being incorrectly applied to robots.txt rule output; because it is a plain-text file, this would corrupt any rule containing &, <, or > characters into HTML entities.
* weekly schedule, which is always available at activation time.
* return statement after the permission check in the simulate-hit AJAX handler for consistency and static analysis correctness.
* Y-d-m (producing 2026-28-04) to the correct ISO 8601 Y-m-d format.
* libxml_clear_errors() is now called on the successful XML parse path in the Sitemap parser, preventing libxml error objects from accumulating in memory across recursive sitemap index requests.
* user_agent database write length with the actual varchar(255) column width.
* llms-full.txt output by post type.
* weekly schedule at activation time, ensuring consistent background pruning on all environments.
* Disallow: / directives, closing a bypass where multiline input could slip through the check.
* llms.txt when taxonomies are selected in the Content Source settings (previously saved but never written to output).
* llms.txt when the Author Archives option is enabled (previously saved but never written to output).
* llms-full.txt and llms.jsonl when the WooCommerce integration is enabled (previously saved but never applied to output).
* llms-full.txt (Markdown) for richer context if available.
* llms.jsonl).
* llms-full.txt with full content converted to Markdown.
* robots.txt. Includes logic to automatically fetch existing virtual rules (from Yoast/RankMath) to prevent data loss.
* llms.txt for better AI context.
* llms.txt virtually (required for Analytics) or physically (for performance).
* llms.txt file is now correctly generated asynchronously immediately upon plugin activation.
* Disallow rules for each individual AI crawler, offering fine-tuned control.
* Sitemap: directive to the generated file for better crawler discovery if a sitemap URL is provided.
* odyssey_llms_default_rawlers to allow developers to extend the default crawler list.
* llms.txt format with structured sections and added a checklist of common AI crawlers.