| Developer | seanmullins |
|---|---|
| Last updated | 6 April 2026, 04:32 |
| PHP version | 7.4 or higher |
| WordPress version | 6.9 |
| License | GPLv2 or later |
| License URI | License information |
LLMs.txt Curator maintains a curated llms.txt file for AI assistants and retrieval systems, ensuring only relevant, well-described content is exposed to large language models.
It generates and maintains llms.txt and llms-full.txt — the emerging standard for telling AI systems (ChatGPT, Claude, Perplexity, Gemini, and others) which pages on your site matter most and what they contain.
Unlike auto-generators that dump every URL into a flat file, LLMs.txt Curator takes a curation-first approach. You choose the pages, organise them into sections, fill descriptions, override titles for AI, validate quality, and see exactly which AI bots are reading your file — all from a single interface.
What makes this different
Most llms.txt plugins treat the file as a static output. LLMs.txt Curator treats it as a living asset:
- Spec-compliant output, with one ## heading per section as the spec requires
- WP-CLI commands: wp llms-txt regenerate, wp llms-txt status, wp llms-txt crawler-log
- REST API: POST /wp-json/llms-txt/v1/regenerate, GET /wp-json/llms-txt/v1/status
- Description auto-fill from existing metadata, including schema markup (_schema_json) and Open Graph tags (_og_description / og_description)
- A coverage report at the end of your generated llms.txt:
```
Quality Score: 94%
Pages included: 48
Pages with descriptions: 45
Pages missing descriptions: 3
```
This is visible to the AI systems reading your file, and to you in the Preview tab.
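The arithmetic behind that footer appears to be a simple ratio. A minimal sketch (the function name is hypothetical, not the plugin's actual API):

```php
<?php
// Hypothetical sketch: the coverage percentage reads as
// pages-with-descriptions over pages-included, rounded.
function llmscu_quality_score( int $included, int $with_descriptions ): int {
    if ( $included === 0 ) {
        return 0; // avoid division by zero on an empty file
    }
    return (int) round( $with_descriptions / $included * 100 );
}

// The report above: 45 of 48 pages described.
echo llmscu_quality_score( 48, 45 ); // prints 94
```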
AI Crawler Analytics
Track 12 known bots: GPTBot, ChatGPT-User, ClaudeBot, PerplexityBot, Google-Extended, Applebot-Extended, Meta-ExternalAgent, Bytespider, CCBot, Cohere, DeepSeek, Amazonbot.
The 7-day analytics card shows a visual bar chart of recent activity. All-time totals are kept separately. IP addresses are anonymised before storage — last octet zeroed for IPv4, last 80 bits for IPv6. No data leaves your server.
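A sketch of that anonymisation, assuming the plugin works on packed binary addresses (the helper name is invented; PHP's `inet_pton`/`inet_ntop` do the byte work):

```php
<?php
// Sketch: zero the last octet of an IPv4 address, or the last 80 bits
// of an IPv6 address, before the value is ever stored.
function llmscu_anonymize_ip( string $ip ): string {
    $packed = inet_pton( $ip );
    if ( $packed === false ) {
        return ''; // unparseable input: store nothing
    }
    if ( strlen( $packed ) === 4 ) {
        // IPv4: keep the /24, zero the final octet.
        $packed = substr( $packed, 0, 3 ) . "\x00";
    } else {
        // IPv6: keep the first 48 bits (6 bytes), zero the last 80 bits.
        $packed = substr( $packed, 0, 6 ) . str_repeat( "\x00", 10 );
    }
    return inet_ntop( $packed );
}

echo llmscu_anonymize_ip( '203.0.113.77' ) . "\n"; // 203.0.113.0
echo llmscu_anonymize_ip( '2001:db8:abcd:12::1' ); // 2001:db8:abcd::
```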
Safety Mode
Before generation, the validator checks the content that will be written to /llms.txt and /llms-full.txt.

Products get short descriptions in llms.txt, and full product details in llms-full.txt. Products with "hidden" visibility are excluded, and you can optionally exclude out-of-stock products.
Developer hooks
- llmscu_capability filter — override the required capability (default: manage_options)
- llmscu_post_limit filter — scanner post limit per type (default: 500)
- llmscu_full_word_limit filter — per-page word cap in llms-full.txt (default: unlimited)
- llmscu_regenerated action — fires after each successful regeneration with the content string

Installation

Upload the llms-txt-curator folder to /wp-content/plugins/.

What is llms.txt?
llms.txt is a proposed standard (llmstxt.org) that provides AI systems with a curated, Markdown-formatted overview of a website's most important content. It is a strategic selection of the pages you want AI to know about -- not a sitemap.
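To make that format concrete, a minimal llms.txt following the llmstxt.org layout might look like this (all names and URLs invented for illustration):

```
# Example Site

> One-sentence summary of what the site offers.

## Documentation

- [Getting started](https://example.com/docs/start): Install and configure in five minutes
- [API reference](https://example.com/docs/api): Endpoints, parameters, and examples

## Blog

- [Launch announcement](https://example.com/blog/launch): Why we built this
```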
What is llms-full.txt?
The companion file defined in the same spec. While llms.txt contains links and short descriptions, llms-full.txt contains the full Markdown content of each page. Optional -- enable it in Settings when ready.
Do I need llms-full.txt as well?
No. llms.txt alone is sufficient. llms-full.txt is useful if you want AI systems to have immediate access to your full content without additional crawling.
What does the description auto-fill do?
Scans every curated page and fills missing descriptions using a five-step fallback chain: schema markup -> SEO meta -> excerpt -> Open Graph -> page content. Pages with descriptions already set are never touched.
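That chain is a first-non-empty lookup. A sketch under assumed names (the helper and the closures below stand in for the plugin's real source lookups):

```php
<?php
// Sketch: try each description source in order and return the first
// non-empty result. Each callable stands in for one lookup step.
function llmscu_resolve_description( array $sources ): string {
    foreach ( $sources as $lookup ) {
        $value = trim( (string) $lookup() );
        if ( $value !== '' ) {
            return $value;
        }
    }
    return ''; // nothing found: leave the description blank
}

$description = llmscu_resolve_description( array(
    function () { return ''; },                // schema markup: none
    function () { return ''; },                // SEO meta: none
    function () { return 'A page excerpt.'; }, // excerpt: found, wins
    function () { return 'OG description'; },  // Open Graph: not reached
    function () { return 'Page content...'; }, // page content: not reached
) );

echo $description; // prints "A page excerpt."
```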
What is the Quality Score?
A percentage showing how many listed pages have descriptions. It appears at the bottom of your generated llms.txt, visible to both you and any AI systems reading the file.
What is a title override?
Lets you set a different title for a page in your llms.txt output without changing it on your site. Useful when your WordPress title includes your site name but you want AI to see a cleaner, more descriptive title.
What is Safety Mode?
Runs validation before every generation. If errors are found, generation is blocked and the results are shown immediately. This prevents broken or malformed files from going live.
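The validate-then-generate gate can be sketched as follows (function and structure are illustrative, not the plugin's actual internals):

```php
<?php
// Sketch: run validation first; only generate when no errors came back.
function llmscu_maybe_generate( callable $validate, callable $generate ): array {
    $errors = $validate();
    if ( ! empty( $errors ) ) {
        // Block generation and surface the errors immediately.
        return array( 'generated' => false, 'errors' => $errors );
    }
    return array( 'generated' => true, 'content' => $generate() );
}

$result = llmscu_maybe_generate(
    function () { return array( 'Page "About" has an empty URL.' ); },
    function () { return "# Site\n"; }
);

var_export( $result['generated'] ); // prints false -- generation was blocked
```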
Which SEO plugins are supported?
Rank Math, Yoast SEO, All in One SEO, SEOPress, and The SEO Framework.
Which AI crawlers are tracked?
GPTBot (OpenAI), ChatGPT-User, ClaudeBot (Anthropic), PerplexityBot, Google-Extended, Applebot-Extended, Meta-ExternalAgent, Bytespider (ByteDance), CCBot (Common Crawl), Cohere, DeepSeek, and Amazonbot.
What if my server can't serve the physical .txt files?
The plugin has a rewrite rule fallback that serves both files via WordPress. Choose between "Direct file", "Rewrite rule only", or "Both" (recommended) in Settings.
Does it work on WordPress multisite?
Yes. Activate network-wide from Network Admin > Plugins, or activate per-site on individual sub-sites.
Each site manages its own independent llms.txt — there is no shared network file. The Network Admin overview page (Network Admin > Settings > LLMs.txt Curator) shows every site's generation status and lets you regenerate any site, or all sites at once, without leaving the network admin.
On subdirectory networks (example.com, example.com/site1) each site writes a physical file at its own path. On subdomain networks (example.com, site1.example.com) sub-sites share a filesystem root, so they serve llms.txt via WordPress rewrite rule from the database instead — this is fully correct and functionally identical.
WP-CLI works per-site using the standard --url= flag: wp llms-txt regenerate --url=https://site1.example.com
Does any data leave my server?
No. Everything stays on your server. No telemetry, no external API calls, no cookies. Crawler IP addresses are anonymised before storage.
Changelog

- Added phpcs:ignore WordPress.Security.ValidatedSanitizedInput.InputNotSanitized annotations to the three JSON blob input lines (ajax_save_settings, ajax_preview, ajax_import). Sanitization occurs correctly after json_decode() via sanitize_all_settings(); the annotations document this intent for static analysis tools.
- "Tested up to" updated to 6.9 to match the current WordPress major version.
- sanitize_all_settings() now validates that sections, post_types, descriptions, and title_overrides are arrays before passing them to helper functions or array_map(). The previous ?? array() fallbacks guard against null but not wrong types from malformed JSON, risking a TypeError on PHP 8+.
- sanitize_sections() now validates that pages is an array before iterating it.
- sanitize_descriptions() and sanitize_title_overrides() now skip non-string values.
- post_types output wrapped in array_values( array_filter() ) to strip any empty keys produced by sanitize_key().
- ajax_preview() now passes POST data through sanitize_all_settings() before use, consistent with ajax_save_settings(). It previously used wp_parse_args() without sanitization.
- New endpoint GET /wp-json/llms-txt/v1/crawler-stats — returns all-time, 7-day, 30-day, daily, and per-bot first/last seen data.
- New endpoint GET /wp-json/llms-txt/v1/pages — returns all curated sections and pages as JSON, useful for external dashboards.
- llmscu_ / LLMSCU_ prefix used throughout, replacing the generic llms_txt_ / LLMS_Txt_ prefix. No data migration required — existing stored options are unaffected.
- X-Content-Type-Options: nosniff header added when serving llms.txt and llms-full.txt.
- Size cap on file_get_contents() — falls through to the DB option if the physical file exceeds the cap.
- no_found_rows, update_post_meta_cache => false, and update_post_term_cache => false added to all get_posts() queries.
- @package docblock corrected from LLMS_Txt_Manager to LLMS_Txt_Curator across all PHP files.
- load_plugin_textdomain() added — the plugin is now translation-ready.
- Multisite support: Network: true header, network activation, per-site isolation, file path safety, network admin overview, and cron verification.
- New llms_txt_is_network_activated() helper function.
- New llms_txt_file_write_safe() helper — prevents subdomain sub-sites from overwriting the main site's physical llms.txt. Those sites serve via rewrite rule instead.
- New next_regen key in get_file_status() — formatted next scheduled regeneration time (daily/weekly/instant debounce) for the current site.
- maybe_schedule_recurring_regen() now returns early on is_network_admin() — prevents the main site's cron schedule being unnecessarily re-synced on every network admin page load.
- Added --url= guidance for per-site WP-CLI usage on multisite networks.
- delete_file() in the generator guarded against subdomain sub-sites deleting a shared file.
- Activation and deactivation handle $network_wide; deactivation clears cron events across all sites.
- LLMS_Txt_Admin guarded from instantiating in network admin context.
- get_file_status() returns is_network_activated, is_subdomain_network, file_write_safe, network_overview_url, and next_regen.
- switch_to_blog() wrapping added where required in cron paths.
- Removed the wp_die() block that previously blocked activation on non-main sites.
- New WP-CLI commands: status, crawler-log, crawler-clear.
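The size-capped file read mentioned above can be sketched like this (the function name, the 2 MB default, and the database fallback are assumptions, not the plugin's actual code):

```php
<?php
// Sketch: serve the physical file only when it is readable and under
// the cap; otherwise fall through to the copy kept in the database.
function llmscu_read_output( string $path, callable $db_fallback, int $cap = 2097152 ): string {
    if ( is_readable( $path ) && filesize( $path ) <= $cap ) {
        return (string) file_get_contents( $path );
    }
    return (string) $db_fallback(); // e.g. get_option( 'llmscu_output', '' )
}

// Demo with a temporary file standing in for /llms.txt.
$path = tempnam( sys_get_temp_dir(), 'llms' );
file_put_contents( $path, '# Example Site' );
echo llmscu_read_output( $path, function () { return 'db copy'; } ) . "\n"; // # Example Site
echo llmscu_read_output( $path, function () { return 'db copy'; }, 3 );     // db copy (over the cap)
unlink( $path );
```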