LLM Override -- WordPress Plugin

Developer	vanguardhive
Last updated	April 13, 2026, 18:07
Donate link:	Donate
PHP version:	7.4 or later
WordPress version:	6.9
License:	GPLv2 or later
License URI:	License information

The AI era has a problem: ChatGPT, Claude, and Perplexity are making up facts about your brand. They crawl your raw HTML, a format built for humans, and hallucinate to fill the gaps. Traditional SEO cannot fix this. A static sitemap cannot fix this. LLM Override fixes this.

LLM Override is a Machine-to-Machine (M2M) interception engine for WordPress. It speaks the language AI crawlers handle best: clean, structured Markdown with semantic context. That Markdown is served in real time, directly from your site, without modifying a single page.

How AI systems read your content

Most AI tools for WordPress generate a list of URLs. That tells an AI crawler where your content is. LLM Override tells AI crawlers what your content means, so your brand is represented accurately across AI-powered search engines.

When a bot visits your page, LLM Override intercepts the request before WordPress renders any HTML. It then responds with a structured Markdown payload containing:

Your content, cleaned of scripts, ads, and UI noise
A YAML frontmatter block with your canonical title, URL, and last-updated timestamp
Your Site Manifest — verifiable organization facts included in your /llms.txt
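
The following is a minimal Python sketch of how such a payload could be assembled. The frontmatter keys `title`, `url`, `last_modified`, and `generator` are assumptions, because the plugin's real key names are not documented here. `build_payload()` is a hypothetical helper. The sketch also applies the Unicode cleanup listed under Enterprise Sanitization. Turning Non-Breaking Spaces into ordinary spaces, rather than deleting them, is also an assumption.

```python
from datetime import datetime, timezone

# Invisible characters named under "Enterprise Sanitization".
# Assumption: NBSP becomes a regular space; the others are deleted.
UNICODE_FIXES = str.maketrans({
    "\ufeff": "",   # byte order mark (BOM)
    "\u200b": "",   # zero-width space
    "\u00a0": " ",  # non-breaking space
    "\u00ad": "",   # soft hyphen
})


def sanitize(text: str) -> str:
    """Delete or normalize the characters listed in UNICODE_FIXES."""
    return text.translate(UNICODE_FIXES)


def yaml_quote(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and quotes for YAML."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_payload(title: str, canonical_url: str, modified: datetime,
                  version: str, markdown_body: str) -> str:
    """Hypothetical helper: YAML frontmatter followed by the Markdown body.

    `modified` must be timezone-aware so it can be rendered in UTC.
    """
    frontmatter = "\n".join([
        "---",
        f"title: {yaml_quote(title)}",
        f"url: {yaml_quote(canonical_url)}",
        f"last_modified: {modified.astimezone(timezone.utc).isoformat()}",
        f"generator: {yaml_quote('LLM Override ' + version)}",
        "---",
    ])
    return sanitize(frontmatter + "\n\n" + markdown_body.strip() + "\n")
```

Sanitizing the assembled payload, rather than only the body, also cleans stray characters that come from post titles.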

This is Generative Engine Optimization (GEO): making your content accessible to AI systems and accurately represented by them.

How it works

LLM Override adds a <link rel="alternate" type="text/markdown"> tag to your page <head>.
An AI crawler discovers this link and follows it. Crawlers can also request Markdown directly through HTTP content negotiation by sending Accept: text/markdown.
The crawler requests your URL with ?view=raw appended.
LLM Override intercepts the request at the WordPress routing layer: no HTML is rendered and no theme loads.
The crawler receives clean, semantic Markdown: accurate content that leaves no gaps to hallucinate.
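
The plugin recognizes two signals: the ?view=raw parameter and an Accept: text/markdown header. The minimal Python sketch below shows that routing decision. It assumes a request qualifies when either signal is present. `wants_markdown()` is a hypothetical name. Treating `q=0` as a refusal follows the general HTTP quality-value rule.

```python
from urllib.parse import parse_qs, urlsplit


def wants_markdown(url: str, accept_header: str = "") -> bool:
    """True if the request asks for the Markdown payload (hypothetical helper)."""
    # Signal 1: the ?view=raw query parameter advertised by the <link> tag.
    if "raw" in parse_qs(urlsplit(url).query).get("view", []):
        return True
    # Signal 2: an explicit text/markdown entry in Accept with a non-zero q-value.
    for part in accept_header.split(","):
        media_type, _, params = part.strip().partition(";")
        if media_type.strip().lower() != "text/markdown":
            continue
        quality = 1.0
        for param in params.split(";"):
            key, _, value = param.strip().partition("=")
            if key.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            return True
    return False
```

A typical browser Accept header such as `text/html,application/xhtml+xml,*/*;q=0.8` does not match. Only an explicit text/markdown entry counts, so human visitors stay on the HTML path.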

Your human visitors never see any of this. Their experience is unchanged.

Core Features (Free)

M2M Interception Engine
✅ Intercepts AI bot requests via ?view=raw; works on any page and any post type
✅ Converts HTML to clean Markdown using league/html-to-markdown
✅ Strips <script>, <style>, <iframe>, and empty elements before conversion
✅ Disables page caching (WP Rocket, LiteSpeed, W3TC, Cloudflare) for M2M requests to guarantee fresh content
✅ Adds X-Robots-Tag: noindex to Markdown responses to prevent duplicate-content flags
✅ Adds an X-Content-Processing transparency header declaring the conversion method and source
✅ Adds YAML frontmatter: title, canonical URL, last modified date, plugin version

Content Rules
✅ Site Manifest: provide verifiable organization facts in your /llms.txt site manifest

llms.txt Standard Compliance
✅ Dynamic /llms.txt endpoint: always current, no static files, works on any hosting
✅ Extended /llms-full.txt endpoint: includes content snippets for deeper AI context
✅ Semantic Blockquote: select a global context page in the UI to auto-generate the site manifest
✅ Link Grouping: automatically categorizes links by post type (Pages, Optional, etc.) per the llmstxt.org specification
✅ Both endpoints automatically respect noindex rules from Yoast SEO, Rank Math, SEOPress, and AIOSEO
✅ Announces /llms.txt in your robots.txt for passive bot discovery
✅ <link rel="alternate" type="text/markdown"> auto-injected into every page <head>

Precision Control
✅ Native WordPress metabox on every post and page: exclude it from AI manifests or override the M2M payload manually
✅ "View as AI" button in the WordPress Admin Bar: see exactly what any AI bot receives from any page

Shadow Analytics Lite
✅ Tracks global M2M interception hits with a simple counter in your dashboard
✅ GDPR-compliant: IP addresses are hashed daily, never stored in plain text
✅ Detects 58 known AI bots across 4 categories (Training, Query, Discovery, Scraping)

Enterprise Sanitization
✅ Strips Unicode corruption before delivery: BOM markers, Zero-Width Spaces, Non-Breaking Spaces, and Soft Hyphens (the characters that cause parser errors in ChatGPT and Claude)
✅ Transient-based caching (12-hour TTL) for endpoint performance, with a one-click AJAX flush

Developer API
✅ 14 documented action/filter hooks for extending behavior without modifying plugin files
✅ Clean OOP architecture with full Composer autoloading

Built to WordPress standards

LLM Override is developed following strict WordPress coding standards. Every function is prefixed, every output escaped, every database query prepared, and every nonce verified. There are no direct filesystem operations, no unprepared SQL, and no short PHP tags. The plugin passes the official WordPress Plugin Check tool with zero errors and zero warnings.

LLM Override Pro: Industrial-scale GEO

The free version covers the complete core M2M engine. Large sites and agencies need scale. Pro unlocks:

🤖 AI Copilot: per-post AI-generated Markdown with custom personas (GPT, Claude, DeepSeek, OpenRouter via BYOK)
⚙️ Batch Accelerator: compile your entire site in the background via Action Scheduler, with no timeouts
📊 Full GEO Analytics: granular telemetry on which bots, which pages, and which entities were injected
🔬 Autopilot llms.txt: an AI-drafted manifesto grounded in your actual content
🏢 Agency MCP Server: expose a full Model Context Protocol endpoint for external agent orchestration
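
Shadow Analytics Lite hashes IP addresses daily, as listed under Core Features. A keyed hash whose key rotates every day can approximate this. The Python sketch below is a hypothetical scheme, not the plugin's documented method. The per-day key is derived from a site secret with HMAC-SHA-256. Hits from one visitor can then be de-duplicated within a day. Anyone without the secret cannot link them across days.

```python
import hashlib
import hmac
from datetime import date


def hash_ip(ip: str, day: date, secret: bytes) -> str:
    """Hypothetical daily IP hash: same IP + same day -> same digest.

    The raw IP is never returned or stored; only the hex digest is.
    """
    # Derive a key that is valid for one calendar day only.
    daily_key = hmac.new(secret, day.isoformat().encode(), hashlib.sha256).digest()
    return hmac.new(daily_key, ip.encode(), hashlib.sha256).hexdigest()
```

The site secret must stay private. The IPv4 address space is small enough that anyone holding the key could brute-force the digests.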

Changelog

1.2.1

Fix: Added Content-Type fallback for Google NotebookLM. NotebookLM is a User-Triggered Fetcher whose ingestion pipeline accepts text/plain but rejects text/markdown for web URLs. When LLM Override detects the Google-NotebookLM User-Agent, it now serves Content-Type: text/plain; charset=utf-8 instead of text/markdown. All other AI crawlers continue to receive the RFC 7763 text/markdown response.
Fix: Bot detection in serve_markdown_response() now runs before header emission, enabling Content-Type selection based on the detected bot.
Enhancement: All five M2M endpoints (per-page, llms.txt cached/generated, llms-full.txt cached/generated) now perform inline bot detection to ensure the NotebookLM fallback applies universally.
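
The 1.2.1 fix comes down to choosing the media type after bot detection and before any header is sent. The Python sketch below shows this. Only the `Google-NotebookLM` token comes from the changelog. Matching it as a case-insensitive substring of the User-Agent is an assumption.

```python
MARKDOWN_TYPE = "text/markdown; charset=utf-8"  # RFC 7763 media type
PLAIN_TYPE = "text/plain; charset=utf-8"        # NotebookLM fallback


def content_type_for(user_agent: str) -> str:
    """Return the Content-Type for an M2M response, given the detected User-Agent."""
    if "google-notebooklm" in user_agent.lower():
        # NotebookLM's ingestion accepts text/plain but rejects text/markdown.
        return PLAIN_TYPE
    return MARKDOWN_TYPE
```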

1.2.0

Enhancement: Content-Type header changed from text/plain to text/markdown; charset=utf-8 per RFC 7763 — the official MIME type for Markdown documents. This aligns M2M responses with the HTTP specification and improves semantic identification by proxies and CDNs.
Enhancement: Added Vary: Accept header to all Markdown responses. Informs CDNs and reverse proxies (Cloudflare, Varnish, Nginx FastCGI cache) that responses vary by the Accept request header, preventing incorrect cache hits when the same URL serves HTML to browsers and Markdown to bots.
Enhancement: Added X-Markdown-Tokens header to all Markdown responses. Exposes an estimated token count (word_count × 1.33) so AI pipelines can make context-window budget decisions before downloading the full document.
Enhancement: Added ReadAction (Schema.org) to the JSON-LD Semantic Enclosure on all singular pages. AI systems parsing the <head> JSON-LD now discover the ?view=raw M2M endpoint through the standard potentialAction property. The front page additionally advertises /llms.txt and /llms-full.txt discovery endpoints.
Internal: Reordered serve_markdown_response() to assemble the full output before sending headers, enabling accurate token count calculation.
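
The 1.2.0 header set, including the token estimate, can be sketched in Python as follows. The `word_count × 1.33` formula comes from the changelog. Counting words by splitting on whitespace and rounding to the nearest integer are assumptions. `markdown_headers()` is a hypothetical helper.

```python
def estimate_tokens(markdown: str) -> int:
    """Estimated token count: whitespace-separated words x 1.33, rounded."""
    return round(len(markdown.split()) * 1.33)


def markdown_headers(markdown: str,
                     content_type: str = "text/markdown; charset=utf-8") -> dict:
    """Build headers once the full body exists, so the estimate covers all of it."""
    return {
        "Content-Type": content_type,
        "Vary": "Accept",            # the same URL may also serve HTML to browsers
        "X-Robots-Tag": "noindex",   # keep the Markdown copy out of search results
        "X-Markdown-Tokens": str(estimate_tokens(markdown)),
    }
```

For a 600-word page, this yields `X-Markdown-Tokens: 798`.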

1.1.7

Removed: Terminology Standardization engine. This feature introduced semantic divergence between HTML and M2M payloads, contradicting our core principle of content faithfulness. LLM Override now guarantees strict 1:1 parity between what humans read and what machines receive.

1.1.6

Performance: Resolved critical AJAX timeout when regenerating llms.txt cache on sites with 1,000+ published pages. The admin stats engine now uses lightweight SQL COUNT queries (~2ms) instead of heavy WP_Query meta lookups that generated multiple LEFT JOINs on wp_postmeta.
Performance: Optimized /llms.txt and /llms-full.txt endpoint queries by disabling SQL_CALC_FOUND_ROWS (no_found_rows) and taxonomy cache preloading (update_post_term_cache), reducing memory usage and query time on large sites.
Performance: Fixed Dashboard KPI query that loaded all post IDs into memory unnecessarily. Now fetches only the count.
Fix: Extended the safe shortcode stripping logic (introduced in 1.1.5 for the Content Pipeline) to all four remaining strip_shortcodes() call sites in the llms.txt and llms-full.txt generators. Divi and WPBakery content is now preserved consistently across all endpoints.
Enhancement: llms.txt and llms-full.txt caches now auto-invalidate when any public post is published, updated, or trashed, eliminating up to 12 hours of stale content.
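
The caching behavior described in 1.1.6 combines a 12-hour TTL with invalidation whenever a public post changes. It can be sketched with an in-memory stand-in for WordPress transients. `EndpointCache` and its methods are hypothetical. In WordPress, the invalidation would typically hang off a status-transition hook such as `transition_post_status`, which receives the new status, the old status, and the post.

```python
import time
from typing import Callable, Dict, Tuple

TWELVE_HOURS = 12 * 60 * 60  # transient TTL in seconds


class EndpointCache:
    """Hypothetical in-memory stand-in for the transients behind /llms.txt."""

    def __init__(self, ttl: int = TWELVE_HOURS,
                 clock: Callable[[], float] = time.time) -> None:
        self.ttl = ttl
        self.clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, value)

    def get_or_build(self, key: str, build: Callable[[], str]) -> str:
        """Return the cached value, regenerating it if missing or expired."""
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = build()
        self._store[key] = (now + self.ttl, value)
        return value

    def flush(self) -> None:
        """Drop every cached endpoint (the manual one-click flush)."""
        self._store.clear()

    def on_post_status_change(self, new_status: str, old_status: str,
                              is_public: bool) -> None:
        """Flush when a public post is published, updated while published,
        unpublished, or trashed."""
        if not is_public:
            return
        if new_status in ("publish", "trash") or old_status == "publish":
            self.flush()
```

Injecting `clock` keeps the TTL logic testable without waiting twelve hours.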

1.1.5

Fix: Prevented destructive Markdown conversion when processing pages built with shortcode-heavy visual builders (Divi, WPBakery). The core pipeline previously used strip_shortcodes(), which discarded the content the builders' shortcodes wrap. It now uses a recursive loop that removes shortcode tags while keeping the text they enclose.
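
WordPress's strip_shortcodes() removes an enclosing shortcode together with its inner content. On a Divi or WPBakery page, almost all text sits inside such wrappers. The minimal Python sketch below shows the safer alternative: it unwraps tags and keeps their content, looping until nothing changes. The explicit `names` list stands in for WordPress's registry of shortcode tags. Both it and `unwrap_shortcodes()` are hypothetical.

```python
import re
from typing import Iterable


def unwrap_shortcodes(text: str, names: Iterable[str]) -> str:
    """Remove the listed shortcode tags but keep the text they enclose.

    Handles opening ([et_pb_text admin_label="Intro"]), closing ([/et_pb_text])
    and self-closing ([gallery ids="1,2" /]) tags. Repeats until nothing changes,
    so a tag that only becomes matchable after an inner tag is removed is also
    cleaned up.
    """
    names = sorted(set(names), key=len, reverse=True)
    if not names:
        return text
    alternation = "|".join(re.escape(name) for name in names)
    # The tag body may not contain brackets; nested tags are peeled off pass by pass.
    tag = re.compile(rf"\[/?(?:{alternation})(?:\s[^\[\]]*)?/?\]")
    while True:
        stripped = tag.sub("", text)
        if stripped == text:
            return stripped
        text = stripped
```

Bracketed text that is not a listed shortcode, such as `[note]`, is left untouched.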

1.1.4

Fix: Migrated all inline <script> tags to use wp_register_script, wp_enqueue_script, and wp_add_inline_script APIs, fully compliant with WordPress enqueue standards.
Fix: Resolved unescaped output variables in metabox template by wrapping all dynamic attributes in esc_attr().
Fix: Corrected JSON-LD script injection by removing unsafe JSON_UNESCAPED_UNICODE and JSON_UNESCAPED_SLASHES flags from wp_json_encode(), preventing potential </script> breakout.
Fix: Restructured printf calls in the llms.txt admin partial to use wp_kses() with explicit allowlists instead of esc_html__() with raw HTML arguments.
Fix: Removed non-permitted binary files from distribution (vendor/bin/html-to-markdown, vendor/league/html-to-markdown/bin/).
Fix: Eliminated duplicate Metabox instantiation that caused hooks to register twice.
Tweak: Extracted Terminology Standardization repeater JavaScript into a dedicated external file (admin/js/llm-override-admin-terminology.js), enqueued conditionally on the Content Rules page only.
Tweak: Yoast SEO dismiss handler now passes nonce via wp_localize_script() instead of inline PHP.
Tweak: Added phpcs:ignore annotations with technical justifications for text/plain API endpoints where esc_html() would corrupt Markdown output.
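
The </script> breakout fixed above is easy to reproduce. PHP's json_encode() escapes "/" as "\/" by default, and JSON_UNESCAPED_SLASHES turns that off. A value containing </script> can then close the JSON-LD tag early. The Python sketch below shows safe embedding, using the ReadAction potentialAction that 1.2.0 later added. The `WebPage` type and the exact property set are assumptions. Python's json.dumps() never escapes "/", so the sketch escapes "</" explicitly.

```python
import json


def jsonld_script(page_url: str, markdown_url: str, headline: str) -> str:
    """Return a JSON-LD <script> tag that cannot be closed early by its own data."""
    data = {
        "@context": "https://schema.org",
        "@type": "WebPage",  # assumption: the plugin's enclosure type may differ
        "url": page_url,
        "headline": headline,
        "potentialAction": {
            "@type": "ReadAction",
            "target": markdown_url,  # e.g. the ?view=raw endpoint
        },
    }
    encoded = json.dumps(data, ensure_ascii=True)
    # "<\/" is a valid JSON escape for "</", so parsers read the same string,
    # but the HTML tokenizer no longer sees a closing </script> tag.
    encoded = encoded.replace("</", "<\\/")
    return f'<script type="application/ld+json">{encoded}</script>'
```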

1.1.3

Fix: Removed residual AJAX callback registrations pointing to non-existent methods, preventing a fatal error in certain configurations.

1.1.2

Fix: Resolved a backend conflict by removing unused AJAX event listeners that could potentially trigger HTTP 500 errors in specific environments.

1.1.1

Fix: Eliminated execution conflicts by removing non-functional code blocks.
Performance: Restored ultra-lightweight architecture by ensuring all processes rely exclusively on the WordPress Options and Transients APIs.

1.1.0

Feature: Terminology Standardization Engine. The M2M engine replaces the legacy Forbidden Terms logic with a structured {from => to} terminology dictionary, applied globally, to ensure Content Faithfulness and compliance.
Enhancement: Migrated global term filtering logic to comply with accurate Source Attribution guidelines.
Tweak: Version bump for plugin parity and architectural refactoring to optimize payload delivery.

1.0.5

New: RAG JSON-LD Grounding Engine. Automatically injects semantic TechArticle schema markup into the HTML <head> containing the M2M translated content, allowing search engines and discovery bots to ingest the clean markdown payload directly from the DOM without needing to visit the ?view=raw endpoint.
Enhancement: Complete architectural refactoring of the Content Pipeline. HTML-to-Markdown conversion is now centralized natively inside LLM_Override_Content_Pipeline::convert_to_markdown(), guaranteeing maximum stability and preventing theme builder conflicts or fatal errors during the M2M interception phase.
Fix: Developer Experience (DX) bypass for Stealth Bot Detection. Headless browsers integrated into IDEs and localhost environments (127.0.0.1, .local) no longer trigger false-positive M2M interceptions when they omit the Sec-Fetch-* request headers.
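
The pipeline removes <script>, <style>, <iframe>, and empty elements before league/html-to-markdown converts the HTML, per the Core Features list. That cleanup runs in PHP. As an illustration only, here is a Python sketch using the standard-library html.parser. Which elements count as "empty" (p, div, span, section) is an assumption, and `clean_html()` is a hypothetical name.

```python
import re
from html.parser import HTMLParser

DROP_TAGS = {"script", "style", "iframe"}  # removed together with everything inside
# Assumption: these elements count as "empty" when they hold only whitespace.
EMPTY_ELEMENT = re.compile(r"<(p|div|span|section)\b[^>]*>\s*</\1>", re.IGNORECASE)


class _Cleaner(HTMLParser):
    """Re-emits the HTML it reads, skipping DROP_TAGS.

    Comments (e.g. Gutenberg block markers) have no handler, so they are dropped.
    """

    def __init__(self) -> None:
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_TAGS:
            self.skip_depth += 1
        elif not self.skip_depth:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if tag not in DROP_TAGS and not self.skip_depth:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in DROP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")


def clean_html(html: str) -> str:
    parser = _Cleaner()
    parser.feed(html)
    parser.close()
    cleaned = "".join(parser.out)
    # Removing an empty child can leave its parent empty, so repeat until stable.
    while True:
        reduced = EMPTY_ELEMENT.sub("", cleaned)
        if reduced == cleaned:
            return cleaned
        cleaned = reduced
```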

1.0.4

Fix: Added deep exclusions for performance auditing tools (Chrome-Lighthouse, GTmetrix, PingdomPTST) to prevent them from receiving Markdown. Speed tests will now correctly analyze the HTML layout without triggering the Stealth Bot engine.
Fix: Added more SEO bots (AhrefsBot, SemrushBot, Applebot, DotBot, MJ12bot) to the exclusion whitelist.

1.0.3

Fix: Critical Indexing Hotfix. Excluded honest search engine crawlers (like Googlebot and Bingbot) from being falsely flagged by the Stealth Detection Engine. This ensures good bots receive standard HTML without the noindex header, allowing normal SERP indexing to continue seamlessly.
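
Taken together, the 1.0.3 to 1.0.5 fixes imply a precedence order for deciding who receives Markdown. The Python sketch below encodes that order. The token lists are short illustrative samples; the plugin's full list covers 58 AI bots. Treating a request without Sec-Fetch-* headers as a stealth bot is an interpretation of the 1.0.5 entry, not documented behavior. `classify()` is a hypothetical name.

```python
from urllib.parse import urlsplit

# Illustrative samples only, as lowercase User-Agent substrings.
SEARCH_AND_SEO = ("googlebot", "bingbot", "applebot", "ahrefsbot",
                  "semrushbot", "dotbot", "mj12bot")
PERF_TOOLS = ("chrome-lighthouse", "gtmetrix", "pingdomptst")
AI_BOTS = ("gptbot", "claudebot", "perplexitybot", "google-notebooklm")
LOCAL_HOSTS = ("localhost", "127.0.0.1", "::1")


def classify(user_agent: str, host: str, headers: dict) -> str:
    """Return "html" or "markdown" for a request (hypothetical decision order)."""
    ua = user_agent.lower()
    hostname = urlsplit("//" + host).hostname or ""
    # 1. Search/SEO crawlers and speed-test tools always get normal HTML.
    if any(token in ua for token in SEARCH_AND_SEO + PERF_TOOLS):
        return "html"
    # 2. Declared AI crawlers get Markdown.
    if any(token in ua for token in AI_BOTS):
        return "markdown"
    # 3. Local development hosts are never treated as stealth bots.
    if hostname in LOCAL_HOSTS or hostname.endswith(".local"):
        return "html"
    # 4. Stealth heuristic: browsers send Sec-Fetch-* headers; many scripts do not.
    has_sec_fetch = any(name.lower().startswith("sec-fetch-") for name in headers)
    return "html" if has_sec_fetch else "markdown"
```

User-Agent strings can be spoofed. A production check for search engine crawlers would typically also confirm the crawler's identity through a reverse DNS lookup.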

1.0.2

Fix: Changed the Content-Type header from text/markdown to text/plain to ensure strict AI URL ingesters (like Google NotebookLM) accept the M2M endpoints as valid sources, while maintaining monospace readability in browsers.
Tweak: Restored the X-Robots-Tag: noindex header to prevent search engine SERP pollution, confirming it was not the cause of the NotebookLM blockage.

1.0.1

New: Passive Yoast SEO Compatibility Checker. Detects Yoast Premium's llms.txt override rules and Bot Blocker restrictions, and alerts administrators with actionable fixes inside the WP dashboard.
Fix: Added missing Content-Type: text/markdown header to the M2M payload response. This prevents browsers from incorrectly attempting to parse the payload as HTML and collapsing whitespace/formatting.

1.0.0

Initial release.
New: Semantic llms.txt UI. Select a global context source via dropdown to automatically generate the file's blockquote manifesto.
New: Link grouping in llms.txt. Links are now categorized by post type (## Pages, ## Optional, etc.) following strict llmstxt.org specifications.
Enhancement: M2M payload extraction engine now evaluates up to 400 characters of raw content (vs 120) when native excerpts are missing, providing richer factual context to AI crawlers.
Fix: Addressed WordPress forcing trailing slashes (/) on .txt API endpoints via redirect_canonical. Now serves pristine flat files.
Fix: Replaced generic HTML entity decoding with native WordPress typographic decoders to properly clean apostrophes and quotes injected by Gutenberg.
Tweak: Redesigned the "Regenerate Cache" UI to strictly adhere to B2B aesthetic guidelines.
Active M2M Interceptor engine with structured HTML-to-Markdown conversion.
Global Semantic Injection: Forbidden Terms and Corporate Manifest via YAML frontmatter.
Dynamic /llms.txt and /llms-full.txt endpoint generation.
Algorithmic Discoverability via <link rel="alternate"> tag and robots.txt announcement.
Native SEO integrations with Yoast SEO, Rank Math, SEOPress, and AIOSEO (zero-dependency SQL implementation).
Native per-post exclusion and payload override via WordPress editor metabox.
Admin Dashboard with Shadow Analytics Lite (M2M bot hit counters, GDPR-compliant IP hashing).
"View as AI" Admin Bar button for empirical M2M payload verification.
"Before vs. After" live HTML-to-Markdown simulation in the Dashboard.
Passive bot detection for 58 known AI crawlers across 4 behavioral categories.
HTTP Content Negotiation support (Accept: text/markdown header).
Enterprise Unicode sanitization (BOM, Zero-Width Spaces, Non-Breaking Spaces, Soft Hyphens).
AJAX-driven Transient caching (12-hour TTL) for all M2M endpoints with manual flush.
14 documented action/filter hooks for developer extensibility.
Full compliance with WordPress coding standards: 0 Plugin Check errors, 0 warnings.
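
The 1.0.0 llms.txt layout follows llmstxt.org: an H1 title, a blockquote manifest, then one H2 section per post type. The Python sketch below builds that layout together with the 400-character excerpt fallback. `Entry`, `describe()`, and `build_llms_txt()` are hypothetical names. The Optional section goes last because, by the llmstxt.org convention, its links may be skipped when context is short.

```python
from dataclasses import dataclass
from typing import Dict, List

EXCERPT_FALLBACK_CHARS = 400  # 1.0.0: up to 400 characters when no excerpt exists


@dataclass
class Entry:
    title: str
    url: str
    section: str        # post-type label used as the H2 heading, e.g. "Pages"
    excerpt: str = ""
    content: str = ""   # plain text with markup already removed


def describe(entry: Entry) -> str:
    """Native excerpt if present, otherwise the first 400 characters of content."""
    if entry.excerpt.strip():
        return entry.excerpt.strip()
    text = " ".join(entry.content.split())  # collapse whitespace
    return text[:EXCERPT_FALLBACK_CHARS].rstrip()


def build_llms_txt(site_name: str, manifest: str, entries: List[Entry]) -> str:
    lines = [f"# {site_name}", "", "> " + " ".join(manifest.split()), ""]
    groups: Dict[str, List[Entry]] = {}
    for entry in entries:
        groups.setdefault(entry.section, []).append(entry)
    # Stable sort: keep first-seen order, but move "Optional" to the end.
    for section in sorted(groups, key=lambda name: name == "Optional"):
        lines += [f"## {section}", ""]
        for item in groups[section]:
            note = describe(item)
            lines.append(f"- [{item.title}]({item.url})" + (f": {note}" if note else ""))
        lines.append("")
    return "\n".join(lines)
```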
