{"id":6477,"date":"2026-07-02T09:13:37","date_gmt":"2026-07-02T13:13:37","guid":{"rendered":"https:\/\/www.webperformance.com\/load-testing-tools\/blog\/2026\/07\/the-more-you-use-the-ai-the-less-it-runs\/"},"modified":"2026-07-02T10:26:01","modified_gmt":"2026-07-02T14:26:01","slug":"the-more-you-use-the-ai-the-less-it-runs","status":"publish","type":"post","link":"https:\/\/www.webperformance.com\/load-testing-tools\/blog\/2026\/07\/the-more-you-use-the-ai-the-less-it-runs\/","title":{"rendered":"The More You Use the AI, the Less It Runs"},"content":{"rendered":"<p>Every AI product now claims it learns. But look closely and what most of them mean is that somewhere a text file is getting longer. The agent hits a problem, works something out, and appends the lesson to an instructions file (CLAUDE.md, AGENTS.md, a &#8220;memory&#8221; store) that gets stuffed into the prompt on every subsequent call. That is the industry&#8217;s default memory architecture: a diary the model re-reads, at your expense, every single time it does anything.<\/p>\n<p>The problem is that it degrades. In 2025, researchers at Chroma measured what they named <a href=\"https:\/\/research.trychroma.com\/context-rot\">&#8220;context rot&#8221;<\/a> across 18 frontier models, including GPT-4.1 and Claude 4: accuracy falls continuously as prompt length grows, with drops of 20 to 50 percent on long inputs. An earlier paper, <a href=\"https:\/\/arxiv.org\/abs\/2307.03172\">Lost in the Middle<\/a>, showed that models reliably miss information buried in the middle of a long context. Put those together and the standard &#8220;learning&#8221; architecture has a design flaw you can state in one sentence: the more the product knows, the worse it behaves.<\/p>\n<p>When a shiny new AI hammer comes along, every problem looks like a nail. I designed WPLoadTester 7 to use AI for what it does best, leaving everything else to boring, deterministic, testable algorithms. 
When our AI Assistant solves a problem, it doesn&#8217;t write the lesson into a prompt. It writes the lesson into a rule, hands the rule to a deterministic expert system, and gets out of the way. The expert system applies everything it has accumulated to every future recording before the AI is invoked at all. The result is a product where the relationship runs backward from the rest of the industry: the more you use the AI, the less it runs.<\/p>\n<h2>A rubric for thinking about AI and data<\/h2>\n<p>Here is the mental model behind that decision. When I evaluate any AI-and-data problem, I place it on two axes. One axis: is the data <strong>structured<\/strong> (fields, tables, name-value pairs) or <strong>unstructured<\/strong> (raw text, HTML, a wall of HTTP traffic)? The other: is the processing <strong>deterministic<\/strong> (same input, same output, every time) or <strong>non-deterministic<\/strong> (statistical, sampled, calculated, never quite the same twice)?<\/p>\n<figure style=\"background:#fff;border:1px solid #eee;border-radius:8px;box-shadow:0 1px 3px rgba(0,0,0,0.04);padding:20px 20px 12px;margin:2rem 0;\">\n<div style=\"font-size:0.75rem;text-transform:uppercase;letter-spacing:1px;color:rgba(42,42,42,0.6);font-weight:700;font-family:'Space Grotesk',Inter,-apple-system,sans-serif;\">The AI-and-Data Rubric<\/div>\n<div style=\"font-style:italic;color:#7C8DB0;font-size:0.95rem;margin:4px 0 12px;\">Create knowledge in the expensive quadrant. Store it in the cheap one.<\/div>\n<p>  <svg viewBox=\"0 0 760 560\" role=\"img\" aria-label=\"Two-by-two quadrant chart of data type against processing type. LLMs occupy the unstructured, non-deterministic quadrant, which costs tokens on every use and varies between runs. Rules engines occupy the structured, deterministic quadrant, which costs nothing and behaves identically every run. 
A coral arrow labeled 'the AI writes the rule' moves solved problems from the LLM quadrant into the rules quadrant, where ASM's detection rules live.\" style=\"width:100%;height:auto;display:block;font-family:Inter,-apple-system,sans-serif;\">\n    <title>The AI-and-data rubric: solve in the expensive quadrant, store in the cheap one<\/title>\n    <!-- quadrant surfaces -->\n    <rect x=\"60\" y=\"16\" width=\"330\" height=\"245\" fill=\"#E9F8FA\"><\/rect>\n    <rect x=\"394\" y=\"16\" width=\"330\" height=\"245\" fill=\"#FAFAFA\"><\/rect>\n    <rect x=\"60\" y=\"265\" width=\"330\" height=\"245\" fill=\"#FAFAFA\"><\/rect>\n    <rect x=\"394\" y=\"265\" width=\"330\" height=\"245\" fill=\"#FFF4E0\"><\/rect>\n    <!-- center dividers -->\n    <line x1=\"392\" y1=\"16\" x2=\"392\" y2=\"510\" stroke=\"#ddd\" stroke-width=\"2\"><\/line>\n    <line x1=\"60\" y1=\"263\" x2=\"724\" y2=\"263\" stroke=\"#ddd\" stroke-width=\"2\"><\/line>\n    <!-- axis labels -->\n    <text x=\"226\" y=\"540\" text-anchor=\"middle\" font-size=\"18\" fill=\"rgba(42,42,42,0.65)\" font-weight=\"600\">Deterministic<\/text>\n    <text x=\"558\" y=\"540\" text-anchor=\"middle\" font-size=\"18\" fill=\"rgba(42,42,42,0.65)\" font-weight=\"600\">Non-deterministic<\/text>\n    <text x=\"30\" y=\"140\" text-anchor=\"middle\" font-size=\"18\" fill=\"rgba(42,42,42,0.65)\" font-weight=\"600\" transform=\"rotate(-90 30 140)\">Structured<\/text>\n    <text x=\"30\" y=\"388\" text-anchor=\"middle\" font-size=\"18\" fill=\"rgba(42,42,42,0.65)\" font-weight=\"600\" transform=\"rotate(-90 30 388)\">Unstructured<\/text>\n    <!-- top-left: rules engines -->\n    <text x=\"84\" y=\"62\" font-size=\"24\" font-weight=\"700\" fill=\"#2A2A2A\" font-family=\"'Space Grotesk',Inter,sans-serif\">Rules engines &#183; databases<\/text>\n    <text x=\"84\" y=\"94\" font-size=\"18\" fill=\"#2A2A2A\">Cheap. Testable. 
Identical every run.<\/text>\n    <text x=\"84\" y=\"126\" font-size=\"18\" font-weight=\"600\" fill=\"#007B8A\">ASM&#8217;s 300+ detection rules live here.<\/text>\n    <!-- top-right -->\n    <text x=\"418\" y=\"62\" font-size=\"24\" font-weight=\"700\" fill=\"rgba(42,42,42,0.55)\" font-family=\"'Space Grotesk',Inter,sans-serif\">Statistical models<\/text>\n    <text x=\"418\" y=\"94\" font-size=\"18\" fill=\"rgba(42,42,42,0.55)\">Forecasts, anomaly detection.<\/text>\n    <!-- bottom-left -->\n    <text x=\"84\" y=\"472\" font-size=\"24\" font-weight=\"700\" fill=\"rgba(42,42,42,0.55)\" font-family=\"'Space Grotesk',Inter,sans-serif\">Parsers &#183; regex<\/text>\n    <text x=\"84\" y=\"500\" font-size=\"18\" fill=\"rgba(42,42,42,0.55)\">Predictable but brittle.<\/text>\n    <!-- bottom-right: LLMs -->\n    <text x=\"418\" y=\"440\" font-size=\"24\" font-weight=\"700\" fill=\"#2A2A2A\" font-family=\"'Space Grotesk',Inter,sans-serif\">LLMs<\/text>\n    <text x=\"418\" y=\"472\" font-size=\"18\" fill=\"#2A2A2A\">The only tool that works here.<\/text>\n    <text x=\"418\" y=\"500\" font-size=\"18\" fill=\"#2A2A2A\">Costs tokens per use. 
Answers vary.<\/text>\n    <!-- the learning-loop arrow -->\n    <defs>\n      <marker id=\"arrowhead\" markerWidth=\"9\" markerHeight=\"9\" refX=\"7\" refY=\"4.5\" orient=\"auto\">\n        <path d=\"M0,0 L9,4.5 L0,9 Z\" fill=\"#FF6B5A\"><\/path>\n      <\/marker>\n    <\/defs>\n    <path d=\"M 470 400 C 360 330, 300 260, 220 175\" fill=\"none\" stroke=\"#FF6B5A\" stroke-width=\"4\" marker-end=\"url(#arrowhead)\"><\/path>\n    <rect x=\"240\" y=\"266\" width=\"250\" height=\"38\" rx=\"19\" fill=\"#fff\" stroke=\"#FF6B5A\" stroke-width=\"1.5\"><\/rect>\n    <text x=\"365\" y=\"291\" text-anchor=\"middle\" font-size=\"19\" font-weight=\"600\" fill=\"#FF6B5A\">The AI writes the rule<\/text>\n  <\/svg><figcaption style=\"border-top:1px solid #eee;margin-top:12px;padding-top:8px;font-size:0.8rem;color:#7C8DB0;\"><span style=\"color:#FF6B5A;font-weight:600;letter-spacing:1px;font-variant:small-caps;\">Web Performance<\/span> &nbsp;&#183;&nbsp; Solved problems move from bottom-right to top-left: the AI figures it out once, the expert system applies it forever.<\/figcaption><\/figure>\n<p>The bottom-right quadrant is the interesting one. Before large language models, nothing handled unstructured data non-deterministically in a useful way; now it is the hottest quadrant in software, and every vendor is racing to move their whole product into it. That is the mistake. The bottom-right quadrant is where hard problems get <em>solved<\/em>, because an LLM can read a wall of raw HTTP traffic and reason about what it means. It is a terrible place for solved problems to <em>live<\/em>, because everything in it costs tokens per use and carries a small probability of coming back different. 
An LLM&#8217;s ability to give different answers to the same question looks like creativity when generating images, but it is a liability for correlations, math, or processing large datasets.<\/p>\n<p>The rubric gives you the rule: <strong>create knowledge in the expensive quadrant, store it in the cheap one.<\/strong> A prompt-file memory violates this rule. It creates knowledge in the expensive quadrant and then leaves it there, paying rent on every call.<\/p>\n<h2>The expert system the AI teaches<\/h2>\n<p>WPLoadTester&#8217;s cheap quadrant has a name: Application State Management, or ASM. It is a rules-based expert system we have been refining since 1999, and it exists because of a problem every load tester knows. You record a user session as HTTP traffic, you replay it with a thousand virtual users, and it breaks immediately, because the recording is full of values that were only valid for the original session: JSESSIONID cookies, CSRF tokens, ASP.NET VIEWSTATE blobs, OAuth Bearer tokens. Finding every dynamic value, tracing where it originates, and wiring the extraction used to take days of an engineer staring at headers.<\/p>\n<p>ASM automates the well-known cases. It ships with over 300 detection rules encoding 27 years of correlation work: framework rules for ASP.NET, Spring, Rails, and PHP; protocol rules for cookies and cache headers; 14 rules for OAuth token flows alone. It runs automatically after every recording and handles roughly 95 percent of the common patterns before a human, or an AI, looks at anything.<\/p>\n<p>The AI Assistant covers the long tail: the proprietary framework, the unusual token scheme, the field that only reveals itself as a runtime failure during replay. This is exactly the bottom-right-quadrant work the rubric prescribes, reasoning over raw, unstructured traffic to figure out what a never-before-seen application is doing. 
On a recent case, the assistant spent four attempts refining a regex to extract a JWT from an escaped Next.js streaming payload. It ran the same loop a senior engineer would, but finished in seconds.<\/p>\n<p>Here is the part where our AI strategy really pays off. When the assistant cracks a problem like that, it doesn&#8217;t just patch the test case. It writes the solution as a new detection rule: a plain properties file with an extraction pattern and a context scope, saved straight into ASM&#8217;s rule store. The next recording you make gets that rule applied during ASM&#8217;s up-front scan. Zero tokens. Identical, <em>verified<\/em> behavior. No context to rot.<\/p>\n<p>A rule does not get into the store just because the AI wrote it. The assistant tests the extraction in isolation against the recorded traffic, applies it to the test case, then replays the whole scenario end to end to confirm the fix holds in practice. It usually takes a few tries, a small tweak and another replay, before everything passes. Only then does the rule earn its permanent spot.<\/p>\n<h2>What this is not<\/h2>\n<p>To be clear, this is not machine learning. No model weights change. The AI does not get smarter; the <em>system around it<\/em> accumulates knowledge, in a form you can open in a text editor and read.<\/p>\n<p>The research world has been here. <a href=\"https:\/\/arxiv.org\/abs\/2305.16291\">Voyager<\/a> (2023) had an LLM agent build a permanent library of Minecraft skills. <a href=\"https:\/\/arxiv.org\/abs\/2308.10144\">ExpeL<\/a> (2023) extracted reusable insights from an agent&#8217;s successes and failures. <a href=\"https:\/\/arxiv.org\/abs\/2006.08381\">DreamCoder<\/a> (2021) grew libraries of program abstractions before LLMs made it fashionable. We did not invent the idea of an AI that writes down what it learned. 
The difference is where the learning goes: every one of those systems retrieves its accumulated knowledge back <em>into the prompt<\/em>, where it costs tokens and competes for the model&#8217;s attention. Ours compiles it <em>out of the prompt entirely<\/em>, into a deterministic engine that runs before the model. The research field calls this general direction neuro-symbolic AI. I call it not paying for tokens to solve the same problem over and over again.<\/p>\n<h2>The economics<\/h2>\n<p>An OAuth login flow that takes two hours of manual correlation typically configures in about four minutes with the assistant. That is the first encounter. The second encounter is handled automatically by the expert system&#8217;s up-front scan. Your AI spend concentrates on problems nobody has seen before, which is the only work an LLM should be doing anyway.<\/p>\n<p>Meanwhile the prompt-file architecture pays in the other direction: every lesson learned makes every future call slightly larger, slightly slower, and, per the context-rot data, slightly less reliable.<\/p>\n<p>The AI Assistant is the most impressive thing in WPLoadTester 7, and the entire architecture is designed to need it less every week you use it. Both are useful, but I think the second one is the better feature. One trend I&#8217;m seeing with AI adoption is that the more people use it, the more they come to rely on it. I would rather give you an AI implementation that you&#8217;ll need to use less and less.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Every AI product now claims it learns. But look closely and what most of them mean is that somewhere a text file is getting longer. The agent hits a problem, works something out, and appends the lesson to an instructions file (CLAUDE.md, AGENTS.md, a &#8220;memory&#8221; store) that gets stuffed into the prompt on every subsequent call. 
That is the industry&#8217;s default memory architecture: a diary the model re-reads, at your expense, every single time it does anything.<br \/>\nThe problem is that it degrades. In 2025, researchers at Chroma measured what they named <a href=\"https:\/\/research.trychroma.com\/context-rot\">&#8220;context rot&#8221;<\/a> across 18 frontier models, including GPT-4.1 &hellip; <a href=\"https:\/\/www.webperformance.com\/load-testing-tools\/blog\/2026\/07\/the-more-you-use-the-ai-the-less-it-runs\/\">Continue reading &raquo;<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[365,8],"tags":[],"class_list":["post-6477","post","type-post","status-publish","format-standard","hentry","category-ai","category-load-testing"],"_links":{"self":[{"href":"https:\/\/www.webperformance.com\/load-testing-tools\/blog\/wp-json\/wp\/v2\/posts\/6477","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.webperformance.com\/load-testing-tools\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.webperformance.com\/load-testing-tools\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.webperformance.com\/load-testing-tools\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.webperformance.com\/load-testing-tools\/blog\/wp-json\/wp\/v2\/comments?post=6477"}],"version-history":[{"count":2,"href":"https:\/\/www.webperformance.com\/load-testing-tools\/blog\/wp-json\/wp\/v2\/posts\/6477\/revisions"}],"predecessor-version":[{"id":6480,"href":"https:\/\/www.webperformance.com\/load-testing-tools\/blog\/wp-json\/wp\/v2\/posts\/6477\/revisions\/6480"}],"wp:attachment":[{"href":"https:\/\/www.webperformance.com\/load-testing-tools\/blog\/wp-json\/wp\/v2\/media?parent=6477"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.webpe
rformance.com\/load-testing-tools\/blog\/wp-json\/wp\/v2\/categories?post=6477"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.webperformance.com\/load-testing-tools\/blog\/wp-json\/wp\/v2\/tags?post=6477"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}