I’ve spent the last few weeks wrestling with Claude’s recent performance dips in real repos, trying to figure out if my code was bad or the model was broken. It turns out, it wasn’t just me; Anthropic finally admitted that three specific bugs caused a significant intelligence drop across their models. This isn’t just a minor glitch—it’s a confirmed degradation of reasoning depth and output quality that has been affecting developers for months.

After denying these issues for over a month, the “intelligence drop” bug was finally spilled out by Anthropic itself. The admission reveals three distinct failures: reasoning levels were secretly downgraded from “High” to “Medium,” a caching bug caused thinking records to be cleared every turn, and a 25-word limit in system prompts degraded output quality. With these three bugs stacking up, the user experience for Claude plummeted.
I think secretly downgrading reasoning modes without telling us is unacceptable for production tools. As a builder, a caching bug that clears memory every turn breaks complex multi-step coding tasks. Personally, system prompt limits should never silently degrade output quality in enterprise settings.
Fortunately, competitors applied timely pressure today; educating users is ultimately not a viable strategy. The timing feels too coincidental: GPT-5.5 just launched, and Claude immediately started “admitting defeat.” One has to wonder if 5.5 helped debug the situation, or if Dario Amodei intentionally made Claude dumber to look good when GPT-5.5 was released.

That said, while updates with bugs are good news in principle, the timing is too coincidental: GPT-5.5 just launched, and Claude immediately started “admitting defeat.” Could it be that 5.5 helped debug? Dario, you didn’t intentionally make Claude dumber to look good when GPT-5.5 was released, did you?

The bugs are confirmed: there were three. First off, this isn’t the first time. Last August, Anthropic released a similar postmortem on performance degradation affecting Opus 4.0 and 4.1, with official statements claiming, “We never intentionally lowered model quality.” The title of this new postmortem is “A postmortem of three recent issues.” The word “recent” makes the meaning clear. Not just now, but recently.

The reason it’s “recent” is that the community has been complaining about Claude’s performance drop for a long time. More than ten days ago, Stella Laurenzo, Senior Director of AMD AI, published a rigorous audit report on GitHub covering 6,852 session files, 17,871 thinking blocks, and over 230,000 tool calls.

The analysis showed that starting in February, the model’s reasoning depth experienced a cliff-like decline. More detailed findings revealed that Claude began falling into reasoning loops and showed a clear tendency to choose “the simplest fix” rather than the correct one. BridgeMind’s BridgeBench tests also exploded during this period; Opus 4.6’s accuracy dropped from 83.3% to 68.3%, causing its ranking to fall from #2 to #10.
I think accuracy dropping by 15 points makes the model unreliable for critical code generation. As a builder, falling into reasoning loops wastes tokens and increases latency significantly.

Although researchers later criticized the methodology, noting that the number of tasks in pre- and post-tests was completely different, the narrative that “Claude has become stupid” had already spread. Netizens even coined a new term: AI shrinkflation. For the same price, you receive a diluted product. In other words, everyone bought Anthropic’s “children’s meal.” But there was no way around it; Claude was indeed unbeatable at the time. Many users complained while continuing to use it, with little recourse.
It wasn’t until GPT-5.5 launched that Anthropic could no longer sit still. They published a postmortem on their official blog, breaking down the “intelligence drop” of the past two months into three independent issues:
Secretly downgrading reasoning levels.
On March 4, Claude Code’s default reasoning was changed from High to Medium due to high latency in High mode. However, the interface still displayed “High.” Users thought they were using the full-power version but actually received a downgraded product. It wasn’t rolled back for over a month.
Getting dumber with every chat.
On March 26, a caching optimization was deployed, intended to clear old thinking records after sessions had been idle for more than an hour. However, due to a code bug, the clearing action executed on every turn. Claude continued working but gradually forgot why it was doing so. Symptoms of this bug included forgetfulness, repetition, and erratic tool calls. Furthermore, because thi
The Cost of Silence: Why Claude’s Postmortem Didn’t Land
I’ve been watching the fallout from Anthropic’s recent admission that performance declined, and frankly, it feels less like a technical hiccup and more like a crisis of trust. The postmortem confirmed three specific bugs—cache misses, token bloat, and a prompt constraint error—but the real story is how these issues compounded over two months while official channels stayed silent.
The Three Bugs That Broke the Flow
The technical details are stark. Context linking records were repeatedly cleared, causing cache misses with every request and sending token consumption skyrocketing. It took 15 days to fix this specific issue alone.
One prompt line ruined output quality.
On April 16, a system prompt was added: “Text between tool calls must not exceed 25 words; final responses must not exceed 100 words.” Both Opus 4.6 and 4.6 saw a 3% performance drop, leading to a rollback four days later.
These three issues affected different user groups at different times. The cumulative effect was that Claude Code continuously deteriorated unevenly, but no one could pinpoint exactly what went wrong.
On Twitter, the official account @ClaudeDevs posted a summary, and Boris Cherny, known as the “father of Claude,” personally replied in the comments, previewing that bugs in Opus 4.7 are also being fixed.

But the problem is that bug fixes alone don’t explain everything that happened over these two months.
April: Anthropic’s Series of Self-Inflicted Blows
Looking at the timeline, April was essentially three consecutive punches to Anthropic’s own face.
On April 4, Anthropic banned third-party agentic tools like OpenClaw from running via Pro/Max subscriptions. Want to keep using them? Switch to API pay-per-token pricing.
On April 21, the official pricing page quietly removed Claude Code from the Pro plan, and support documentation changed from “Pro or Max plan” to “Only Max plan.”
After being caught by netizens, Head of Growth Amol Avasare claimed this was merely an A/B test for 2% of new users. However, the public-facing pages were updated site-wide, creating a complete contradiction in messaging. They rolled back hours later, embarrassed.
Doing the math: Pro is $20/month ($240/year). To continue using Claude Code, you must upgrade to Max starting at $5x, or $100/month ($1,200/year)—five times the cost. Max 20x is $2,400—ten times the cost. There are no intermediate tiers.
Note that these figures are in USD.
On April 23, today, the postmortem went live, with compensation being a reset of usage quotas.
Netizens pointed out bluntly that usage quotas had already been reset once when Opus 4.7 was released last week; thus, this “compensation” is essentially just a normal periodic reset.
Linking these three events together, the flavor isn’t just bugs—it’s an explosion of cost anxiety.
Personally, sudden pricing shifts without warning break developer trust faster than any bug. I think resetting quotas that were already reset feels like a hollow gesture to users.
Netizens Are Not Buying It
Given all this, reactions to Claude among netizens have begun to diverge.
Some believe that having bugs is understandable and that the postmortem was relatively transparent. Boris replied one-by-one on Hacker News, which is better than what most companies do.

But more people are calculating another bill.
During these two months, all official channels remained silent.
Only a few employees responded sporadically on X (formerly Twitter), and they were criticized for “random replies at random times,” lacking any systematic approach.

Others question the true motive behind the “caching optimization.” The trigger time for clearing thinking records coincided exactly with cache expiration nodes; some suspect this wasn’t to reduce latency, but to cut costs.
Simultaneously, Anthropic conducted A/B tests on a small subset of Pro users, quietly providing different product configurations, dealing another blow to trust.

The compensation measure was resetting usage quotas. Some bluntly pointed out that quotas had already been reset once when Opus 4.7 was released last week.
BridgeMind’s BridgeBench tests also exploded during this period, showing Opus 4.6’s accuracy dropping from 83.3% to 68.3%, with its ranking falling from #2 to #10.
Although researchers later criticized the methodology due to differing task counts in pre- and post-tests, the narrative that “Claude has become stupid” had already spread.
As one netizen said, you can’t put all your eggs in one model company’s basket.


Others stated that GPT-5.4 is already better than Opus 4.6.

Some are using MiniMax as a supplement: $40 for 4,500 messages in a 5-hour cycle, with the ability to view the full thinking process.

Half a year ago, “use Claude for coding” was almost consensus.
Now, Codex has 4 million active users. GPT-5.5 focuses heavily on coding and computer operation capabilities; even OpenAI staff directly stated that this model can be used as a “Chief of Staff.”
I’m seeing more devs treat Claude as a fallback rather than a primary editor. As a builder, the shift to Codex feels less like preference and more like necessity for many teams. Personally, miniMax offers transparency that current flagship models are struggling to match.
Claude hasn’t necessarily gotten worse. Others have gotten better, while Claude failed at the worst possible time.
The window for Anthropic to fix bugs and rebuild trust has narrowed significantly compared to two months ago.
GPT-5.5 is already released; DeepSeek V4 is firmly in place.

Gemini, come on! We’re just waiting for you.
References
I dug into these sources to verify the claims of performance decline and bug exposure, ensuring I had the primary documentation and community sentiment before forming my conclusions.
- An update on recent Claude Code quality reports — Anthropic is an AI safety and research company that’s working to build reliable, interpretable, and steerable AI systems.
- An update on recent Claude Code quality reports | Hacker News
Comments
Sign in to join the discussion and leave a comment.
Sign in with Google