Claude Code and ChatGPT Used to Challenge Human MRI Diagnosis

A patient used ChatGPT 5.5 Pro and Claude Code to analyze their shoulder MRI, uncovering questionable clinic treatments and a conflicting diagnosis.

Ali Zayed · Founder & Editor · June 28, 2026 · 2 min read · Independently fact-checked
The quick version
  • A patient used Claude Code running Opus 4.8 to analyze a 266 MB DICOM medical imaging package containing hundreds of raw MRI files.
  • While a human orthopedist diagnosed a Grade III partial-thickness tendon tear, the Claude-based analysis concluded the tendon was intact with only mild tendinosis.
  • ChatGPT 5.5 Pro flagged two clinic-administered treatments—shockwave therapy on a non-calcified shoulder and a homeopathic injection—as clinically unsupported.

A patient used ChatGPT 5.5 Pro and Claude Code to challenge an orthopedist’s diagnosis and treatment plan, according to a technical case study published on Antoine’s blog. The patient fed a 266 MB DICOM package of shoulder MRI files into Claude Code running Opus 4.8 and received an automated second opinion that contradicted the doctor’s finding of a severe tendon tear.

Why it matters

The patient sought an orthopedic opinion for right shoulder pain and received an MRI, which the clinic interpreted as a “Grade III (>50%-width) partial-thickness tear” of the subscapularis tendon. The clinic immediately began treatment, including shockwave therapy and an injection of Traumeel. Skeptical of the rapid intervention, the patient consulted ChatGPT 5.5 Pro. According to the blog post, the model flagged both treatments: clinical guidelines advise against shockwave therapy for non-calcified rotator-cuff tendinopathy, and Traumeel is a homeopathic product registered in Germany without a stated therapeutic indication.

To analyze the raw medical imaging directly, the patient used Claude Code running the Opus 4.8 (xhigh) model. Unlike standard web-based AI chats, the local developer tool let the model install the Python packages it needed and run code directly on the hundreds of extensionless files in the 266 MB DICOM export. After an hour of processing, Claude produced a PDF report concluding that the subscapularis tendon was completely intact, directly contradicting the human doctor’s report.
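
Working with an export like this usually starts with identifying which of the extensionless files are DICOM at all. The sketch below is not taken from the case study; it illustrates one common approach, assuming the files follow the DICOM Part 10 layout (a 128-byte preamble followed by the ASCII marker DICM). The function names are hypothetical, and files written without that preamble would be missed.

```python
from pathlib import Path

DICOM_PREAMBLE_LEN = 128  # Part 10 files begin with a 128-byte preamble
DICOM_MAGIC = b"DICM"     # immediately followed by these four bytes


def is_dicom_file(path: Path) -> bool:
    """Return True if the file carries the 'DICM' marker at byte offset 128."""
    try:
        with path.open("rb") as fh:
            fh.seek(DICOM_PREAMBLE_LEN)
            return fh.read(4) == DICOM_MAGIC
    except OSError:
        # Unreadable files are treated as non-DICOM in this sketch.
        return False


def find_dicom_files(root: Path) -> list[Path]:
    """Walk a directory tree and collect files that look like DICOM,
    ignoring file names and extensions entirely."""
    return sorted(p for p in root.rglob("*") if p.is_file() and is_dicom_file(p))
```

Calling `find_dicom_files(Path("export_dir"))` on the unpacked export (the directory name is a placeholder) returns the candidate files, which a DICOM library such as pydicom could then read.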

What it means for you

To reconcile the two readings, the patient ran an arbitration prompt in Claude Code. This setup deployed multiple subagents, described as unbiased, and drew on the original human report, diagnostic physical movement tests, and the ChatGPT 5.5 Pro discussions. The final AI arbiter concluded with moderate-to-high confidence that the tendon was intact, showing only mild insertional tendinosis rather than a severe tear. The experiment highlights the analytical power of agentic workflows and points to a growing role for AI in auditing medical care. For those comparing these ecosystems for complex logic and reasoning tasks, our head-to-head test of ChatGPT vs Claude breaks down how their capabilities differ in real-world scenarios.

266 MB: size of the raw DICOM MRI package analyzed locally by Claude Code

Frequently asked questions

Can AI models like Claude read raw MRI files?

Yes. In developer environments like Claude Code, the model can install specialized Python libraries to parse, process, and analyze raw DICOM medical imaging files.
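
As a rough illustration of what parsing involves, the sketch below reads the file meta header that follows the DICM marker using only the Python standard library, then extracts the Transfer Syntax UID, which tells a reader how the rest of the file is encoded. It is an illustrative example under stated assumptions, not what Claude ran in the case study; the function names are hypothetical, and in practice a maintained library such as pydicom handles this and much more.

```python
import struct
from typing import BinaryIO, Optional

# Value representations that use 2 reserved bytes plus a 4-byte length
# in Explicit VR encoding; all others use a 2-byte length.
LONG_VRS = {b"OB", b"OD", b"OF", b"OL", b"OV", b"OW", b"SQ",
            b"SV", b"UC", b"UN", b"UR", b"UT", b"UV"}

TRANSFER_SYNTAX_UID = (0x0002, 0x0010)


def read_file_meta(fh: BinaryIO) -> dict[tuple[int, int], bytes]:
    """Read the group 0002 elements that follow the 'DICM' marker.

    The file meta group is always Explicit VR Little Endian.
    Returns a mapping of (group, element) tags to raw value bytes and
    leaves the stream positioned at the first non-meta element.
    """
    fh.seek(128)
    if fh.read(4) != b"DICM":
        raise ValueError("missing DICM marker")
    elements: dict[tuple[int, int], bytes] = {}
    while True:
        pos = fh.tell()
        header = fh.read(6)  # 4-byte tag + 2-byte VR
        if len(header) < 6:
            break
        group, element = struct.unpack("<HH", header[:4])
        if group != 0x0002:
            fh.seek(pos)  # start of the main dataset: rewind and stop
            break
        if header[4:6] in LONG_VRS:
            fh.read(2)  # reserved bytes
            (length,) = struct.unpack("<I", fh.read(4))
        else:
            (length,) = struct.unpack("<H", fh.read(2))
        elements[(group, element)] = fh.read(length)
    return elements


def transfer_syntax_uid(meta: dict[tuple[int, int], bytes]) -> Optional[str]:
    """Decode the Transfer Syntax UID, stripping the padding byte if present."""
    raw = meta.get(TRANSFER_SYNTAX_UID)
    return None if raw is None else raw.rstrip(b"\x00 ").decode("ascii")
```

Pixel data, series ordering, and image geometry all sit in the main dataset after this header, which is where a full DICOM library becomes the practical choice.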

How did Claude’s MRI diagnosis differ from the human doctor’s?

The human orthopedist diagnosed a Grade III (>50%-width) partial-thickness tear, whereas Claude Opus 4.8 concluded the tendon was intact with only mild insertional tendinosis.

What treatments did ChatGPT flag as inappropriate?

ChatGPT 5.5 Pro flagged shockwave therapy, which clinical guidelines discourage for non-calcified rotator-cuff tendinopathy, and Traumeel, a homeopathic injection registered in Germany without a stated therapeutic indication.

Source: Hacker News. Published June 28, 2026.

Ali Zayed
Founder & Editor · AI Tools Worth

Ali has hands-on tested 50+ AI tools and tracks model releases daily. Every verdict here comes from real, paid usage — never vendor demos or sponsored placements.

AI Tools Worth is independent and unsponsored. Some linked guides contain affiliate links — they never change our verdicts.