374 Comments
deathcap:

So here's my question about that study: how did they determine what the "correct" diagnosis is to compare the test physicians vs ChatGPT?

Mitch:

post mortems, lol

Camilla:

They likely used already diagnosed cases as the scenarios.

deathcap:

Diagnosed by humans with the same error rate? How do we know the baseline diagnoses were correct to begin with? If we're at the level of subtlety where an AI system can better infer what's wrong than a person can -- a constellation of vague GI complaints rather than something obvious like a broken tibia -- is the baseline data reliable enough to be worth comparing against?

Basically: we're comparing AI and humans against scenarios that were themselves created by humans. I didn't dive too deeply into the study itself, but I'm always wary of data reliability.
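To make the worry concrete, here's a toy sketch (hypothetical numbers, nothing to do with the actual study): if the reference diagnoses used for scoring are themselves only 90% correct, even a perfect diagnostician can only *appear* about 90% accurate, and real differences between graders get compressed by the label noise.

```python
import random

random.seed(0)

N = 10_000            # number of simulated cases
REF_ACCURACY = 0.90   # assumed fraction of reference diagnoses that are actually right

truth = ["correct_dx"] * N
# Reference labels used for scoring: wrong 10% of the time
reference = [t if random.random() < REF_ACCURACY else "wrong_dx" for t in truth]

def measured_accuracy(grader_accuracy: float) -> float:
    """Score a grader against the noisy reference labels, not against the truth."""
    graded = [t if random.random() < grader_accuracy else "wrong_dx" for t in truth]
    return sum(g == r for g, r in zip(graded, reference)) / N

print(measured_accuracy(1.00))  # ~0.90: a perfect grader only looks 90% "accurate"
print(measured_accuracy(0.92))  # ~0.84: a slightly worse grader looks much worse than it is
```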

Craig:

***Thank you for saying this.*** I had the same exact thought.

Metta Zetty:

Excellent point, DC.

Mumsy5:

And therein lies the problem. AI only does well on a canned question where the limited data leads to a single answer. Real life isn't like that.

Just a Clinician:

The scary part is: what if they used ICD-10 codes? We all know those get gamed to get insurance to pay.
