374 Comments
deathcap

Diagnosed by humans with the same error rate? How do we know that the baseline diagnoses were correct to begin with? If we're at the level of subtlety where an AI system can infer what's wrong better than a person can -- say, a constellation of vague GI complaints rather than something obvious like a broken tibia -- is the baseline data reliable enough to be worth comparing against?

Basically: we're comparing AI and humans against a model of scenarios that were created by humans. I dunno, I didn't dive too deeply into the study itself, but I'm always wary of data reliability.

Craig

***Thank you for saying this.*** I had the same exact thought.

Metta Zetty

Excellent point, DC.
