Diagnosed by humans with the same error rate? How do we know the baseline diagnoses were correct to begin with? If we're at the level of subtlety where an AI system can better infer what's wrong than a person -- like a constellation of vague GI complaints rather than something obvious like a broken tibia -- is the baseline data reliable enough to be worth comparing against?
Basically: we're comparing AI and humans against a model of scenarios that were created by humans. I dunno, I didn't dive too deeply into the study itself, but I'm always wary of data reliability.
***Thank you for saying this.*** I had the same exact thought.
Excellent point, DC.