
Artificial Intelligence in Medical Diagnosis

WHEN THE MACHINE SURPRISES

What does it mean, today, to rely on Artificial Intelligence for a medical diagnosis? In some areas the machine truly surprises, managing to identify and interpret what human perception struggles to grasp. In others, its structural limitations emerge with equal clarity. What is certain is that neither the doctor nor the machine alone is enough. Let’s understand why.

By Antonio Maria Guerra | 16/04/2026
When AI enters the field of medical diagnosis.

Among all the functions that make up medical practice, diagnosis occupies a uniquely important place: it is the intellectual act through which a professional interprets often ambiguous signals, comparing them against a body of knowledge built over years of study and experience, and ultimately formulating a judgment on which the patient’s entire care journey will depend. A responsibility of no small weight — and, in every sense, the heart of medicine. It is precisely for this reason that the question Artificial Intelligence has brought with it since its very emergence, whether a machine can diagnose better than a human being, is not an academic provocation, but a concrete issue that today concerns thousands of professionals, not to mention their patients. The answer is neither a simple yes nor a simple no: in some areas the machine genuinely surprises, in others human judgment remains irreplaceable.

The AI that ‘examines’ the patient before and better than the doctor?

Since Artificial Intelligence entered people’s daily lives, testimonies have been multiplying from patients who claim to have received from a language model a diagnostic hypothesis that doctors had been unable to formulate. Among the most documented cases is that of Bethany Crystal, a New York consultant who found herself with red spots on her legs, without several doctors being able to explain why (*1). After describing her symptoms to ChatGPT, she received a precise indication: bleeding risk to be evaluated immediately.

The final diagnosis was ‘immune thrombocytopenic purpura’: a condition that causes a dangerous reduction in blood platelets. Crystal stated that without ChatGPT’s insistence she probably would not have gone to the emergency room in time.
A case that gives pause for thought: surprising for what it demonstrates, yet one that raises an uncomfortable question: for every diagnosis that AI gets right, how many does it get wrong, sending people down dangerous paths? A question that will remain valid for a long time, reminding us that the potential and the risk of using technology in medicine always travel together.

Note:
*1: Katia Riddle, NPR, January 30, 2026.

Where AI already excels: the revolution in diagnostic imaging.

If there is one medical field in which the use of Artificial Intelligence is already producing remarkable results, diagnostic imaging is without doubt the most striking. Radiology, mammography, dermatology, ophthalmology: in these fields, Deep Learning algorithms are demonstrating a capacity for visual pattern recognition that, under specific conditions, surpasses that of the human eye. The numbers speak for themselves; one example stands out: in breast cancer detection, some AI systems achieve a sensitivity between 76% and 90%, outperforming the average performance of radiologists, historically ranging between 73% and 78%. The MASAI randomized trial, one of the most authoritative studies conducted in this field, compared the readings of two human professionals with those of a single specialist supported by an AI algorithm: the results led The Lancet to openly speak of a new clinical standard for mammographic screening.

This is not, however, a case of absolute superiority: studies show that doctors still maintain a significant advantage in diagnosing dense breast tissue, where image complexity reduces the effectiveness of algorithms. What emerges is therefore a picture of two intelligences, biological and artificial, with complementary strengths: one more precise in high-volume standard cases, the other more capable of capturing nuance in complex ones.
A complementarity that is the common thread running through the relationship between AI and healthcare (*1).

Note:
*1: The advantage of AI in diagnostic imaging can be explained by a structural characteristic: models are trained on millions of annotated images — a volume of ‘visual experience’ that no human specialist could accumulate over the course of an entire career.

The ECG in the age of AI: seeing the invisible.

The electrocardiogram, or ‘ECG’, is without doubt among the most common tests in modern medicine: in just a few minutes it records the heart’s electrical activity, providing the doctor with a tracing that, traditionally, requires an expert eye to be interpreted correctly. Today, however, something new is happening: Artificial Intelligence is able to detect in that same tracing details that are barely perceptible to a ‘simple human’. A telling example is a randomized clinical trial published in 2024 in Nature Medicine (*1), conducted on nearly 16,000 hospitalized patients. The trial demonstrated that an AI system, after examining their ECGs, was able to identify people at high risk before their conditions deteriorated visibly, allowing them to be treated in time. The figure is significant: in the group assisted by the new technology, mortality fell from 4.3% to 3.6% — a reduction that, in plain terms, translates into a 17% lower risk of death. It is hard, faced with results like these, not to speak of a true diagnostic revolution.
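For readers who want the arithmetic behind those percentages, a minimal sketch follows. Computed crudely from the two raw mortality rates, the relative reduction comes out just under the trial’s headline 17%; the published figure presumably reflects the trial’s own adjusted, time-to-event statistic (that attribution is an assumption here, not something stated in the article).

```python
# Sketch: decomposing the trial's mortality figures into the standard
# risk-reduction measures. Rates taken from the text: 4.3% vs 3.6%.
control, ai_group = 0.043, 0.036

abs_reduction = control - ai_group       # absolute risk reduction (0.7 points)
rel_reduction = abs_reduction / control  # crude relative risk reduction (~16%)
nnt = 1 / abs_reduction                  # patients monitored per death averted

print(f"ARR: {abs_reduction:.1%} | RRR: {rel_reduction:.1%} | NNT: {nnt:.0f}")
```

The number needed to treat (roughly 143 here) is the usual way such a result is translated into clinical workload: how many patients must be monitored by the system to avert one death.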

Rare diseases: AI and the end of a ‘diagnostic odyssey’.

Not everyone knows that there are currently more than 7,000 rare diseases in the world. Most of them are genetic in origin, manifest with atypical and complex symptoms, and affect such a limited number of people that even the most experienced doctor may never have encountered them. This explains why, for those affected, the path to a diagnosis can be a true ‘odyssey’: years of appointments, tests, consultations with different specialists, and hypotheses that consistently prove wrong.

According to the scientific literature, the average time between the onset of symptoms and a correct diagnosis can reach five years (*1). A wait that is not only frustrating, but that in many cases prevents timely access to available treatments, with inevitable consequences for health. Artificial Intelligence is proving capable of dramatically shortening this painful journey. A study published in August 2025 in JAMA Network Open by researchers at Vanderbilt University Medical Center, conducted on 90 of the most complex and unresolved cases in the American Undiagnosed Diseases Network, supported by the National Institutes of Health, found that AI language models achieved diagnostic rates of 13.3% and 10%, compared to a historical clinical review rate of 5.6%. In practical terms: the machine identified the correct diagnosis in more than double the proportion of cases compared to the conventional clinical pathway, starting from the same available data. For patients who had been waiting years for an answer, this is an extraordinary result.

Note:
*1: Faye F. et al., European Journal of Human Genetics, 2024.

AI in colonoscopy: an ally that saves lives.

Colorectal cancer is among the most common and deadly cancers in the world… yet it is also one of those for which prevention truly works: identifying and removing adenomatous polyps before they degenerate means, in the vast majority of cases, averting the disease altogether. Colonoscopy is, in practice, the procedure of choice for this purpose, but it has an important limitation: even in the hands of an endoscopist with years of experience, a significant proportion of abnormalities escape detection.

Artificial Intelligence is changing this reality in a measurable way. A meta-analysis published in 2025 in Gastrointestinal Endoscopy (*1), based on 28 randomized controlled trials involving nearly 24,000 patients, found that AI systems applied to this specific procedure increase the adenoma detection rate by 20% and reduce the rate of missed lesions by 55%, compared to standard colonoscopy. A remarkable result. That said, it’s important to emphasize that the machine must absolutely not replace the endoscopist, but rather work alongside them in real time, instantly flagging suspicious areas that might escape the human eye. A silent ally, then, that quite literally saves lives.

Note:
*1: ‘Use of artificial intelligence improves colonoscopy performance in adenoma detection: a systematic review and meta-analysis’, Gastrointestinal Endoscopy, 2025.

Diabetic retinopathy: a diagnosis without the doctor.

Diabetic retinopathy is one of the leading causes of ‘preventable’ blindness in the world: the condition develops silently in people with diabetes, progressively damaging the blood vessels of the retina, often without the person noticing any symptoms until the damage is already advanced. Early diagnosis, as is easy to understand, is therefore crucial… yet it requires a specialist examination of the ocular fundus that, in many contexts, is not easily accessible.

It’s precisely in this area that Artificial Intelligence has enabled an innovation that until recently seemed unthinkable: in 2018, the US FDA authorized an AI system capable of diagnosing diabetic retinopathy without any medical supervision, the first of its kind in history, in any field of medicine. This system, marketed under the name LumineticsCore, analyzes photographs of the ocular fundus and provides a diagnostic response in under a minute, without the presence of an ophthalmologist being required (!). A meta-analysis published in 2025 in the American Journal of Ophthalmology, based on 13 studies and over 13,000 patients, confirmed that this remarkable tool achieves a sensitivity of 95% and a specificity of 91% in detecting the condition. In plain terms: the machine identifies the disease in 95% of cases where it is present, and correctly rules it out in 91% of cases where it is absent.
A surprising result that will make it possible to bring quality diagnosis to general practice settings, rural areas, and developing countries, wherever a specialist is absent. For millions of people, this will mean the difference between seeing and not seeing.
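Sensitivity and specificity describe how the test behaves given the patient’s true state; what a single positive result means for an individual also depends on how common the disease is in the screened group. A minimal sketch of how the study’s 95%/91% figures combine with prevalence via Bayes’ rule — the prevalence values below are hypothetical illustrations, not figures from the study:

```python
# Sketch: turning sensitivity/specificity into predictive values.
def predictive_values(sens, spec, prevalence):
    tp = sens * prevalence               # true-positive fraction
    fp = (1 - spec) * (1 - prevalence)   # false-positive fraction
    fn = (1 - sens) * prevalence         # false-negative fraction
    tn = spec * (1 - prevalence)         # true-negative fraction
    ppv = tp / (tp + fp)   # P(disease | positive result)
    npv = tn / (tn + fn)   # P(no disease | negative result)
    return ppv, npv

# At a hypothetical 20% prevalence among screened diabetics:
ppv_20, npv_20 = predictive_values(0.95, 0.91, 0.20)  # ppv ≈ 0.73, npv ≈ 0.99
# At a lower hypothetical 5% prevalence, PPV drops sharply:
ppv_05, _ = predictive_values(0.95, 0.91, 0.05)       # ppv ≈ 0.36
```

The drop in positive predictive value at low prevalence is the standard caveat behind any screening statistic of this kind: the rarer the condition in the tested population, the larger the share of positives that are false alarms.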

The voice that reveals the disease… thanks to AI.

Imagine a diagnostic test that requires no blood tests, no expensive equipment and no long waiting times, just your voice. This is not science fiction. Researchers at the Luxembourg Institute of Health demonstrated, in a study published in PLOS Digital Health in 2024, that an AI system is capable of detecting type 2 diabetes by analyzing short recorded voice clips, with a detection rate of 71% in men and 66% in women.

Diabetes, in fact, causes physiological changes that subtly alter the acoustic characteristics of the voice, in a way imperceptible to the human ear but not to the machine. A frontier still under development, but already remarkable (*1).

Note:
*1: Elbéji A. et al., PLOS Digital Health, 2024.

‘Hallucinations’: when AI ‘gets it wrong with absolute certainty’.

Alongside the lack of context, a structural vulnerability of AI discussed further below, there is a second limitation that is equally insidious and, in the diagnostic field, assumes a particularly critical significance: ‘hallucinations’. This term refers to the tendency of language models to generate incorrect information while presenting it with the same confident, authoritative tone they would use to deliver an accurate response. In medicine, as is easy to understand, this represents a considerable problem: an error communicated with the same conviction as a real piece of data can lead the patient, or even the professional, to accept it without the necessary critical filter.
Scientific research has documented the problem extensively. A 2025 study conducted by researchers at MIT and Harvard Medical School, currently undergoing scientific peer review, evaluated eleven language models, both general-purpose and healthcare-specialized, on real clinical cases.

Remarkably, every single system examined generated erroneous information autonomously, with particularly high rates in the rarest or most atypical cases, precisely those in which accuracy would be most indispensable.
The picture is clear: the more complex a clinical case, the greater the risk that the machine will produce an error and, what is worse, present it as ‘certain’ without hesitation.
A doctor, made of flesh and blood, can have doubts, seek confirmation and, if necessary, reassess. A machine that gets it wrong doesn’t know it and, worse still, gives nothing away.

AI’s problem: it doesn’t truly ‘know’ the patient.

There is a ‘structural’ limitation of Artificial Intelligence in medical diagnosis that no algorithmic update has yet managed to resolve and that, in practice, represents an enormous vulnerability. It is the technology’s inability to truly ‘know’ the patient. A system of this kind, however sophisticated, works almost exclusively on the data it receives: if that data is incomplete or lacks the context needed to interpret it correctly, the result will be not only inaccurate, but misleading and therefore dangerous, particularly in a healthcare setting.

A telling example of this problem is documented in a 2024 case published in an Austrian medical journal (*1): a 63-year-old man who had recently undergone pulmonary vein ablation consulted ChatGPT about his neurological symptoms without mentioning the procedure he had just had. Deprived of such a crucial piece of information, the system produced an assessment so incomplete that it delayed the correct identification of a transient ischemic attack. Further confirmation of this ‘weak point’ of AI comes from a study published in 2026 in JAMA Network Open: when tested on 29 real clinical cases with partial information, the 21 leading language models evaluated failed in over 80% of cases to generate an appropriate differential diagnosis.

The best-performing models, given complete documentation, exceeded 90% accuracy.
The conclusion is unequivocal: an AI that has no knowledge of a person’s prior medical history, ongoing therapies, living conditions and emotional variables (which an experienced doctor can read even without being explicitly told) can and does make mistakes.
Context, especially in medicine, is not an accessory element: it is generally an essential step in transforming a raw piece of data into an accurate assessment.

Note:
*1: Wiener Klinische Wochenschrift, 2024.

AI vs doctor: what the data really says.

Beyond AI’s evident superiority in diagnostic imaging, the data on the broader performance comparison between this new technology and human doctors tells a decidedly more nuanced story. The most recent research, conducted across thousands of cases, paints a far from clear-cut picture. For instance, a 2025 report published in npj Digital Medicine, a peer-reviewed journal from the Nature group, found an overall AI diagnostic accuracy of 52.1%, with no statistically significant difference compared to the performance of non-specialist physicians. A second study, led by researchers at Osaka University, showed that results vary enormously depending on the medical specialty involved, the type of clinical case, and the reliability of the information provided to the system.

What emerges clearly is that AI is not infallible: it excels only when supplied with a large volume of structured, complete and dependable data, while it tends to lose ground in contexts where clinical reasoning requires the integration of variables that cannot easily be quantified — such as the patient’s personal history, their life circumstances, or the nuances of the medical consultation. The data, in short, does not crown an absolute winner between machine and human: it describes instead a technology still dependent on the accuracy of what it receives, and a human doctor who, precisely where information grows more complex, continues to ‘make the difference’.

Humans and machines, together: the only formula that truly works.

Having explored some of the most extraordinary results that the introduction of Artificial Intelligence in diagnostics is producing, the structural limitations that characterize its use, and the challenges that still lie ahead, one question remains open: what is, concretely, the way of using this new technology that works best? Today, science provides the answer. A study published in 2025 in the Proceedings of the National Academy of Sciences, conducted by an international team led by the Max Planck Institute for Human Development, analyzed over 40,000 diagnoses across more than 2,100 clinical cases, comparing the performance of doctors, AI language models and mixed collectives. The result is clear: hybrid collectives, made up of human experts and AI systems working together, outperform both doctors alone and machines alone.

After all, humans and AI make different mistakes: when the machine gets it wrong, the doctor often knows the correct answer, and vice versa. This complementarity is the key to everything: it’s not a matter of choosing between the professional and the technology, but of making them work together. Artificial Intelligence will therefore not replace the doctor, with their experience, empathy and clinical judgment, but will work alongside them, expanding their capabilities and reducing the margin of error. A partnership, not a substitution: it is on this awareness that the future of medical diagnosis is founded.

AI in diagnosis: the challenges that still lie ahead.

Despite the fact that many of the results produced by AI in the diagnostic field are genuinely remarkable, it would be wrong to paint a future entirely free of shadows. Several concrete challenges stand between this new technology and its regulated, fully realized use. The first concerns legislation: to date, the US FDA has authorized around 950 medical devices based on AI algorithms, at a rate of approximately 100 new approvals per year, yet the regulatory framework is still very much in flux, and ensuring that every system is safe, validated and monitored over time remains an open challenge. In Europe, the AI Act that came into force in 2024 has laid the groundwork for more structured oversight, but its full implementation will take years.

The second challenge concerns ‘algorithmic opacity’: many AI systems produce correct results without being able to explain the reasoning that generated them, making it difficult for the doctor to assess their reliability. The third, perhaps the most insidious, is that of bias: models trained on data that does not adequately represent the global population, in terms of age, ethnicity and socioeconomic conditions, risk producing less accurate diagnoses for the most vulnerable categories of patients. Finally, there is the problem of technical integration: most hospitals around the world still operate with outdated and poorly interconnected IT systems, making it difficult to adopt advanced AI tools in everyday clinical practice.
Addressing these issues is the necessary condition for the potential of Artificial Intelligence in medicine to translate into real benefits for all patients, not only for those fortunate enough to have the economic means to access centers of excellence.

The images on this page were created using generative Artificial Intelligence tools.