The weight of history on diagnosis

Thoughts on nosology's impact on AI and human enhancement

Jan 10, 2024

What’s wrong with people? It’s the eternal question, and our approaches to answering it correspond with the major spheres of human inquiry. For engineering, what’s wrong is that we don’t have the right tools. For religion, it’s original sin, persistence in mistaking illusion for reality, that sort of thing. The humanities might blame oppression, just as economics might blame irrationality. Medicine is unique in approaching this question head-on and attempting to develop a thorough taxonomy of all the things that can be wrong with us. Rather than putting forward a unified narrative that roots our many dysfunctions in a single cause, medicine has assumed for itself the role of cataloging the many ways our bodies can go awry, all in the service of granting us longer, happier lives with greater capacity for action.

Naturally, this is doomed. Susan Sontag wrote, “Everyone who is born holds dual citizenship, in the kingdom of the well and in the kingdom of the sick.” But these borders shift all the time and are shot through with enclaves and exclaves, and little rebellions on one side or the other. Vast swathes of conceptual territory are unclaimed.

Still, at certain points along the border between illness and health stand checkpoints with guards wearing white coats and with stethoscopes draped around their necks, ready to grant legitimate passage from one land to another. The patient sits on the examination table, anxiously crinkling the wax paper beneath them and holding open their passport, ready for their interview. Some seeking entrance into the kingdom of the sick have their passports stamped with a visa in the form of a diagnosis. (As unlucky as these migrants might seem, the fates of those stateless people whose entry applications have been denied may be worse.)

In this essay, I’ll be exploring how diseases are defined and diagnosed, a field called nosology. Wellness unavoidably has a normative valence born of political negotiations and historical contingencies, so it’s not surprising that its borders are constantly under siege, with no small stakes. The redefinition of homosexuality from an illness to an unremarkable variation in human desire is one of the more notable historical cases, but recent times have seen ever more incursions from one side to the other: transgender identity, long COVID, chronic Lyme, and aging itself mark recent boundary disputes. These push at our diagnostic categories in often fruitful ways that can, if taken seriously, help prepare us for a future of better medical care.

Essentialist bluster, symptom clusters, and germ busters

Pre-modern nosology in the West was focused on the causes of disease. This might seem ironic, since physicians of the time had so little ability to act on these causes and would probably have been better off focusing on symptoms. But the inspiration was ancient, stretching all the way back to Plato and his idea of forms. Just as the essence of a triangle is to have three sides, each disease had its own essence too, and reason could guide us towards discovering its true nature.

This focus on causes tended to result in goofy heuristics like the theory of humors, where health is defined as a balance between four bodily fluids, and disease occurs when there’s too much or too little of one of them. Individuals are disposed by their own essences to surpluses and deficiencies of specific humors, and thus the theory of disease gets tied to personality types — phlegmatic, sanguine, melancholic, and choleric, all of which we still use today. All of which, too, I have to look up every time I encounter them (except melancholic) because they’re no longer connected to a cohesive worldview of elements and temperatures and temperaments, but instead swim through our vocabulary like coelacanths as remnants of a long-gone time.

The same influences that guided nosology in this era also defined the ethos of other natural sciences. This is why the history of taxonomy in biology closely parallels the development of nosology. The early taxonomists tried to discover characteristics that revealed each species’s unique essence. This meant picking a particular feature or set of features, then defining species by differences in how that feature is expressed. The botanists ran into serious trouble first: while some felt that “fructifying characters” like fruits and seeds were the best indicators of plants’ essences, others focused on patterns of growth or leaf shapes to reveal families of similar plants. None of them agreed with each other, despite their shared aim.

A slow-moving revolution kicked off in the late 17th century with John Locke, a physician as well as a philosopher, who pointed out that there was no consistent way to resolve taxonomic disagreements within the framework of essentialism. Yet nobody seemed to know how to find an alternative until 150 years later, when John Stuart Mill’s work on classification met Charles Darwin’s radically line-blurring theory of evolution, and the era of classification from a priori first principles ended. Its replacement aimed to distinguish different kinds of thing by observing a variety of properties that cluster across species. In biology, this meant trying to understand how anatomical features might be shared among phylogenetically related species, but in nosology it meant a focus on defining specific diseases by looking at shared symptoms and prognoses.

This was the heyday of the named disease, when a physician could find a few patients with similar symptoms, publish about them in a journal, and get a disease named after themselves. A few of these diseases, like Raynaud’s disease, remain in our diagnostic repertoire today, but many have been grouped into other conditions. It seems in retrospect like this attempt at producing an epistemologically tractable nosology actually shifted it to a prestige-based system with even more arbitrary boundaries: “When I successfully convince [patients with indeterminate diagnoses]”, writes Robert Aronowitz, “that the absence of a simple match between their symptoms and an existing disease may be due to ‘historical accident’ and the limitations of contemporary medical knowledge and diagnostic technology, their illnesses sometimes becomes less frightening.”

Improved detection of bacterial and viral pathogens marked a turning point back towards causes that’s been intensified by the greater use of imaging and genomic sequencing in diagnosis. Today we have a hierarchy of mechanisms for defining a given disease: ideally, we’ll be able to point to a specific etiology, but this is rare outside of infectious disease or certain hereditary conditions like enzyme deficiencies; sometimes, though, a patient has a structural malformation that can be clearly labeled as the cause of some kind of suffering (think scoliosis), in which case that’s second best; third best is when their bodies malfunction in some clearly defined way (as in diabetes, say); finally, and most unsatisfyingly, diseases can be defined by clusters of symptoms, at which point they’re termed syndromes.

People trying to define a disease tend to shoot for the highest level they can in that hierarchy, but in reality, diseases move up and down it as we learn more about them. Chronic fatigue syndrome was initially called chronic Epstein-Barr virus infection, but the connection between the virus and the syndrome was much less clear than originally hypothesized. The appellation of chronic fatigue syndrome represented an enormous tumble down the hierarchy of disease definitions, though, which researchers have recently tried to climb again with the term “myalgic encephalomyelitis.” In English, this means muscle pain and brain / spinal cord inflammation: a functional definition of the disease, rather than a symptom-based one. The preference for defining disease by co-occurring properties that dominated nosology for a century has become the sole domain of diagnoses of exclusion for most physicians, psychiatrists excepted.

Defining disease and designating diagnosis

Debate about causes versus kinds and properties is all fine and good for some, but physicians have to treat suffering people and need actionable criteria for diagnoses that enable that treatment. For many diseases, this means finding a way to translate their definitions into metrics that can easily be captured in a clinical setting. Alzheimer’s disease, for example, is defined as the presence of various plaques and neurofibrillary tangles in the brain. Since these features can only be detected at autopsy, physicians need criteria by which to diagnose Alzheimer’s in the living. Those criteria are terribly vague though: “insidious gradual onset” of “cognitive or behavioral symptoms that interfere with the ability to function at work or at usual activities, represent a decline from previous levels of functioning, and are not explained by delirium or major psychiatric disorder.” Guidelines specify that labs and other measurements should be used to rule out other potential causes, but it shouldn’t be surprising that the accuracy of expert geriatricians’ Alzheimer’s diagnoses is only 77%.

And you might expect that diagnosis of Alzheimer’s would be an especially tough case, given the gap between definition and diagnostic criteria, but it turns out to be only slightly more fraught with error than most other diagnoses. The authors of a 2018 analysis wrote, citing other studies as far back as 2012:

Diagnostic error occurs in 5–20% of physician–patient encounters, with a comparable prevalence among ICU admissions and patients who die in the ICU. Diagnostic error in the critically ill has traditionally been estimated from autopsies, an accepted standard for diagnostic certainty. In a systematic review of autopsy studies of adult ICU patients, Winters and colleagues found that 28% of autopsies identified at least one misdiagnosis, and estimated that 1 in 16 ICU deaths were due to lethal misdiagnoses. Furthermore, diagnostic errors comprise 9–12% of adverse safety events leading to ICU admission, and the majority of these errors are deemed preventable.

These numbers add up to indicate that most people will experience a diagnostic error at some point in their lives.
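
To see why that follows, here is a back-of-the-envelope sketch. It assumes, unrealistically, that every encounter carries the same independent chance of error; the function names are mine, and only the 5% and 20% rates come from the analysis quoted above.

```python
import math


def p_at_least_one_error(per_encounter_rate: float, encounters: int) -> float:
    """Chance of at least one diagnostic error across independent encounters."""
    return 1.0 - (1.0 - per_encounter_rate) ** encounters


def encounters_until_likely(per_encounter_rate: float, target: float = 0.5) -> int:
    """Smallest number of encounters at which the cumulative chance reaches target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - per_encounter_rate))


# The two ends of the 5-20% range quoted above.
for rate in (0.05, 0.20):
    n = encounters_until_likely(rate)
    print(f"error rate {rate:.0%}: more likely than not after {n} encounters")
```

Under these assumptions, even the low end of the range crosses the 50% mark after 14 encounters, and the high end after 4. Real encounters are neither independent nor equally risky, so treat this as an order-of-magnitude check rather than an estimate.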

The term “diagnostic error” hides a number of subtypes, though, which in one taxonomy ultimately stem from process failures and label failures. The correct diagnosis can be missed because the physician failed to order an appropriate lab or otherwise to collect some necessary datum. Whether this happened due to deviation from standard procedures or from following them, it is a process failure and an example of underdiagnosis: an appropriate diagnosis not applied. Underdiagnosis can also happen when a physician follows the correct diagnostic procedure but either fails to interpret the results correctly (their fault) or gets an anomalous result (usually not their fault). Both of these latter cases are label failures.

Figure: David Newman-Toker’s conceptual model of the types of diagnostic error

The flip side of underdiagnosis is overdiagnosis, where an appropriate diagnosis is either pursued too aggressively or applied in cases where it does more harm than good. Prostate cancer is a canonical example of overdiagnosis. The majority of elderly men have evidence of prostate cancer on autopsy, but these are indolent cancers, without consequence for the patient’s quality of life and whose treatment would result in net harm. Diagnoses can also be pursued too aggressively and result in harm through invasive workups or stress to the patient. A majority of women who follow breast cancer screening guidelines for 10 years will receive a false positive, and that experience causes fewer of them to continue regular screening, even as their increasing age puts them at greater risk for cancer.

The concept of overdiagnosis is controversial, though. Radiologists especially tend to balk at the idea, saying that a diagnosis is always appropriate if it reflects reality, but that harm actually comes from overtreatment. If someone has cancer, it’s better for that to be known, even if the balance of benefits and harms comes down on doing nothing to treat it. There’s heterogeneity, then, in how physicians treat the act of applying a diagnosis. Some see it as simply discovering a fact about reality, while others see it as constituting an obligation on their part to treat the condition.

What cognitive processes underlie the act of diagnosis? An idealized version can be found in the hypothetico-deductive approach, which is very much like the scientific method overall: a physician evaluates a patient and develops a hypothesis about which disease is affecting them; they then use labs, procedures, and medications to try to disprove that hypothesis. For example, if a physician sees a patient with a flaky rash on their hands, they might suspect eczema. In the hypothetico-deductive approach, their subsequent prescription of a steroid cream is an experiment that tests that hypothesis. If the patient’s condition improves, they fail to reject that hypothesis, and if it doesn’t, they move on to generating and testing a different idea about their patient’s ailment that fits the experimental evidence of steroid non-response that they’ve already gathered.

In reality, this theory fails to capture how approaches to diagnosis inevitably shift as physicians learn and practice more. For a trainee, a diagnosis often proceeds from first pathophysiological principles: abnormalities in lab values, for example, are matched to biochemical cycles that might give clues to the disease at hand. Everything is made to fit into a holistic model of bodily function and dysfunction. The scale of this model quickly becomes cognitively untenable, though, and with experience physicians tend to develop “illness scripts.” These scripts are stereotyped representations of how different diseases present themselves, including common symptoms, lab findings, and even the characteristics of patients most at risk. Diagnosis becomes a process of pattern matching that increases the risk of cognitive errors even as it makes day-to-day practice manageable. The risk is that a physician will trim away or deemphasize the features of a patient’s presentation that don’t fit the pattern, thus committing a diagnostic error.

The need to manage the cognitive load of clinical practice has opened the door for diagnostic instruments that convert the patient’s symptoms into a number and provide the physician with a threshold for diagnosis, thus offloading parts of the diagnostic process. If you’ve ever been handed a depression screening form at your primary care provider’s office, you know what I’m talking about: you sit checking boxes on a print-out attached to a clipboard, then hand it to your doctor, who mentally tallies the score and enters it into their computer. If that score exceeds a predefined number, the doctor will likely talk to you about your symptoms and potential ways to manage them.
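
The tally-and-threshold logic of such an instrument is simple enough to sketch in a few lines. Everything specific here is an assumption made for illustration: the 0 to 3 rating per item, the example answers, and the cutoff of 9 are placeholders, not the scoring rules of any particular form.

```python
from dataclasses import dataclass
from typing import Sequence

# Assumption: each item is a frequency rating from 0 ("not at all")
# to 3 ("nearly every day").
MAX_ITEM_SCORE = 3


@dataclass
class ScreeningResult:
    total: int
    flagged: bool


def score_screening_form(answers: Sequence[int], threshold: int) -> ScreeningResult:
    """Tally the item ratings and check whether the total exceeds the cutoff."""
    for rating in answers:
        if not 0 <= rating <= MAX_ITEM_SCORE:
            raise ValueError(f"rating {rating} is outside 0..{MAX_ITEM_SCORE}")
    total = sum(answers)
    return ScreeningResult(total=total, flagged=total > threshold)


# Hypothetical answers to a nine-item form, scored against a placeholder cutoff.
result = score_screening_form([1, 2, 1, 0, 3, 1, 2, 0, 1], threshold=9)
print(result)
```

Note how much is hidden inside `threshold`: moving it by a point or two changes who gets flagged without any change in the patient.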

There are two problems with the use of diagnostic instruments. First, the less clear a disease definition is, the more room for variation there is in diagnostic instruments. Some conditions like depression have highly contested disease definitions, and this shows in the diagnostic instruments used. Tim Dalgleish wrote in 2020:

If we look at established measurement tools for depression, there are some 280 different instruments developed in the last century, of which many are still in use. These assessment instruments differ markedly in the signs and symptoms that they capture. For instance, Fried (2017) notes that across the seven most commonly used depression assessment tools, 52 distinct depression symptoms are measured (compared with the nine symptoms listed in the DSM–5), with 40% of those symptoms appearing in just one of the seven scales and only 12% appearing across all seven.

Whether a given individual would be diagnosed with depression might come down to which instrument their doctor used. If diagnosis means anything at all, this lack of consistency should be disturbing.

The second problem with the use of diagnostic instruments is their capacity to be used by the pharmaceutical industry to change diagnostic patterns in their favor. One of the best examples of this is the depression screening form I mentioned above. This form lists nine symptoms and asks the patient to indicate how often they’ve experienced each one in the past two weeks. It was released in the wake of Zoloft’s introduction, when Pfizer was trying to figure out how to help doctors in primary care feel more comfortable prescribing its new antidepressant. By providing them with clear diagnostic criteria for depression, Pfizer was able to move diagnosis from the psychiatrist’s office to primary care.

Whatever good this does for improving access to mental healthcare, it’s hard not to feel like there’s something insidious about it too. It’s a conscious attempt by a profit-maximizing entity to set off an availability cascade. That is, Pfizer tried to make depression the most easily accessible diagnosis for when something was wrong with a patient’s mental health. Their work in making depression screening an integral part of wellness visits in primary care makes the possibility of a depression diagnosis ever-present.

This use of the availability cascade has the potential to push out other concerns which are also important but less available to the diagnostician. It also sets off an availability cascade for patients. As the salience of depression increases among healthcare professionals, patients tend to think of their own mental health difficulties in terms of depression. Just as pattern-matching doctors run the risk of mentally nipping and tucking at symptoms to make them fit a well-known diagnosis, patients too can play up the symptoms that most resonate with what they already think they have while disregarding the symptoms that don’t match.

Novel conditions can be the subject of availability cascades, just as existing ones can. As I was working on this piece, I happened to see a press release from the American Academy of Neurology, touting a new publication that found “Rare sleep disorder more prevalent than previously thought.” The disorder? Idiopathic hypersomnia or, in English, being very sleepy for unknown reasons. This study serves as an attempted start to an availability cascade that would end in prescribing Jazz Pharmaceuticals’ (i.e., the study sponsor’s) drug to the increasing number of people diagnosed with idiopathic hypersomnia.

Of course, pharmaceutical companies aren’t the only entities capable of starting diagnostic availability cascades. A single individual created Morgellons, apparently out of thin air, then managed to convince others that they had the supposedly tell-tale fibers emerging from their skin. At its height, the cascade even swept over Joni Mitchell, who at one point said, “Fibers in a variety of colors protrude out of my skin like mushrooms after a rainstorm: they cannot be forensically identified as animal, vegetable or mineral,” and, “I haven’t been doing much lately because I’ve just come through about seven years of a flattening kind of illness.” The main organization for work on Morgellons shuttered in 2012 and seems to have been merged into the advocacy machinery of another contested illness, chronic Lyme disease.

Availability cascades are also a common thread in detransitioners’ stories: they feel dissatisfied with their lives, are struggling mentally, and then become aware of the possibility that they’re trans. The opportunity to find a way out of their troubles seems tempting enough that they recast their whole life story in transgender terms. They’re so convinced by this that they undergo major surgeries, alter their hormonal makeup, transition socially — all good things for people who emerge from these treatments happier. But if someone does this only because they were caught up in an availability cascade, it demonstrates the potential costs of increasing a given condition’s salience.

An interlude on psychology’s diagnostic dilemmas

If it seems like I’ve been picking on psychology, well, you’re right. No other field of medicine has done quite as much hand-wringing about how it diagnoses and yet achieved so little agreement. By comparison, other specialties have it easy: infectious disease docs just find the pathogen, oncologists find the cancer, dermatologists pattern match to a specific skin condition. Obviously, this is all easier said than done and each field has its challenges, but diagnostically everyone is miles ahead of psychology.

In part, this is because psychological disorders rarely have a clear biochemical etiology. There are theories, of course, but they’re quite remote from diagnosis. You can’t just tunnel into every patient’s brain and measure their neurotransmitters, and even if you could, it’s not clear that the disease definitions are refined enough to provide more information than what can be gleaned from symptomatic presentation. You can see the grasping at straws for a mechanistic basis of mental illness in the animal models of disease used in psychology. Mouse models of schizophrenia are generated through a broad array of prenatal toxins, or by simply hammering grown mice with amphetamines until they lose their minds. Each of these results in mice that seem plausibly schizophrenic, but the fact that so many different mechanisms can result in a similar outcome seems like a depressing portent for psychologists trying to uncover etiologies that can be acted upon with drugs. For depression, one of the main animal models is the forced swim test, which is as nauseatingly cruel as it sounds and raises the question of what exactly it’s trying to prove.

The futility of this search for biochemical origins has caused several psychologists and psychiatrists to call for its end. Most famously perhaps, Thomas Szasz contended that, since the mind is constructed from concepts and abstractions, it couldn’t be literally ill, a term he reserved for maladies with a clear physical cause. Instead, our notions of mental illness are derived from a metaphor of bodily disease that can mislead us if we take it too seriously. The people we would generally describe as having mental illness should more aptly be described as having “problems in living” that arise from a conflict between the patient’s cognitive patterns and the expectations of the societies in which they live. This was not only a rebuke of the biochemical model of mental illness, but also an attempt to halt psychology’s drive to pathologize ordinary behaviors whose treatment it also monopolizes. Our concept of disease involves drawing a line around some range of phenomena that then becomes the norm. Deviations from that norm, whether physical, functional, or behavioral, are then designated as diseases. Sadness and anxiety are a part of every sufficiently long life, for example, but there comes a point at which people are too sad or too anxious. Then, these emotions suddenly become diseases and fall under the exclusive domain of medicine. The arbitrariness of this point is one of the things I see Szasz as rebelling against.

Other thinkers have expanded Szasz’s point about disease being a mismatch between the individual and society’s expectations to include disability as well. The social model of disability contends that individuals with physical, sensory, or cognitive differences are not disabled by those conditions but rather by a discriminatory lack of accommodation from the rest of society. This is absurdly overstated, yet it has a kernel of truth to it. Raiany Romanni points out that an infertile woman would no longer be diagnosed with any disease if it became the norm to gestate our young in artificial wombs. This is because the medical establishment would have no reason to notice her inability to become pregnant, although it would still notice when someone had lost the use of their legs: that is, even if all physical barriers to wheelchair use were removed, some diagnosis would help ensure that the patient received the accommodations they needed.

For Szaszian reasons or not, psychology has tried to get its house in order. Attempts to usher in a more empirical approach with the DSM-5 fell flat, while the inertia of past categories prevailed. Diagnostic criteria for diseases were designed by committee to satisfy a broad array of interested parties, even when those criteria made little sense. Dan Ackerfeld writes:

For a diagnosis of Borderline Personality Disorder, a person must show at least five of a possible nine symptoms - this means that two individuals could meet criteria for the same diagnosis with only a single overlapping symptom. If two people with the same diagnosis present in almost entirely different ways, what sense is there in giving them the same label?
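
The arithmetic behind that observation is easy to check by brute force. The sketch below treats the nine criteria as interchangeable labels (an abstraction of mine; it says nothing about which combinations actually occur in practice) and counts the ways to qualify, along with the smallest possible overlap between two qualifying patients.

```python
from itertools import combinations
from math import comb

N_SYMPTOMS = 9     # criteria listed for the diagnosis
MIN_REQUIRED = 5   # how many a patient must show to qualify

# Distinct symptom sets that meet the threshold: any 5, 6, 7, 8, or all 9.
qualifying_profiles = sum(
    comb(N_SYMPTOMS, k) for k in range(MIN_REQUIRED, N_SYMPTOMS + 1)
)

# Two patients overlap least when each shows the bare minimum of five,
# so only the five-symptom profiles need to be compared pairwise.
minimal_profiles = [set(c) for c in combinations(range(N_SYMPTOMS), MIN_REQUIRED)]
smallest_overlap = min(len(a & b) for a, b in combinations(minimal_profiles, 2))

print(qualifying_profiles, smallest_overlap)  # 256 1
```

So there are 256 distinct ways to meet the criteria, and the guaranteed common ground between two people carrying the label is a single symptom (5 + 5 - 9 = 1).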

And so, rather than carrying forward diagnoses with increasingly tenuous criteria, researchers in psychology have attempted to identify naturally occurring clusters within a multidimensional symptom space. The Hierarchical Taxonomy of Psychopathology (HiTOP) came out in 2017. (Not coincidentally, I suspect, the word2vec paper that demonstrated multidimensional embeddings for words came out in 2013.) Its creators reimagined psychological nosology as a hierarchical clustering within a hyperspace in which each dimension represented a single symptom. Rather than representing symptoms in a binary fashion such as might be the result of a thresholding process, HiTOP uses a continuous scale of severity for each one. While its creators list a set of meta-categories for different mental illnesses, it’s not clear to me as a non-expert in psychology how much original insight has actually emerged from the use of HiTOP. And it’s not clear to me either how such insight would be used by the clinicians actually prescribing many of the drugs and other treatments. Those practices have inertia, and as patients are plotted onto the embedding space of HiTOP, it’s hard not to imagine the clinicians mapping the patients’ symptoms onto existing diseases.
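
To make the idea of clustering in a symptom space concrete, here is a toy sketch. It illustrates the description in the paragraph above, not the procedure the HiTOP authors actually used: the five patients, the three symptom dimensions, and the choice of average-linkage agglomerative clustering are all hypothetical.

```python
from itertools import combinations
from math import dist

# Hypothetical patients: continuous severity scores (0 to 1) on three
# made-up symptom dimensions: low mood, worry, and unusual beliefs.
patients = {
    "A": (0.9, 0.7, 0.1),
    "B": (0.8, 0.8, 0.0),
    "C": (0.1, 0.2, 0.9),
    "D": (0.2, 0.1, 0.7),
    "E": (0.7, 0.9, 0.2),
}


def average_linkage(c1, c2):
    """Mean pairwise Euclidean distance between members of two clusters."""
    pairs = [(a, b) for a in c1 for b in c2]
    return sum(dist(patients[a], patients[b]) for a, b in pairs) / len(pairs)


def agglomerate(names):
    """Repeatedly merge the two closest clusters, recording each merge."""
    clusters = [frozenset([n]) for n in names]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: average_linkage(*p))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        merges.append((sorted(c1), sorted(c2)))
    return merges


merges = agglomerate(sorted(patients))
for left, right in merges:
    print(left, "+", right)
```

The merge order is the hierarchy: similar patients join first, and the last merge separates the broadest groupings. No thresholds appear anywhere, which is the appeal, but someone still has to decide where to cut the tree before a clinician can act on it.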

Figure: Hierarchical Taxonomy of Psychopathology (HiTOP)

ICD Odysseys

Once a clinician, psychologist or otherwise, arrives at a diagnosis for their patient, they document it in a standardized language, the most popular of which is the International Classification of Diseases (ICD), currently in its 11th iteration (although the US continues to use ICD-10, which continues to be updated annually). The current version gives clinicians almost 55,000 diseases to work from, about four times as many as the previous one. That, in turn, was a far cry from the very first version of the ICD, which was created in 1853 and included just 139 conditions. The initial ICD system was entirely focused on causes of death, as were its first five revisions. In 1948, non-fatal disease was finally included in the ICD and oversight of it was transferred to the World Health Organization.

The selection of diseases classified within the ICD has meaningful impacts on every part of the healthcare system. For patients, being assigned a diagnostic code makes it easier to handle insurance for appointments and treatments. The diagnostic code also allows physicians to be reimbursed for ongoing services that might be difficult to justify in its absence. And for public health workers, these codes allow for tracking of cases.

At the same time, updates to the ICD can lead to misunderstandings. In 2019, there were reports that teen mental health had suddenly worsened. These studies were based on tracking ICD codes that showed a wave of suicide attempts starting around 2016. A working paper that came out in 2023, though, observed that the shift in teen mental health coincided with the switch from ICD-9 to ICD-10, which had brought with it a change in how overdoses were coded. In ICD-9, physicians could record an overdose without specifying intentionality. This option was removed in ICD-10, when it became mandatory to select whether the overdose was intentional or unintentional. It seems clear in retrospect that many intentional overdoses had instead been coded without the modifier for intentionality that was optional in ICD-9. The switch to ICD-10 therefore gave the appearance of a drastic and sudden increase in teen suicide attempts.
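
A toy model shows how a pure coding change can masquerade as an epidemic. The numbers are invented for illustration: a constant 1,000 intentional overdoses per year, 60% of which carried the optional intent modifier under ICD-9, and a simplified ICD-10 in which every one of them is coded as intentional.

```python
# Hypothetical: the true number of intentional overdoses never changes.
TRUE_INTENTIONAL_PER_YEAR = 1000

# Assumption: share of truly intentional overdoses that end up coded as
# intentional under each system (optional modifier vs. mandatory choice).
share_coded_with_intent = {"ICD-9": 0.6, "ICD-10": 1.0}


def recorded_intentional(true_count: int, share: float) -> int:
    """Overdoses that appear in the data flagged as intentional."""
    return round(true_count * share)


recorded = {
    system: recorded_intentional(TRUE_INTENTIONAL_PER_YEAR, share)
    for system, share in share_coded_with_intent.items()
}

apparent_increase = recorded["ICD-10"] / recorded["ICD-9"] - 1
print(recorded, f"apparent increase: {apparent_increase:.0%}")
```

With these made-up inputs, the recorded count jumps by about two thirds at the switchover even though nothing about the underlying behavior has changed.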

Another challenge with the ICD system is that updates have to be rolled out first by each electronic health record manufacturer and then at the local level. This can introduce delays and the way in which these updates are implemented can create unpredictable idiosyncrasies. Clinicians don’t memorize ICD codes, but instead use a search field, then select conditions from a drop-down list. A physician once told me that when he searched for “pneumonia” in the electronic health record implementation at his main practice, the first entry would be “left upper lung pneumonia.” Unspecified pneumonia was farther down the list and required digging that wasn’t always possible in the middle of an appointment. So, he and his peers would select “left upper lung pneumonia”, perhaps intending to fix it later but surely not always doing it. A public health researcher without some awareness of how ICD codes are implemented might puzzle over the sudden outbreak of left upper lung pneumonia.

There’s a type of ICD code that deserves attention because it blurs the definition of diagnosis. Z codes are a subset of the ICD-10 that focus on “factors influencing health status and contact with health services” rather than on a specific disease. Among the Z codes you’ll find conditions like Z14.0 for hemophilia carriers, Z18.31 for fragments of animal quills or spines embedded in the body, or Z91.1 for noncompliance with medical treatment. It’s well known that Z codes are under-recorded, simply because they don’t unlock any reimbursement for the clinician or extra resources for the patient. But I still think it’s remarkable that a patient can be diagnosed with noncompliance. This blurring of the line between disease and the contributors to disease opens up the possibility of an infinite regress: how long until we see an ICD code for hyperbolic discounting? Or could a doctor someday diagnose you as having soldier mindset? Health is so broad and the product of so many decisions or influences that there’s potentially no end to what could get documented with a diagnostic code.

Diagnostic multiverses

We can build on this observation about the endless potential of clinical documentation to develop a tentative meta-theory about the nature of diagnosis. Specifically, rather than reporting some fact about reality, diagnosis is actually an attempt at enclosing some space of life under the domain of medicine. This is the Foucauldian reading of diagnosis. When a clinician diagnoses someone with a given disease, they are claiming that the proper way to manage that person is through the medical system and that the individual needs to submit a non-trivial portion of their life to examination and treatment by the medical system. They have to adopt practices prescribed by that system such as taking medications or measurements of their bodily function. Similarly, when we seek diagnosis, it’s an attempt to hand over the management of some aspect of our lives to the medical sphere, which we assume (often correctly!) can handle it better than anyone else.

Diagnosis sets off a chain of permissions for patients and the medical system to act in ways that they ordinarily wouldn’t be able to. People with certain diagnoses are able to bring animals into places where they’re not normally permitted. In much of the US, people can’t legally use cannabis without a prescription that of course follows from a diagnosis. Other diagnoses, though, enable medical professionals to act on people against their stated preferences, such as when they’re under a psychiatric hold. Children can be treated in the medical system against their parents’ wishes so long as they have certain diagnoses. It’s a strange mix of freedom and restraint, but those are the laws in the kingdom of the sick.

There are a number of medical interventions that don’t come with a diagnosis, though. Cosmetic surgery doesn’t really bother with diagnoses. Before giving you a nose job, the plastic surgeon doesn’t have to first translate “dissatisfaction with your nose” into Greek, then solemnly record “rhinothymia” in your chart. They just make sure it’s safe to do the surgery and, if so, they go forward with it, no diagnosis necessary.

So maybe diagnosis exists as a way of not just claiming part of life as the domain of medicine, but also claiming that the costs of treatment for that diagnosis should be socialized. Few cosmetic procedures are covered by insurance, and I think most people agree with that. At the same time, there are plenty of edge cases where a diagnosis is used as a rhetorical tool to get more patient needs covered. For example, the practice of prescribing fresh food has existed for decades as a treatment for diabetes (with still-uncertain benefits), but newer examples include prescribing an air conditioner for people with heart failure. The logic goes that if we’re willing to have society pay for a pill that treats a disease, we ought to be willing to pay for a non-pharmaceutical intervention that would be at least as successful. Health can’t be delineated from other domains of life so easily, though, which is why we’ve even seen doctors attempt to prescribe money for their patients following a diagnosis of poverty.

In this way, diagnosis can function much like a human right: it’s a rhetorical move to add authority to a bid for somehow changing society. This sets off a negotiation between the public and the medical profession that can end in lower trust by way of emotional support peacocks. It’s like if I claimed that there was a human right to at least one slice of vegan cheesecake per day. People would rightly (no pun intended) think I wasn’t being serious. If enough people felt like the best way to satisfy their whims was to treat the satisfaction of those whims as a human right, it’d cheapen the whole discourse to the point that we’d have to find some other way of talking about what we’ll collectively protect and provide as a society. Similarly, clinicians have to be careful not to squander their social capital, which acts as an incentive for a certain diagnostic parsimony. This explains why physicians tend to resist contested diseases like chronic Lyme, Morgellons, and, when it first surfaced, long COVID.

But there’s still another case of medical interventions where diagnosis isn’t needed and the costs are socialized: public health. I don’t need a diagnosis to get fluoridated water from my tap just like everyone else, and I don’t need a diagnosis to get a flu shot or a COVID booster. In this case, it’s not that clinicians are worried about squandering the public’s respect, but rather that the diagnosis would be almost tautological if it’s a diagnosis that we all have. If we wanted to be thorough, we could say that there’s a tacit diagnosis motivating each public health intervention: “susceptibility to cholera” might motivate water sanitation, and “susceptibility to flu” would be the obvious choice for that vaccine.

What’s at stake?

Flu and cholera are easy to define as diseases, because people acutely decline when they’re infected. Aging has proven tougher to treat as a disease in regulatory and clinical settings. Rather than affecting your health directly, aging is primarily a threat because it increases your risk of multiple chronic and acute conditions. As Andrew Steele wrote in Wired, “Having high blood pressure doubles your risk of having a heart attack; being 80 instead of 40 multiplies your risk by ten.” Yet there are dozens of blood pressure drugs on the market and not a single one approved to treat the indication of aging.

In large part this is because blood pressure is easy to measure and has a known association with adverse outcomes. Like aging, high blood pressure is common, yet doesn’t directly affect your health or functioning: that is, nobody feels bad because they have high blood pressure, they feel bad because their blood pressure has caused the chambers of their hearts to distend, leading to heart failure, or they feel bad because their blood pressure has caused an especially weak vessel in their brain to pop. Without those risks, having high blood pressure would be no more relevant than being left-handed.

Unfortunately, we’ve treated aging as if it had no relevance, and much of that is because we haven’t had a good way to measure it. Waiting for chronic disease to develop in trial participants is inefficient, and even if a treatment showed an effect, it wouldn’t be clear that it worked by mitigating aging. We’re stuck waiting on the development of biomarkers for aging before we can treat it as a disease. Epigenetic clocks have been created for this purpose, but there’s a lot of refinement that needs to happen before they can be as reliable as a blood pressure measurement. Not only do they tend to swing around quite a bit between measurements, but we’ve found that organs age at different rates, with unclear consequences for the patient’s risk of serious disease. Of course, there’s also inertia against considering aging a disease. Millennia of cope have been built into our culture, telling us that getting old and falling apart is just ~*~natural~*~ and if we were to suddenly declare that a pathology, there’d be a real risk of biomedicine’s losing status. For now, at least.

Another reason why the confusion around diagnosis matters is that we’ll soon begin offloading increasing amounts of diagnostic work to AI. We need high-quality training data, though, and when there are high levels of disagreement between clinicians or just plain diagnostic error, it has substantial impacts on the potential utility of the AI we develop. As an example, I’ve got a work in progress on long COVID. Out of almost 4,000 clinicians in our data, all in primary care, five clinicians accounted for 13% of all long COVID diagnoses. Three of these diagnosed more than 10% of their patients with long COVID! At the same time, 80% of clinicians didn’t diagnose a single case. If we’d used these data to train an AI to diagnose long COVID, it’s fair to assume that our AI would’ve been useless. This is an extreme example, but diagnostic error is clearly a problem that’s both widespread and hard to detect.
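
To make that kind of concentration concrete, here’s a minimal sketch of how one might summarize it for any diagnosis. The function name, the clinician IDs, and the toy counts are all invented for illustration; none of this is the actual long COVID data or the analysis behind the figures above.

```python
from collections import Counter


def concentration_summary(diagnosing_clinicians, all_clinicians, top_k=5):
    """Summarize how diagnoses of one condition are spread across clinicians.

    diagnosing_clinicians: one clinician ID per recorded diagnosis.
    all_clinicians: each clinician ID in the data exactly once,
        including clinicians who never recorded the diagnosis.
    """
    counts = Counter(diagnosing_clinicians)
    total = sum(counts.values())

    # Share of all diagnoses recorded by the top_k most frequent diagnosers.
    top = counts.most_common(top_k)
    top_share = sum(n for _, n in top) / total if total else 0.0

    # Share of clinicians who never recorded the diagnosis at all.
    # (Looking up a missing key in a Counter returns 0.)
    never = sum(1 for c in all_clinicians if counts[c] == 0)

    return {
        "top_k_share": top_share,
        "never_diagnosed_share": never / len(all_clinicians),
    }


# Hypothetical toy data: 10 clinicians, 20 diagnoses, 3 of them diagnosing.
clinicians = [f"c{i}" for i in range(10)]
diagnoses = ["c0"] * 12 + ["c1"] * 5 + ["c2"] * 3

summary = concentration_summary(diagnoses, clinicians, top_k=1)
print(summary)
```

A large top-k share sitting next to a large never-diagnosed share is the warning sign: the labels may say more about who was doing the diagnosing than about who was sick. A fuller check would also look at each clinician’s diagnoses per patient seen, which this sketch leaves out.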

And yet, it seems like a lot of folks in different areas of research and medicine want to charge ahead with diagnosis as the primary application of clinical AI. Part of this is likely because the task fits so neatly into the classification framework, but part is that diagnosis is the central activity of medicine: evaluation is done with the aim of arriving at a diagnosis, and having a diagnosis enables rational treatment decisions. Improving the quality, speed, and cost of diagnosis would make the whole medical endeavor a lot more efficient.

It’s a good aim, but premature. To move towards this goal, the first step is to use AI to improve documentation and to suggest diagnostic tests. The need to document care is a major cause of burnout for physicians — no surprise when over 17% of family physicians say that they spend 4 or more hours a day after clinic hours on documentation. Having AI support documentation could relieve some of that burden and free clinicians to do more valuable types of work. Delayed or missed diagnostic testing is among the most common and most harmful contributors to diagnostic error. AI-based decision support could ensure that testing is guided by objective measures of risk and provides the information needed for a reasonable diagnosis.

These uses for AI would be helpful in themselves, but I think it’s more interesting to imagine how they could support the eventual aim of shifting most diagnosis to AI. I envision using counterfactual analyses to clean the training data for diagnostic AI: that is, to produce a dataset of diagnoses as if they were all done by a single person of exceptional ability. The counterfactual model would eliminate the variation in training data that would mess up AI, but would need assessment data such as notes and labs to be relatively standardized. That’s where AI comes in: by documenting care and suggesting the use of diagnostic tests, AI could decrease the heterogeneity that currently exists between patients and which would make that counterfactual model harder to manage. Only by ensuring high-quality data will we be able to have performant diagnostic AI, and we shouldn’t discount the role of AI in helping to generate that data.

Perambulations
