Diagnosis: Making the Best Use of Medical Data

MARK H. EBELL, MD, MS

Am Fam Physician. 2009;79(6):478-480

Author disclosure: Dr. Ebell is a consulting editor for John Wiley and Sons, Inc., publisher of Essential Evidence Plus.

To take the best possible care of patients, physicians must understand the basic principles of diagnostic test interpretation. Pretest probability is an important factor in interpreting test results. Some tests are useful for ruling in disease when positive or ruling out disease when negative, but not necessarily both. Many tests are of little value for diagnosing disease, and tests should be ordered only when the results are likely to lead to improved patient-oriented outcomes.

Although evidence-based medicine is often associated with randomized controlled trials and treatment decisions, the past 20 years have seen an explosion in our knowledge about diagnosis. New tests, such as the brain natriuretic peptide (BNP) and d-dimer tests, have been developed, and physicians have better data on older tests and on the history and physical examination.

Adopting New Tests

New tests are usually described in terms of their sensitivity and specificity. A sensitive test is good for detecting disease when it is present, whereas a specific test is good for identifying the absence of disease in healthy patients. But there are several other important factors that make a test worth adopting, including cost, availability, and the potential for harm. Most importantly, does the information help physicians take better care of patients and improve patient-oriented outcomes? Knowing with greater certainty that a patient has a disease is helpful only if this knowledge leads to an improvement in treatment that increases how long or how well the patient lives. Tests can be harmful when they lead to unnecessary invasive procedures or unneeded worry.

For example, if an older patient who smokes presents with dyspnea of uncertain origin, the physician might consider electrocardiography (ECG), echocardiography, radiography, and BNP measurement. Should all four tests be ordered? Which ones merely add cost without improving patient-oriented outcomes? In this case, a study in several Canadian emergency departments found that use of the BNP test in the setting described above reduced the length of hospitalization and saved money.¹ Although chest radiography and ECG probably should be ordered, an echocardiogram isn’t necessary if the BNP levels are normal.

Knowing the sensitivity and specificity of tests is useful to researchers, but it is the source of much frustration to physicians because these numbers don’t describe the test from our perspective. Sensitivity and specificity tell us the likelihood of a positive or negative test, given that the patient does or does not have the disease in question. Of course, if we knew whether or not the patient had the disease, we wouldn’t need the test!

Knowing the predictive values and post-test probabilities is more helpful because these values answer the following key questions: (1) if a test is positive, what is the likelihood of disease (positive predictive value or post-test probability of a positive test)? and (2) if a test is negative, how likely is it the patient does not have the disease (negative predictive value or post-test probability of a negative test)?
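The difference between the two perspectives can be made concrete with a 2-by-2 table. The following is a minimal sketch in Python; the counts are hypothetical (chosen only for easy arithmetic, not taken from any study), and the function name is our own.

```python
def summarize_two_by_two(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV, and NPV from 2-by-2 counts."""
    return {
        # Conditioned on true disease status (the researcher's view).
        "sensitivity": tp / (tp + fn),  # positive tests among diseased patients
        "specificity": tn / (tn + fp),  # negative tests among healthy patients
        # Conditioned on the test result (the physician's view).
        "ppv": tp / (tp + fp),  # diseased patients among positive tests
        "npv": tn / (tn + fn),  # healthy patients among negative tests
    }


# Hypothetical counts: 1,000 patients, 100 of whom have the disease.
result = summarize_two_by_two(tp=90, fp=180, fn=10, tn=720)
for name, value in result.items():
    print(f"{name}: {value:.1%}")
```

With these made-up counts, a test that is 90 percent sensitive and 80 percent specific still has a positive predictive value of only about one in three, because the predictive values depend on how common the disease is in the group tested.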

What does this mean to you as a physician? First, always consider whether the information gained from the test is likely to improve patient-oriented outcomes. Second, think in terms of predictive value. How much does a positive test increase the likelihood of disease, and how much does a negative test decrease it?

Discontinuing Tests

Some tests that were once thought to be helpful turn out to be inaccurate when carefully studied (Table 1).²⁻⁹ Positive and negative likelihood ratios (LRs) tell us the extent to which a positive or negative test increases or decreases the likelihood of disease. LRs greater than 5.0 to 10.0 significantly increase the likelihood of disease, and those less than 0.1 to 0.2 significantly decrease it. LRs between 0.2 and 5.0 change the likelihood of disease much less, especially as they approach 1.0. Although the tests listed in Table 1 are widely taught and widely used, their LRs are close to 1.0; therefore, they have little or no value for diagnosis.²⁻⁹

Diagnosis	Test or finding	Sensitivity (%)	Specificity (%)	LR+	LR–
Acute cholecystitis²	Elevated alanine transaminase or aspartate transaminase level	38	62	1.0	1.0
Breast cancer (patient with spontaneous single-duct nipple discharge)³	Ultrasonography	36	68	1.1	0.94
Iron deficiency anemia⁴	Mean corpuscular volume of 75 to 79 μm³ (75 to 79 fL)*	—	—	1.0	—
Lumbar spinal stenosis⁵	Pain is worse with walking	71	30	1.0	1.0
Migraine headache⁶	Headache is triggered by menses	44	56	1.0	1.0
Ovarian cancer⁷	Indigestion	36	63	1.0	1.0
Peripheral artery disease⁸	Weak femoral artery pulse	33	67	1.0	1.0
Pulmonary embolism⁹	Ventilation-perfusion scanning (intermediate probability)*	—	—	1.2	—
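The LRs in Table 1 follow from the sensitivity and specificity columns by the standard definitions: LR+ is sensitivity divided by (1 − specificity), and LR– is (1 − sensitivity) divided by specificity. The short Python sketch below recomputes them for three of the rows; the function and variable names are our own, and the row labels are abbreviated.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for sensitivity and specificity given as proportions."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


# Sensitivity and specificity (%) as listed in Table 1.
table_1_rows = {
    "Elevated ALT or AST (acute cholecystitis)": (38, 62),
    "Ultrasonography (breast cancer)": (36, 68),
    "Pain worse with walking (lumbar spinal stenosis)": (71, 30),
}

for finding, (sens_pct, spec_pct) in table_1_rows.items():
    lr_pos, lr_neg = likelihood_ratios(sens_pct / 100, spec_pct / 100)
    print(f"{finding}: LR+ {lr_pos:.2f}, LR- {lr_neg:.2f}")
```

Whenever sensitivity and specificity sum to about 100 percent, both LRs land near 1.0, which is why these findings barely change the probability of disease.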

Some tests do not have a single cutoff that divides results into positive or negative. Instead, they can take a range of values, each with its own LR (Table 2).¹⁰ This type of LR gives us the most information from a test result.

Thickness of endometrial stripe (mm)	Likelihood ratio	Post-test probability (%)*
≤ 4	0.02	0.2
5	0.21	2.3
6 to 10	0.5	5.3
11 to 15	2.2	19.6
16 to 20	6.4	41.6
21 to 25	9.0	50.0
> 25	15.2	62.8
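An LR is converted to a post-test probability by way of odds: convert the pretest probability to odds, multiply by the LR, and convert back. The Python sketch below applies that to Table 2. The pretest probability behind the table is not given in the text above; the 10 percent used here is our assumption, chosen because it reproduces the post-test column to one decimal place. The function and variable names are also our own.

```python
def post_test_probability(pretest_probability, likelihood_ratio):
    """Apply a likelihood ratio to a pretest probability via odds."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Assumption: inferred from the table values, not stated in the text.
ASSUMED_PRETEST = 0.10

# Endometrial stripe thickness (mm) and LR, as listed in Table 2.
stripe_lrs = [
    ("<= 4", 0.02),
    ("5", 0.21),
    ("6 to 10", 0.5),
    ("11 to 15", 2.2),
    ("16 to 20", 6.4),
    ("21 to 25", 9.0),
    ("> 25", 15.2),
]

for thickness, lr in stripe_lrs:
    probability = post_test_probability(ASSUMED_PRETEST, lr)
    print(f"{thickness} mm: LR {lr} -> {probability:.1%}")
```

Changing `ASSUMED_PRETEST` shows how the same measurement implies a different probability of disease in a lower-risk or higher-risk patient.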

Ruling In and Ruling Out Disease

Some tests are good at ruling in disease when the results are positive, but they do not rule out disease when they are negative (or vice versa). This can be confusing to physicians who think that tests behave symmetrically (i.e., they are equally good at ruling in and ruling out disease). Tests that are useful only for ruling in disease tend to have a sensitivity near 50 percent, but a very high specificity. Conversely, tests that are useful only for ruling out disease have a very high sensitivity, but a modest specificity. A good example comes from a meta-analysis of d-dimer testing in patients with suspected pulmonary embolism.¹¹ A rapid d-dimer test result of greater than 500 mcg per L (2.74 nmol per L) was 99 percent sensitive, but only 44 percent specific for diagnosis of pulmonary embolism. This corresponds to positive and negative LRs of 1.8 and 0.02, respectively. An online clinical calculator (http://www.dokterrutten.nl/collega/LRcalcul.html) shows that if a patient has a 10 percent pretest probability of pulmonary embolism, that probability increases to 17 percent if the d-dimer results are abnormal (not clinically helpful). However, if the d-dimer results are normal, the probability decreases to only 0.2 percent. Thus, this test is very good at ruling out pulmonary embolism when negative in a low-risk patient, but it is of little value for ruling in pulmonary embolism when results are abnormal in the same patient.
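The d-dimer figures can also be worked out by hand. The derivation below is a sketch using the standard LR definitions and the odds form of Bayes' theorem; the symbols Se, Sp, and p (sensitivity, specificity, and pretest probability) are our own shorthand, and the LRs are rounded before being applied.

```latex
\begin{align*}
\mathrm{LR}^{+} &= \frac{\mathrm{Se}}{1-\mathrm{Sp}} = \frac{0.99}{1-0.44} = \frac{0.99}{0.56} \approx 1.8 \\
\mathrm{LR}^{-} &= \frac{1-\mathrm{Se}}{\mathrm{Sp}} = \frac{1-0.99}{0.44} = \frac{0.01}{0.44} \approx 0.02 \\
\text{pretest odds} &= \frac{p}{1-p} = \frac{0.10}{0.90} \approx 0.111 \\
\text{post-test odds (abnormal)} &= 0.111 \times 1.8 \approx 0.20
  \;\Rightarrow\; \text{probability} = \frac{0.20}{1+0.20} \approx 0.17 \\
\text{post-test odds (normal)} &= 0.111 \times 0.02 \approx 0.0022
  \;\Rightarrow\; \text{probability} = \frac{0.0022}{1+0.0022} \approx 0.002
\end{align*}
```

An abnormal result therefore moves the probability only from 10 to about 17 percent, whereas a normal result moves it to about 0.2 percent.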

Interpreting Test Results

A common misconception is that evidence-based medicine and practice guidelines encourage a kind of “cookbook medicine,” where all patients are treated the same way. That isn’t true. A good chef knows that a cookbook provides an important starting point, but that there are usually several equally good options, depending on what ingredients are available and the desired outcomes. Similarly, the interpretation of a test and subsequent management decisions depend on the probability of disease. One example is the difference between a low-prevalence primary care or screening population and a high-prevalence referral or diseased population.

For example, a CA-125 test followed by ultrasonography if the results are abnormal is 57 percent sensitive and 99 percent specific for ovarian cancer (positive LR = 57; negative LR = 0.43).¹² Therefore, this test is better at ruling in ovarian cancer when positive than at ruling it out when negative. But the prevalence of disease is critical in determining whether to use the test in practice. In the general population, in which the prevalence of ovarian cancer is only 0.04 percent,¹³ the probability that a woman with an abnormal CA-125 test plus abnormal ultrasonography has ovarian cancer is only 2.2 percent. Using this test widely for screening would result in psychological harm and overuse of invasive testing and laparoscopy.¹⁴ On the other hand, the test may be a sensible option in a high-prevalence population, such as women with a BRCA1 or BRCA2 mutation.
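The effect of prevalence is easiest to see by counting patients. The Python sketch below screens a notional 100,000 women using the sensitivity and specificity quoted above. The 0.04 percent prevalence comes from the text; the 5 percent figure is purely hypothetical, included to show the direction of the effect, and is not an estimate for BRCA mutation carriers. Function and variable names are our own.

```python
def screening_outcomes(prevalence, sensitivity, specificity, population=100_000):
    """Return (true positives, false positives, PPV) for a screened population."""
    diseased = population * prevalence
    healthy = population - diseased
    true_positives = diseased * sensitivity
    false_positives = healthy * (1 - specificity)
    ppv = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, ppv


SENSITIVITY = 0.57  # CA-125 followed by ultrasonography, from the text
SPECIFICITY = 0.99  # from the text

scenarios = {
    "General population (0.04%, from the text)": 0.0004,
    "Hypothetical higher-risk group (5%, illustrative only)": 0.05,
}

for label, prevalence in scenarios.items():
    tp, fp, ppv = screening_outcomes(prevalence, SENSITIVITY, SPECIFICITY)
    print(f"{label}: {tp:.0f} true positives, {fp:.0f} false positives, PPV {ppv:.1%}")
```

At 0.04 percent prevalence, roughly 23 true positives are outnumbered by about 1,000 false positives, which is where the 2.2 percent figure comes from; the same test applied to a much higher-prevalence group yields mostly true positives.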

Combining Clinical Findings

Clinical decision rules combine findings from several elements of the history and physical examination, and sometimes a laboratory test, to help us make better diagnoses and prognoses. Well-known examples include the strep score¹⁵ and Ottawa Ankle Rules,¹⁶ but hundreds of others have been published, and many have been prospectively validated—something to look for before using them in the care of your own patients. PubMed’s Clinical Queries Web site (http://www.ncbi.nlm.nih.gov/entrez/query/static/clinical.shtml) and the Point-of-Care Guides featured in American Family Physician can be used to find clinical decision rules.

Most clinical decision rules place a patient in a risk group. This information can be used to guide further clinical decision-making. In general, when subsequent diagnostic tests are negative in a low-risk patient or positive in a high-risk patient, no further testing is necessary. Discordant results between the clinical rule and subsequent testing should prompt further evaluation. Remember, these are clinical decision-support tools, not clinical decision-replacement tools. They can improve our decision-making, but only if used wisely.

References

1. Moe GW, Howlett J, Januzzi JL, Zowall H for the Canadian Multi-center Improved Management of Patients With Congestive Heart Failure (IMPROVE-CHF) Study Investigators. N-terminal pro-B-type natriuretic peptide testing improves the management of patients with suspected acute heart failure: primary results of the Canadian prospective randomized multicenter IMPROVE-CHF study. Circulation. 2007;115(24):3103-3110.

2. Trowbridge RL, Rutkowski NK, Shojania KG. Does this patient have acute cholecystitis? JAMA. 2003;289(1):80-86.

3. Adepoju LJ, Chun J, El-Tamer M, Ditkoff BA, Schnabel F, Joseph KA. The value of clinical characteristics and breast-imaging studies in predicting a histopathologic diagnosis of cancer or high-risk lesion in patients with spontaneous nipple discharge. Am J Surg. 2005;190(4):644-646.

4. Guyatt GH, Oxman AD, Ali M, Willan A, McIlroy W, Patterson C. Laboratory diagnosis of iron deficiency anemia: an overview [published correction appears in J Gen Intern Med. 1992;7(4):423]. J Gen Intern Med. 1992;7(2):145-153.

5. Katz JN, Dalgas M, Stucki G, et al. Degenerative lumbar spinal stenosis. Diagnostic value of the history and physical examination. Arthritis Rheum. 1995;38(9):1236-1241.

6. Smetana GW. The diagnostic value of historical features in primary headache syndromes: a comprehensive review. Arch Intern Med. 2000;160(18):2729-2737.

7. Goff BA, Mandel LS, Melancon CH, Muntz HG. Frequency of symptoms of ovarian cancer in women presenting to primary care clinics. JAMA. 2004;291(22):2705-2712.

8. Stoffers HE, Kester AD, Kaiser V, Rinkens PE, Knottnerus JA. Diagnostic value of signs and symptoms associated with peripheral arterial occlusive disease seen in general practice: a multivariable approach. Med Decis Making. 1997;17(1):61-70.

9. The PIOPED Investigators. Value of the ventilation/perfusion scan in acute pulmonary embolism. Results of the prospective investigation of pulmonary embolism diagnosis (PIOPED). JAMA. 1990;263(20):2753-2759.

10. Karlsson B, Granberg S, Wikland M, et al. Transvaginal ultrasonography of the endometrium in women with postmenopausal bleeding—a Nordic multicenter study. Am J Obstet Gynecol. 1995;172(5):1488-1494.

11. Brown MD, Rowe BH, Reeves MJ, Bermingham JM, Goldhaber SZ. The accuracy of the enzyme-linked immunosorbent assay d-dimer test in the diagnosis of pulmonary embolism: a meta-analysis. Ann Emerg Med. 2002;40(2):133-144.

12. Jacobs I, Davies AP, Bridges J, et al. Prevalence screening for ovarian cancer in postmenopausal women by CA 125 measurement and ultrasonography. BMJ. 1993;306(6884):1030-1034.

13. National Institutes of Health Consensus Development Conference Statement. Ovarian cancer: screening, treatment, and follow-up. Gynecol Oncol. 1994;55(3 pt 2):S4-S14.

14. Schapira MM, Matchar DB, Young MJ. The effectiveness of ovarian cancer screening. A decision analysis model. Ann Intern Med. 1993;118(11):838-843.

15. McIsaac WJ, Goel V, To T, Low DE. The validity of a sore throat score in family practice. CMAJ. 2000;163(7):811-815.

16. Stiell IG, Greenberg GH, McKnight RD, et al. Decision rules for the use of radiography in acute ankle injuries. Refinement and prospective validation. JAMA. 1993;269(9):1127-1132.
