Bias—Intentional and Unintentional
Unintentional bias is the result of using a weaker study design (e.g., a case series or observational study), not designing a study well (e.g., using too low a dose of the comparator drug), or not executing the study well (e.g., making it possible for participants or researchers to determine to which group they are assigned). Intentional bias also exists. Examples of study techniques that are designed to make a favorable result for the study drug more likely include a run-in phase using the active drug to identify compliant patients who tolerate the drug; per protocol rather than intention-to-treat analysis; and intentionally choosing too low a dose of the comparator drug or choosing an ineffective comparator drug.
Blinding and Allocation Concealment
Allocation concealment recently has been recognized as an important element of randomized controlled trial design. Allocation is concealed when neither the participants nor the researchers know or can predict to which group in a study (control or treatment) the patient is assigned. Allocation concealment takes place before the study begins, as patients are being assigned. Blinding—concealing the study group assignment from those participating in the study—occurs after the study begins. Blinding should involve the patient, the physicians caring for the patient, and the researcher. It is particularly important that the persons assessing outcomes also are blinded to the patient’s study group assignment.
Clinical Decision Rules
Individual findings from the history and physical examination often are not helpful in making a diagnosis. Usually, the physician has to consider the results of several findings as the probability of disease is revised. Clinical decision rules help make this process more objective, accurate, and consistent by identifying the best predictors of disease and combining them in a simple way to rule in or rule out a given condition. Examples include the Strep Score, the Ottawa Ankle Rules, the Wells Rule for deep venous thrombosis, and a variety of clinical rules to evaluate perioperative risk.
Clinical vs. Statistical Significance
In a large study, a small difference may be statistically significant. For example, does a 1- or 2-point difference on a 100-point dementia scale matter to your patients? It is important to ask whether statistically significant differences also are clinically significant. Conversely, if a study finds no difference, it is important to ask whether the study was large enough to detect a clinically important difference if one actually existed. A study with too few patients is said to lack the power to detect a difference.
Confidence Intervals and P Values
The P value tells us how likely it is that a difference between groups at least as large as the one observed would occur by chance alone if the treatment actually had no effect. For example, if the absolute risk reduction was 4% with P = .04, a difference this large or larger would be expected in only about 4 of 100 studies if the treatment were truly ineffective. The confidence interval gives a range of plausible values for the true effect and is more clinically useful. A 95% confidence interval means that if the study were repeated 100 times and an interval were calculated each time, about 95 of those intervals would contain the true value. For example, if a study found that a test was 80% specific with a 95% confidence interval of 74% to 85%, the data are consistent with a true specificity anywhere between 74% and 85%.
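To make the confidence interval concrete, here is a minimal Python sketch of one common way to compute an approximate 95% CI for a proportion such as specificity: the normal-approximation (Wald) interval. The counts (160 of 200 patients without disease testing negative) are hypothetical, and the function name `wald_ci` is illustrative. Other methods, such as the Wilson interval, behave better in small samples or near 0% and 100%.

```python
import math


def wald_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for a proportion.

    z = 1.96 gives an approximate 95% interval. The interval is clipped
    to [0, 1]; it is unreliable for small n or proportions near 0 or 1.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)  # standard error of the proportion
    return max(0.0, p - z * se), min(1.0, p + z * se)


# Hypothetical study: 160 of 200 patients without the disease test negative.
low, high = wald_ci(160, 200)
print(f"specificity 80%, 95% CI {low:.1%} to {high:.1%}")
```

A larger sample shrinks the standard error, so the same 80% estimate would come with a narrower interval.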
Disease-Oriented Evidence
Disease-oriented evidence refers to the outcomes of studies that measure physiologic or surrogate markers of health. This would include things such as blood pressure, serum creatinine, glycohemoglobin, sensitivity and specificity, or peak flow. Improvements in these outcomes do not always lead to improvements in patient-oriented outcomes such as symptoms, morbidity, quality of life, or mortality.
External and Internal Validity
External validity is the extent to which the results of a study can be generalized to other persons in other settings and under various conditions, especially "real world" circumstances. Internal validity is the extent to which a study measures what it is supposed to measure, and to which the results of a study can be attributed to the intervention of interest rather than to a flaw in the research design. In other words, internal validity is the degree to which one can draw valid conclusions about the causal effect of one variable on another.
Intention-to-Treat Analysis
Were the participants analyzed in the groups to which they were assigned originally? This addresses what happens to participants in a study. Some participants might drop out because of adverse effects, have a change of therapy or receive additional therapy, move out of town, leave the study for a variety of reasons, or die. To minimize the possibility of bias in favor of either treatment, researchers should analyze participants based on their original treatment assignment regardless of what happens afterward. The intention-to-treat approach is conservative; if there is still a difference, the result is stronger and more likely to be because of the treatment. Per protocol analysis, which only analyzes the results for participants who complete the study, is more likely to be biased in favor of the active treatment.
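A minimal sketch with hypothetical data shows how the two analyses can diverge. Two treated participants stop the protocol and later have the bad outcome; intention-to-treat keeps them in the treatment denominator, whereas per protocol analysis drops them. The `Participant` record and `event_rate` helper are illustrative names, not part of any standard library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    assigned: str    # arm assigned at randomization: "treatment" or "control"
    completed: bool  # True if the participant finished the protocol
    event: bool      # True if the bad outcome occurred


def event_rate(participants, arm, per_protocol=False):
    """Proportion of participants in one arm who had the event.

    Intention-to-treat (default): everyone is counted in the arm they were
    randomized to, whatever happened afterward.
    Per protocol: only participants who completed the protocol are counted.
    """
    group = [p for p in participants
             if p.assigned == arm and (p.completed or not per_protocol)]
    return sum(p.event for p in group) / len(group)


# Hypothetical trial with 10 participants per arm.
trial = (
    [Participant("treatment", True, False)] * 7
    + [Participant("treatment", True, True)]
    + [Participant("treatment", False, True)] * 2  # stopped, then had event
    + [Participant("control", True, False)] * 7
    + [Participant("control", True, True)] * 3
)

for arm in ("treatment", "control"):
    print(arm,
          "ITT:", event_rate(trial, arm),
          "per protocol:", event_rate(trial, arm, per_protocol=True))
```

With these numbers, intention-to-treat shows no difference (30% vs 30%), while per protocol analysis makes the treatment look better (12.5% vs 30%).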
Likelihood Ratios
Likelihood ratios (LRs) correspond to the clinical impression of how well a test rules in or rules out a given disease. A test with a single cutoff for abnormal will have two LRs, one for a positive test (LR+) and one for a negative test (LR–). Tests with multiple cutoffs (e.g., very low, low, normal, high, very high) can have a different LR for each range of results. A test result with an LR of 1.0 does not change the probability of disease. The further above 1 the LR is, the better the result rules in disease (an LR greater than 10 is considered good). Conversely, the further below 1 the LR is, the better the result rules out disease (an LR less than 0.1 is considered good).
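For a test with a single cutoff, both likelihood ratios follow directly from sensitivity and specificity: LR+ = sensitivity ÷ (1 – specificity) and LR– = (1 – sensitivity) ÷ specificity. Here is a minimal sketch, assuming a hypothetical test that is 90% sensitive and 80% specific.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a test with a single cutoff.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    Specificity must be strictly between 0 and 1 to avoid division by zero.
    """
    if not 0 < specificity < 1:
        raise ValueError("specificity must be strictly between 0 and 1")
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


# Hypothetical test: 90% sensitive, 80% specific.
lr_pos, lr_neg = likelihood_ratios(0.90, 0.80)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}")
```

This test gives an LR+ of 4.5 and an LR– of 0.125. Both results are useful, but neither reaches the "good" thresholds of greater than 10 or less than 0.1.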
Multiple-Treatments Meta-Analysis
A multiple-treatments meta-analysis allows you to compare treatments directly (for example, head-to-head trials) and indirectly (for example, against a first-line treatment). This increases the number of comparisons available and may allow the development of decision tools for effective treatment prioritization.
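One simple building block of indirect comparison is the adjusted indirect comparison through a common comparator, a method commonly attributed to Bucher and colleagues. If trials compare A with C and, separately, B with C, the log odds ratio for A vs B is the difference of the two log odds ratios, and their variances add. This is a minimal sketch with hypothetical inputs. A full multiple-treatments (network) meta-analysis fits all comparisons jointly and is more involved than this.

```python
import math


def indirect_odds_ratio(or_ac, se_log_ac, or_bc, se_log_bc, z=1.96):
    """Adjusted indirect comparison of A vs B via common comparator C.

    or_ac, or_bc: odds ratios of A vs C and of B vs C from separate trials
    se_log_ac, se_log_bc: standard errors of the corresponding log odds ratios
    Returns (OR of A vs B, lower limit, upper limit) for an approximate 95% CI.
    """
    log_ab = math.log(or_ac) - math.log(or_bc)
    se_ab = math.sqrt(se_log_ac ** 2 + se_log_bc ** 2)  # variances add
    return (math.exp(log_ab),
            math.exp(log_ab - z * se_ab),
            math.exp(log_ab + z * se_ab))


# Hypothetical: A vs placebo OR 0.50 (SE 0.20), B vs placebo OR 0.80 (SE 0.25).
or_ab, low, high = indirect_odds_ratio(0.50, 0.20, 0.80, 0.25)
print(f"A vs B: OR {or_ab:.3f} (95% CI {low:.2f} to {high:.2f})")
```

The indirect estimate (OR 0.625) has a wider interval than either direct comparison. Here the interval includes 1.0, which illustrates why indirect evidence is usually less precise than a head-to-head trial.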
Number Needed to Treat/Number Needed to Harm
The absolute risk reduction (ARR) can be used to calculate the number needed to treat (NNT), which is … number of patients who need to be treated to prevent one additional bad outcome. For example, if the annual mortality is 20% in the control group and 10% in the treatment group, then the ARR is 10% (20 – 10), and the NNT is 100% ÷ ARR (100 ÷ 10) = 10 per year. That is, for every 10 patients who are treated for one year, one additional death is prevented.
The same calculation can be made for harmful events. The number needed to harm (NNH) is the number of patients who need to receive an intervention instead of the alternative for one additional patient to experience an adverse event. It is calculated as 1/ARI, where ARI is the absolute risk increase. For example, if a drug causes serious bleeding in 2% of patients in the treatment group over one year compared with 1% in the control group, the ARI is 1% (2 – 1), and the NNH is 100% ÷ (2% – 1%) = 100 over one year.
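The NNT and NNH arithmetic above can be expressed as a minimal Python sketch. Rates are passed as proportions (0.20 for 20%), and the function names `nnt` and `nnh` are illustrative.

```python
def nnt(control_rate: float, treatment_rate: float) -> float:
    """Number needed to treat = 1 / absolute risk reduction (ARR)."""
    arr = control_rate - treatment_rate
    if arr <= 0:
        raise ValueError("treatment does not reduce risk; NNT is undefined")
    return 1 / arr


def nnh(treatment_harm_rate: float, control_harm_rate: float) -> float:
    """Number needed to harm = 1 / absolute risk increase (ARI)."""
    ari = treatment_harm_rate - control_harm_rate
    if ari <= 0:
        raise ValueError("no excess harm; NNH is undefined")
    return 1 / ari


print(nnt(0.20, 0.10))  # annual mortality 20% (control) vs 10% (treatment)
print(nnh(0.02, 0.01))  # serious bleeding 2% (treatment) vs 1% (control)
```

These reproduce the examples in the text: an NNT of 10 per year and an NNH of 100 over one year.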
Observational vs. Experimental Studies
In an observational study of a drug or other treatment, the patient chooses whether or not to take the drug or to have the surgery being studied. This may introduce unintentional bias. For example, patients who choose to take hormone therapy probably are different from those who do not. Experimental studies, most commonly randomized controlled trials (RCTs), avoid this bias by randomly assigning patients to groups. The only difference between groups in a well-designed RCT is the treatment intervention, so it is more likely that differences between groups are caused by the treatment. When good observational studies disagree with good RCTs, the RCT should be trusted.
Odds Ratios and Relative Risk
Observational studies usually report their results as odds ratios or relative risks. Both measure the size of an association between an exposure (e.g., smoking, use of a medication) and a disease or death. A relative risk of 1.0 indicates that the exposure does not change the risk of disease. A relative risk of 1.75 indicates that patients with the exposure are 1.75 times as likely to develop the disease, or have a 75% higher risk of disease. Odds ratios are a way to estimate relative risks in case-control studies, where relative risks cannot be calculated directly. The approximation is accurate when the disease is rare but becomes less accurate as the disease becomes more common.
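This sketch uses hypothetical counts and computes both measures from the same 2 × 2 table. It shows the odds ratio drifting away from the relative risk as the disease becomes common. In a real case-control study, only the odds ratio is meaningful, because the investigators choose how many patients with and without the disease to include.

```python
def risk_and_odds_ratios(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Relative risk and odds ratio from a 2 x 2 table laid out as:

                    disease   no disease
        exposed        a          b
        unexposed      c          d
    """
    relative_risk = (a / (a + b)) / (c / (c + d))
    odds_ratio = (a * d) / (b * c)
    return relative_risk, odds_ratio


# Hypothetical rare disease: 2% of exposed vs 1% of unexposed patients.
print(risk_and_odds_ratios(20, 980, 10, 990))
# Hypothetical common disease: 60% of exposed vs 30% of unexposed patients.
print(risk_and_odds_ratios(600, 400, 300, 700))
```

In both scenarios the relative risk is 2.0. The odds ratio is about 2.02 when the disease is rare but 3.5 when it is common.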
Patient-Oriented Evidence
Patient-oriented evidence (POE) refers to outcomes of studies that measure things a patient would care about, such as improvement in symptoms, morbidity, quality of life, cost, length of stay, or mortality. Essentially, POE indicates whether use of the treatment or test in question helped a patient live a longer or better life. Any POE that would change practice is a POEM (patient-oriented evidence that matters).
Permuted Block Randomization
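Simple randomization does not guarantee balance in the numbers of patients assigned to each group during a trial. If patient characteristics change with time, early imbalances cannot be corrected. Permuted block randomization ensures balance over time. Patients are randomized in blocks of size 2m. Within each block, m patients are allocated to treatment A and m to treatment B in a random order, so the groups are equal in size at the end of every block and never differ by more than m.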
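A minimal sketch of generating a permuted block allocation list with Python's standard `random` module. The function name, the labels "A" and "B", and the seed are illustrative. In a real trial, the resulting list would be kept concealed from the people enrolling patients.

```python
import random


def permuted_block_schedule(n_blocks: int, m: int, seed=None) -> list[str]:
    """Allocation list built from blocks of size 2m.

    Each block holds exactly m 'A' and m 'B' assignments in random order,
    so the arms are equal after every completed block and never differ
    by more than m at any point.
    """
    rng = random.Random(seed)  # seeded generator for a reproducible list
    schedule = []
    for _ in range(n_blocks):
        block = ["A"] * m + ["B"] * m
        rng.shuffle(block)  # random permutation within the block
        schedule.extend(block)
    return schedule


schedule = permuted_block_schedule(n_blocks=5, m=2, seed=1)
print("".join(schedule))
```

With blocks of 4 (m = 2), the running difference between the two arms can never exceed 2, whereas simple randomization places no such bound.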
Positive and Negative Predictive Value
Predictive values help interpret the results of tests in the clinical setting. The positive predictive value (PV+) is the percentage of patients with a positive or abnormal test who have the disease in question. The negative predictive value (PV–) is the percentage of patients with a negative or normal test who do not have the disease in question. Although the sensitivity and specificity of a test do not change as the overall likelihood of disease changes in a population, the predictive value does change. For example, the PV+ increases as the overall probability of disease increases, so a test that has a PV+ of 30% when disease is rare may have a PV+ of 90% when it is common. Similarly, the PV changes with a physician’s clinical suspicion that a disease is or is not present in a given patient.
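Predictive values can be computed from sensitivity, specificity, and prevalence (the overall probability of disease) with Bayes' theorem. This is a minimal sketch, assuming a hypothetical test that is 90% sensitive and 90% specific, evaluated at three prevalences.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (PV+, PV-) for a test applied where the disease has the given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.90, 0.90, prevalence)
    print(f"prevalence {prevalence:.0%}: PV+ {ppv:.0%}, PV- {npv:.0%}")
```

Although sensitivity and specificity stay fixed, PV+ rises from about 8% at 1% prevalence to 50% at 10% and 90% at 50%. Meanwhile, PV– falls slightly.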
Pretest and Post-test Probability
Whenever an illness is suspected, physicians should begin with an estimate of how likely it is that the patient has the disease. This estimate is the pretest probability. After the patient has been interviewed and examined, the results of the clinical examination are used to revise this probability upward or downward to determine the post-test probability. Although usually implicit, this process can be made more explicit using results from epidemiologic studies, knowledge of the accuracy of tests, and Bayes’ theorem. The post-test probability from the clinical examination then becomes the starting point when ordering diagnostic tests or imaging studies and becomes a new pretest probability. After the results are reviewed, the probability of disease is revised again to determine the final post-test probability of disease.
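One explicit way to apply Bayes' theorem here uses odds: post-test odds = pretest odds × likelihood ratio. This is a minimal sketch with hypothetical numbers. Chaining results one after another, as in the last line, assumes the findings are independent of each other given disease status, which is often only approximately true.

```python
def post_test_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Odds form of Bayes' theorem: post-test odds = pretest odds x LR."""
    if not 0 <= pretest_probability < 1:
        raise ValueError("pretest probability must be in [0, 1)")
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Hypothetical: history and examination suggest a 20% probability of disease,
# and a test result with an LR of 6 follows.
p_after_test = post_test_probability(0.20, 6.0)
print(f"post-test probability {p_after_test:.0%}")

# That post-test probability becomes the pretest probability for the next
# (hypothetical) result, here one with an LR of 0.2.
print(f"after a second result {post_test_probability(p_after_test, 0.2):.0%}")
```

A 20% pretest probability becomes 60% after the first result. A subsequent result with an LR of 0.2 brings it back down to about 23%.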
Relative and Absolute Risk Reduction
Studies often use relative risk reduction to describe results. For example, if mortality is 20% in the control group and 10% in the treatment group, there is a 50% relative risk reduction ([20 – 10] ÷ 20 × 100%). However, if mortality is 2% in the control group and 1% in the treatment group, this also indicates a 50% relative risk reduction, although it is a different clinical scenario. Absolute risk reduction is the difference between the event rates in the control and treatment groups. In the first example, the absolute risk reduction is 10%, and in the second example it is 1%. Reporting absolute risk reduction is a less dramatic but more clinically meaningful way to convey results.
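The two scenarios above, computed side by side in a minimal sketch. Rates are passed as proportions, and the function name is illustrative.

```python
def risk_reductions(control_rate: float, treatment_rate: float) -> tuple[float, float]:
    """Return (relative risk reduction, absolute risk reduction) as proportions."""
    arr = control_rate - treatment_rate  # absolute difference in event rates
    rrr = arr / control_rate             # difference relative to control risk
    return rrr, arr


for control, treated in [(0.20, 0.10), (0.02, 0.01)]:
    rrr, arr = risk_reductions(control, treated)
    print(f"control {control:.0%}, treated {treated:.0%}: "
          f"RRR {rrr:.0%}, ARR {arr:.0%}")
```

Both scenarios show a 50% RRR, but the ARR is 10% in the first and only 1% in the second. Through NNT = 1/ARR, that corresponds to treating 10 vs 100 patients to prevent one death.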
Run-in Period
A run-in period is a brief period at the beginning of a trial before the intervention is applied. In some cases, run-in periods are appropriate (for example, to wean patients from a previously prescribed medication). However, run-in periods to assess compliance and ensure treatment responsiveness create a bias in favor of the treatment and reduce generalizability.
Sample Size
The number of patients in a study, called the sample size, determines how precisely a research question can be answered. There are two potential problems related to sample size. A large study can give a precise estimate of effect and find small differences between groups that are statistically significant but may not be clinically meaningful. On the other hand, a small study might not find a difference between groups (even though such a difference may actually exist and may be clinically meaningful) because it lacks statistical power. The “power” of a study is the likelihood that it will detect a true difference between two groups; it is estimated from factors such as the sample size.
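To see how power depends on sample size, this minimal sketch approximates the power of a two-sided test comparing two event rates. It uses the standard library's `statistics.NormalDist`. The normal approximation with unpooled variance is one simple textbook method, not the only one, and the event rates (20% vs 10%) are hypothetical.

```python
import math
from statistics import NormalDist


def power_two_proportions(p1: float, p2: float, n_per_group: int,
                          alpha: float = 0.05) -> float:
    """Approximate power of a two-sided test comparing two proportions.

    Normal approximation with unpooled variance; the negligible chance of
    rejecting in the wrong direction is ignored.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    se = math.sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)
    return std_normal.cdf(abs(p1 - p2) / se - z_crit)


# Hypothetical true event rates: 20% in controls, 10% with treatment.
for n in (50, 200, 800):
    print(n, "per group: power", round(power_two_proportions(0.20, 0.10, n), 2))
```

With 50 patients per group, the study would miss this real difference most of the time (power about 29%). With 200 per group, the power rises to about 81%, and with 800 it is near certainty.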
Sensitivity and Specificity
Sensitivity is the percentage of patients with a disease who have a positive test for the disease in question. Specificity is the percentage of patients without the disease who have a negative test. Because it is unknown if the patient has the disease when the tests are ordered, sensitivity and specificity are of limited value. They are most valuable when very high (greater than 95%). A highly Sensitive test that is Negative tends to rule Out the disease (SnNOut), and a highly Specific test that is Positive tends to rule In the disease (SpPIn).
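This minimal sketch computes sensitivity and specificity from the four cells of a 2 × 2 table. The counts are hypothetical, and the argument names (`tp`, `fn`, `fp`, `tn` for true/false positives/negatives) are illustrative.

```python
def sensitivity_specificity(tp: int, fn: int, fp: int, tn: int) -> tuple[float, float]:
    """tp, fn: patients WITH the disease who test positive / negative.
    fp, tn: patients WITHOUT the disease who test positive / negative."""
    sensitivity = tp / (tp + fn)  # share of diseased patients detected
    specificity = tn / (tn + fp)  # share of non-diseased patients cleared
    return sensitivity, specificity


# Hypothetical results in 100 patients with and 200 without the disease.
sens, spec = sensitivity_specificity(tp=97, fn=3, fp=40, tn=160)
print(f"sensitivity {sens:.0%}, specificity {spec:.0%}")
```

This hypothetical test is highly sensitive (97%) but only moderately specific (80%). In SnNOut terms, a negative result helps rule the disease out, whereas a positive result is much less conclusive.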
Systematic Reviews and Meta-Analyses
Often, there are many studies of varying quality and size that address a clinical question. Systematic reviews can help evaluate the studies by posing a focused clinical question, identifying every relevant study in the literature, evaluating the quality of these studies by using predetermined criteria, and answering the question based on the best available evidence. Meta-analyses combine data from different studies; this should be done only if the studies were of good quality and were reasonably homogeneous (i.e., most had generally similar characteristics).
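One common way to combine results, once studies are judged similar enough, is an inverse-variance fixed-effect meta-analysis. Each study is weighted by 1/SE², so larger, more precise studies count more. This is a minimal sketch with hypothetical log risk ratios. Random-effects models, which allow for variation between studies, are a common alternative and are not shown here.

```python
import math


def pool_fixed_effect(estimates, standard_errors):
    """Inverse-variance fixed-effect pooled estimate and its standard error.

    estimates: per-study effects on an additive scale (e.g., log risk ratios)
    standard_errors: the corresponding standard errors
    """
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se


# Hypothetical risk ratios from three similar trials, analyzed on the log scale.
log_rr = [math.log(0.80), math.log(0.70), math.log(0.90)]
ses = [0.10, 0.20, 0.15]
pooled, se = pool_fixed_effect(log_rr, ses)
print(f"pooled RR {math.exp(pooled):.2f} "
      f"(95% CI {math.exp(pooled - 1.96 * se):.2f} "
      f"to {math.exp(pooled + 1.96 * se):.2f})")
```

The pooled standard error is smaller than that of any single study, which is why combining good, homogeneous studies can give a more precise answer than any one of them.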
Type of Study: Treatment
Studies of treatments, whether the treatment is a drug, device, or other intervention, must be randomized controlled trials. Because most new, relevant medical information involves advances in treatment, these studies must withstand rigorous review.
- Was it a controlled trial and were the patients randomly assigned? Studies not meeting both criteria are not reviewed.
- Are the patients in the study so dissimilar to typical primary care patients that the results will not apply? Studies performed on patients enrolled in settings markedly different from primary care will not be reviewed.
- Were steps taken to conceal the treatment assignment from personnel entering patients into the study? “Concealed allocation” through the use of opaque envelopes, centralized randomization, or other methods prevents selective enrollment of patients into a study. It is not the same as blinding, which occurs after the study begins. The primary concern is the person enrolling patients: while patients are being enrolled, before the trial starts, neither that person nor the patients should be able to know to which group the next patient will be allocated. This knowledge might introduce bias and affect which patients are enrolled. Concealed allocation generally will be noted in POEMs reviews but not in Evidence-Based Practice. If the allocation concealment is unclear, the study will be included unless there is a good chance that unconcealed allocation could produce a systematic bias (e.g., when popular opinion favors one treatment over another or when a skewed distribution of disease severity may affect the study outcome).
- Were all patients who entered the trial properly accounted for at its conclusion? Follow-up of patients entering the trial will be assessed. Studies with incomplete follow-up or large dropout rates (more than 20 percent) will not be reviewed.
Type of Study: Diagnosis
Studies of diagnostic tests, whether in a laboratory or as part of the physical examination, must demonstrate that the test is accurate at identifying the disease when it is present, that the test does not identify the disease when it is not present, and that it works well over a wide spectrum of patients with and without the disease.
- What is the disease being addressed? Studies evaluating a diagnostic test that identifies an abnormality but not a disease generally are not reviewed.
- Is the test compared with an acceptable “gold standard”? The characteristics of the new test should be compared with the best available method for identifying the disease.
- Were both tests applied in a uniformly blind manner? This question establishes whether every patient received both tests and whether each test was performed without knowledge of the results of the other test, because such knowledge could introduce bias.
- Is the new test reasonable? Studies that evaluate diagnostic tests that cannot be implemented readily by primary care physicians will not be reviewed.
- What is the prevalence of disease in the study population? The prevalence of disease in the study population will be reported so that readers can compare it with their own practice.
- What are the test characteristics? The sensitivity, specificity, predictive values, and likelihood ratios will be reported. These values will be calculated from data in the study if they are not reported by the authors.
Type of Study: Systematic Reviews
Only systematic reviews (overviews), including meta-analyses, will be considered.
- Were the methods used to locate relevant studies comprehensive and clearly stated? Reviews not stating the method of locating studies will not be reviewed.
- Were explicit methods used to select studies to include in the overview? Reviews not stating methods of including or excluding studies will not be reviewed.
- Was the validity of the original studies included in the overview appropriately assessed? Reviews not stating the method used to assess the validity of the original studies will not be reviewed. Reviews can include or exclude studies based on quality scores. Reviews including all studies irrespective of their quality scores should present the validity evaluation; reviews eliminating studies based on low quality should describe explicitly how these studies were eliminated.
- Was the assessment of the relevance and validity of the original studies reproducible and free from bias? Methods for assessing relevance or validity that have been published by others can be referenced, or new criteria can be described. Generally, validity assessment should be performed independently by at least two investigators.
- Was variation between the results of the relevant studies analyzed? Heterogeneity in study results should be evaluated and, if present, explained.
- Were the results combined appropriately? When results from different studies are combined, only similar outcomes should be combined. Reviews that attempt to convert study results from one scale to another generally will not be considered.
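For the heterogeneity question above, two widely used statistics are Cochran's Q and I². I² estimates the percentage of variability across studies that is due to real differences between them rather than to chance. The criteria above do not prescribe a particular method, so this minimal sketch shows only one common approach, and its inputs are hypothetical.

```python
def heterogeneity(estimates, standard_errors):
    """Cochran's Q and the I^2 statistic for a set of study estimates.

    Q is the weighted sum of squared deviations from the inverse-variance
    pooled estimate; I^2 = (Q - df) / Q, truncated at 0, with df = k - 1.
    """
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i_squared


# Hypothetical log risk ratios: three consistent studies, then one outlier added.
print(heterogeneity([-0.22, -0.25, -0.20], [0.10, 0.12, 0.15]))
print(heterogeneity([-0.22, -0.25, -0.20, 0.60], [0.10, 0.12, 0.15, 0.10]))
```

Adding the outlying study raises I² sharply. That is the kind of variation a review should detect and explain before it combines the results.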
Type of Study: Prognosis
The main threats to studies of prognosis are initial patient identification and loss to follow-up. Only prognosis studies that identify patients before they have the outcome of importance and follow up at least 80 percent of patients are included.
- Was an “inception cohort” assembled? Did the investigators identify a specific group and follow it forward in time? Studies that do not assemble an “inception cohort” and follow it forward in time are not reviewed.
- Were the criteria for entry into the study objective and reasonable? Entry criteria must be reproducible and not too restrictive or too broad.
- Was group follow-up adequate (at least 80 percent)?
- Were the patients similar to those in primary care in terms of age, sex, race, severity of disease, and other factors that might influence the course of the disease?
- Where did the patients come from—was the referral pattern specified? The source of patients will be noted in the review.
- Were outcomes assessed objectively and blindly?
Type of Study: Decision Analysis
Decision analysis involves choosing an action after formally and logically weighing the risks and benefits of the alternatives. Although all clinical decisions are made under conditions of uncertainty, this uncertainty decreases when the medical literature includes directly relevant, valid evidence. When the published evidence is scant, or less valid, uncertainty increases. Decision analysis allows physicians to compare the expected consequences of pursuing different strategies under conditions of uncertainty. In a sense, decision analysis is an attempt to construct POEMs artificially out of disease-oriented evidence.
- Were all important strategies and outcomes included? Analyses evaluating only some outcomes or strategies will not be reviewed.
- Was an explicit and sensible process used to identify, select, and combine the evidence into probabilities? Is the evidence strong enough?
- Were the utilities obtained in an explicit and sensible way from credible sources? Specifically, were utilities obtained from small samples or from groups not afflicted with the disease or outcome?
- Was the potential impact of any uncertainty in the evidence determined? It must be noted whether a sensitivity analysis was performed to determine how robust the analysis is under different conditions.
- How strong is the evidence used in the analysis? Could the uncertainty in the evidence change the result? It will be noted if any given variable unduly influences the analysis.
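The core calculation behind a decision analysis is an expected utility for each strategy: the sum of each outcome's probability times its utility. A one-way sensitivity analysis repeats that calculation while varying one uncertain input. In this minimal sketch, everything is invented for illustration: the strategies "treat" and "watch", every probability, and every utility (on a 0 = death to 1 = full health scale).

```python
def expected_utility(branches):
    """branches: list of (probability, utility) pairs for one strategy."""
    if abs(sum(p for p, _ in branches) - 1) > 1e-9:
        raise ValueError("branch probabilities must sum to 1")
    return sum(p * u for p, u in branches)


def compare(p_complication):
    """Expected utilities of two hypothetical strategies (treat, watch)."""
    treat = [(p_complication, 0.60), (1 - p_complication, 0.95)]
    watch = [(0.30, 0.70), (0.70, 0.90)]
    return expected_utility(treat), expected_utility(watch)


# One-way sensitivity analysis on the uncertain complication probability.
for p in (0.05, 0.15, 0.30, 0.45):
    treat_eu, watch_eu = compare(p)
    better = "treat" if treat_eu > watch_eu else "watch"
    print(f"p(complication) {p:.2f}: treat {treat_eu:.3f}, "
          f"watch {watch_eu:.3f} -> {better}")
```

In this made-up model, the preferred strategy flips once the complication probability exceeds about 0.31. A sensitivity analysis is meant to reveal exactly this kind of dependence on a single uncertain variable.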
Type of Study: Qualitative Research
Qualitative research uses nonquantitative methods to answer questions. While this type of research is able to investigate questions that quantitative research cannot, it is at risk for bias and error on the part of the researcher. Qualitative research findings will be reported if they are highly relevant, although specific conclusions will not be drawn from the results.
- Was the appropriate method used to answer the question? Interviews or focus groups should be used to study perceptions. Observation is required to evaluate behaviors. Studies not using the appropriate method will not be reviewed.
- Was appropriate and adequate sampling used to get the best information? Random sampling is not used in qualitative research. Instead, patients are selected with the idea that they are best suited to provide appropriate information. Assurance that enough patients were studied to provide sufficient information should be found in the description.
- Was an iterative process of collecting information used? In qualitative research, the researcher learns about the topic as the research progresses. The study design should consist of data collection and analysis, followed by more data collection and analysis, in an iterative fashion, until no more information is obtained.
- Was a thorough analysis presented? A good qualitative study presents the findings and provides a thorough analysis of the data.
- Are the background and training of the investigators described? Because investigators are being relied on for analysis of the data, their training and biases must be documented. These characteristics can be used to evaluate the conclusions.
|Term|Abbreviation|Definition|
|---|---|---|
|Sensitivity|Sn|Percentage of patients with disease who have a positive test for the disease in question|
|Specificity|Sp|Percentage of patients without disease who have a negative test for the disease in question|
|Predictive value (positive and negative)|PV+, PV–|Percentage of patients with a positive or negative test for a disease who do or do not have the disease in question|
|Pretest probability||Probability of disease before a test is performed|
|Post-test probability||Probability of disease after a test is performed|
|Likelihood ratio|LR|LR >1 indicates an increased likelihood of disease. LR <1 indicates a decreased likelihood of disease. The most helpful tests generally have a ratio of less than 0.2 or greater than 5.|
|Relative risk reduction|RRR|The difference in risk or outcomes between treatment and control groups, expressed as a percentage of the risk in the control group. Example: if mortality is 30% in controls and 20% with treatment, RRR is (30 – 20)/30 = 33 percent.|
|Absolute risk reduction|ARR|The arithmetic difference in risk or outcomes between treatment and control groups. Example: if mortality is 30% in controls and 20% with treatment, ARR is 30 – 20 = 10%.|
|Number needed to treat|NNT|The number of patients who need to receive an intervention instead of the alternative in order for one additional patient to benefit. The NNT is calculated as: 1/ARR. Example: if the ARR is 4%, the NNT = 1/4% = 1/0.04 = 25.|
|95 percent confidence interval|95% CI|An estimate of precision: if a study were repeated many times, about 95% of the intervals calculated this way would contain the true value. A narrow CI is good. For a ratio such as a relative risk or odds ratio, a CI that includes 1.0 means the result is not statistically significant at the 5% level.|
|Systematic review||A type of review article that uses explicit methods to comprehensively analyze and qualitatively synthesize information from multiple studies|
|Meta-analysis||A type of systematic review that uses rigorous statistical methods to quantitatively synthesize the results of multiple similar studies|