Levels of evidence in research

Level of evidence hierarchy
When carrying out a project you might have noticed that, while searching for information, there seem to be different levels of credibility given to different types of scientific results. For example, it is not the same to use a systematic review or an expert opinion as the basis for an argument. It’s almost common sense that the former will yield more accurate results than the latter, which ultimately derives from a personal opinion.
In the medical and health care area, for example, it is very important that professionals not only have access to information but also have instruments to determine which evidence is stronger and more trustworthy, building up the confidence to diagnose and treat their patients.
5 levels of evidence
With the increasing need of physicians – as well as scientists in other fields of study – to know from which kind of research they can expect the best clinical evidence, experts decided to rank this evidence to help them identify the best sources of information to answer their questions. The criteria for ranking evidence are based on the design, methodology, validity and applicability of the different types of studies. The outcome is called “levels of evidence” or the “levels of evidence hierarchy”. By organizing a well-defined hierarchy of evidence, academic experts were aiming to help scientists feel confident in using findings from high-ranked evidence in their own work or practice. For physicians, whose daily activity depends on available clinical evidence to support decision-making, this really helps them know which evidence to trust the most.
So, by now you know that research can be graded according to the evidential strength determined by different study designs. But how many grades are there? Which evidence should be high-ranked and low-ranked?
There are five levels of evidence in the hierarchy of evidence – being 1 (or in some cases A) for strong and high-quality evidence and 5 (or E) for evidence with effectiveness not established, as you can see in the pyramidal scheme below:
Level 1: (highest quality of evidence) – High-quality randomized trial or prospective study; testing of previously developed diagnostic criteria on consecutive patients; sensible costs and alternatives; values obtained from many studies with multiway sensitivity analyses; systematic review of Level I RCTs and Level I studies.
Level 2: Lesser-quality RCT; prospective comparative study; retrospective study; untreated controls from an RCT; lesser-quality prospective study; development of diagnostic criteria on consecutive patients; sensible costs and alternatives; values obtained from limited studies with multiway sensitivity analyses; systematic review of Level II studies or Level I studies with inconsistent results.
Level 3: Case-control study (therapeutic and prognostic studies); retrospective comparative study; study of nonconsecutive patients without consistently applied reference “gold” standard; analyses based on limited alternatives and costs and poor estimates; systematic review of Level III studies.
Level 4: Case series; case-control study (diagnostic studies); poor reference standard; analyses with no sensitivity analyses.
Level 5: (lowest quality of evidence) – Expert opinion.
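For readers who keep such classifications in machine-readable form, the five levels above can be encoded as a simple lookup table. This is a hypothetical sketch: the descriptions are abridged from the list above, and the `stronger` helper is illustrative, not part of any standard tool.

```python
# Hypothetical encoding of the five-level evidence hierarchy described above.
# Level 1 is the strongest evidence; level 5 is the weakest (expert opinion).
EVIDENCE_LEVELS = {
    1: "High-quality RCT or prospective study; systematic review of Level I studies",
    2: "Lesser-quality RCT; prospective comparative study; systematic review of Level II studies",
    3: "Case-control study; retrospective comparative study; systematic review of Level III studies",
    4: "Case series; case-control study (diagnostic); analyses with no sensitivity analyses",
    5: "Expert opinion",
}

def stronger(level_a: int, level_b: int) -> int:
    """Return whichever of two evidence levels ranks higher (lower number = stronger)."""
    return min(level_a, level_b)
```

For example, `stronger(3, 1)` returns `1`, reflecting that a systematic review of RCTs outranks a case-control study.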
By looking at the pyramid, you can roughly distinguish what type of research gives you the highest quality of evidence and which gives you the lowest. Basically, level 1 and level 2 are filtered information – that means an author has gathered evidence from well-designed studies, with credible results, and has produced findings and conclusions appraised by renowned experts, who consider them valid and strong enough to serve researchers and scientists. Levels 3, 4 and 5 include evidence coming from unfiltered information. Because this evidence hasn’t been appraised by experts, it might be questionable, but not necessarily false or wrong.
Examples of levels of evidence
As you move up the pyramid, you will surely find higher-quality evidence. However, you will notice there is also less research available. So, if there are no resources for you available at the top, you may have to start moving down in order to find the answers you are looking for.
- Systematic Reviews: Exhaustive summaries of all the existing literature on a certain topic. When drafting a systematic review, authors are expected to deliver a critical assessment and evaluation of this literature rather than a simple list. Researchers who produce systematic reviews have their own criteria to locate, assemble and evaluate a body of literature.
- Meta-Analysis: Uses quantitative methods to synthesize the combined results of independent studies. Normally, they function as an overview of clinical trials. Read more: Systematic review vs meta-analysis.
- Critically Appraised Topic: Evaluation of several research studies.
- Critically Appraised Article: Evaluation of individual research studies.
- Randomized Controlled Trial: A clinical trial in which participants (subjects who agree to take part in the trial) are randomly divided into groups. One group receives a placebo (the control) whereas the other is treated with the medication. This kind of research is key to learning about a treatment’s effectiveness.
- Cohort studies: A longitudinal study design in which one or more samples called cohorts (individuals sharing a defining characteristic, like a disease) are exposed to an event, monitored prospectively, and evaluated at predefined time intervals. They are commonly used to correlate diseases with risk factors and health outcomes.
- Case-Control Study: Selects patients with an outcome of interest (cases) and looks for an exposure factor of interest.
- Background Information/Expert Opinion: Information you can find in encyclopedias, textbooks and handbooks. This kind of evidence just serves as a good foundation for further research – or clinical practice – for it is usually too generalized.
Of course, it is recommended to use level A and/or 1 evidence for more accurate results, but that doesn’t mean that all other study designs are unhelpful or useless. It all depends on your research question. Focusing once more on the healthcare and medical field, see how different study designs fit particular questions that are not necessarily answered at the tip of the pyramid:
- Questions concerning therapy: “Which is the most efficient treatment for my patient?” >> RCT | Cohort studies | Case-Control | Case Studies
- Questions concerning diagnosis: “Which diagnostic method should I use?” >> Prospective blind comparison
- Questions concerning prognosis: “How will the patient’s disease develop over time?” >> Cohort Studies | Case Studies
- Questions concerning etiology: “What are the causes for this disease?” >> RCT | Cohort Studies | Case Studies
- Questions concerning costs: “What is the most cost-effective but safe option for my patient?” >> Economic evaluation
- Questions concerning meaning/quality of life: “What’s the quality of life of my patient going to be like?” >> Qualitative study
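The pairings above of question type and suitable study designs can be expressed as a small lookup table. This is a hypothetical sketch (the function name and keys are illustrative), useful for quickly checking which designs suit a given clinical question:

```python
# Hypothetical lookup from clinical question type to suitable study designs,
# following the pairings listed above.
DESIGNS_BY_QUESTION = {
    "therapy": ["RCT", "cohort study", "case-control study", "case study"],
    "diagnosis": ["prospective blind comparison"],
    "prognosis": ["cohort study", "case study"],
    "etiology": ["RCT", "cohort study", "case study"],
    "costs": ["economic evaluation"],
    "quality of life": ["qualitative study"],
}

def suggest_designs(question_type: str) -> list[str]:
    """Return study designs suited to a question type (empty list if unknown)."""
    return DESIGNS_BY_QUESTION.get(question_type.lower(), [])
```

For instance, `suggest_designs("Therapy")` returns the four designs listed for treatment questions, with the RCT first.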
Purdue Online Writing Lab, College of Liberal Arts

Using Research and Evidence

Copyright ©1995-2018 by The Writing Lab & The OWL at Purdue and Purdue University.
These OWL resources will help you develop and refine the arguments in your writing.
What type of evidence should I use?
There are two types of evidence.
Firsthand research is research you have conducted yourself, such as interviews, experiments, surveys, or personal experience and anecdotes.
Secondhand research is research you get from various texts that have been supplied and compiled by others, such as books, periodicals, and Web sites.
Regardless of what type of sources you use, they must be credible. In other words, your sources must be reliable, accurate, and trustworthy.
How do I know if a source is credible?
You can ask the following questions to determine if a source is credible.
Who is the author? Credible sources are written by authors respected in their fields of study. Responsible, credible authors will cite their sources so that you can check the accuracy of and support for what they've written. (This is also a good way to find more sources for your own research.)
How recent is the source? The choice to seek recent sources depends on your topic. While sources on the American Civil War may be decades old and still contain accurate information, sources on information technologies, or other areas that are experiencing rapid changes, need to be much more current.
What is the author's purpose? When deciding which sources to use, you should take the purpose or point of view of the author into consideration. Is the author presenting a neutral, objective view of a topic? Or is the author advocating one specific view of a topic? Who is funding the research or writing of this source? A source written from a particular point of view may be credible; however, you need to be careful that your sources don't limit your coverage of a topic to one side of a debate.
What type of sources does your audience value? If you are writing for a professional or academic audience, they may value peer-reviewed journals as the most credible sources of information. If you are writing for a group of residents in your hometown, they might be more comfortable with mainstream sources, such as Time or Newsweek. A younger audience may be more accepting of information found on the Internet than an older audience might be.
Be especially careful when evaluating Internet sources! Never use Web sites where an author cannot be determined, unless the site is associated with a reputable institution such as a respected university, a credible media outlet, a government program or department, or a well-known non-governmental organization. Beware of using sites like Wikipedia, which are collaboratively developed by users. Because anyone can add or change content, the validity of information on such sites may not meet the standards for academic research.
Walden University Academic Guides
Evidence-Based Research: Levels of Evidence Pyramid
One way to organize the different types of evidence involved in evidence-based practice research is the levels of evidence pyramid. The pyramid includes a variety of evidence types and levels.
- systematic reviews
- critically-appraised topics
- critically-appraised individual articles
- randomized controlled trials
- cohort studies
- case-controlled studies, case series, and case reports
- background information, expert opinion
The levels of evidence pyramid provides a way to visualize both the quality of evidence and the amount of evidence available. For example, systematic reviews are at the top of the pyramid, meaning they are both the highest level of evidence and the least common. As you go down the pyramid, the amount of evidence will increase as the quality of the evidence decreases.
EBM Pyramid and EBM Page Generator, copyright 2006 Trustees of Dartmouth College and Yale University. All Rights Reserved. Produced by Jan Glover, David Izzo, Karen Odato and Lei Wang.
Filtered resources appraise the quality of studies and often make recommendations for practice. The main types of filtered resources in evidence-based practice are:
Scroll down the page to the Systematic reviews, Critically-appraised topics, and Critically-appraised individual articles sections for links to resources where you can find each of these types of filtered information.
Authors of a systematic review ask a specific clinical question, perform a comprehensive literature review, eliminate the poorly done studies, and attempt to make practice recommendations based on the well-done studies. Systematic reviews include only experimental, or quantitative, studies, and often include only randomized controlled trials.
You can find systematic reviews in these filtered databases:
- Cochrane Database of Systematic Reviews Cochrane systematic reviews are considered the gold standard for systematic reviews. This database contains both systematic reviews and review protocols. To find only systematic reviews, select Cochrane Reviews in the Document Type box.
- JBI EBP Database (formerly Joanna Briggs Institute EBP Database) This database includes systematic reviews, evidence summaries, and best practice information sheets. To find only systematic reviews, click on Limits and then select Systematic Reviews in the Publication Types box. To see how to use the limit and find full text, please see our Joanna Briggs Institute Search Help page.
You can also find systematic reviews in this unfiltered database:
To learn more about finding systematic reviews, please see our guide:
- Filtered Resources: Systematic Reviews
Authors of critically-appraised topics evaluate and synthesize multiple research studies. Critically-appraised topics are like short systematic reviews focused on a particular topic.
You can find critically-appraised topics in these resources:
- Annual Reviews This collection offers comprehensive, timely collections of critical reviews written by leading scientists. To find reviews on your topic, use the search box in the upper-right corner.
- Guideline Central This free database offers quick-reference guideline summaries organized by a non-profit initiative that aims to fill the gap left by the sudden closure of AHRQ’s National Guideline Clearinghouse (NGC).
- JBI EBP Database (formerly Joanna Briggs Institute EBP Database) To find critically-appraised topics in JBI, click on Limits and then select Evidence Summaries from the Publication Types box. To see how to use the limit and find full text, please see our Joanna Briggs Institute Search Help page.
- Best BETs Best Evidence Topics are modified critically-appraised topics designed specifically for emergency medicine.
- National Institute for Health and Care Excellence (NICE) Evidence-based recommendations for health and care in England.
- Filtered Resources: Critically-Appraised Topics
Authors of critically-appraised individual articles evaluate and synopsize individual research studies.
You can find critically-appraised individual articles in these resources:
- EvidenceAlerts Quality articles from over 120 clinical journals are selected by research staff and then rated for clinical relevance and interest by an international group of physicians. Note: You must create a free account to search EvidenceAlerts.
- ACP Journal Club This journal publishes reviews of research on the care of adults and adolescents. You can either browse this journal or use the Search within this publication feature.
- Evidence-Based Nursing This journal reviews research studies that are relevant to best nursing practice. You can either browse individual issues or use the search box in the upper-right corner.
To learn more about finding critically-appraised individual articles, please see our guide:
- Filtered Resources: Critically-Appraised Individual Articles
You may not always be able to find information on your topic in the filtered literature. When this happens, you'll need to search the primary or unfiltered literature. Keep in mind that with unfiltered resources, you take on the role of reviewing what you find to make sure it is valid and reliable.
Note: You can also find systematic reviews and other filtered resources in these unfiltered databases.
The Levels of Evidence Pyramid includes unfiltered study types in this order of evidence, from higher to lower:
- randomized controlled trials
- cohort studies
- case-controlled studies, case series, and case reports
You can search for each of these types of evidence in the following databases:
Background information and expert opinions are not necessarily backed by research studies. They include point-of-care resources, textbooks, conference proceedings, etc.
- Family Physicians Inquiries Network: Clinical Inquiries Provide the ideal answers to clinical questions using a structured search, critical appraisal, authoritative recommendations, clinical perspective, and rigorous peer review. Clinical Inquiries deliver best evidence for point-of-care use.
- Harrison, T. R., & Fauci, A. S. (2009). Harrison's Manual of Medicine . New York: McGraw-Hill Professional. Contains the clinical portions of Harrison's Principles of Internal Medicine .
- Lippincott manual of nursing practice (8th ed.). (2006). Philadelphia, PA: Lippincott Williams & Wilkins. Provides background information on clinical nursing practice.
- Medscape: Drugs & Diseases An open-access, point-of-care medical reference that includes clinical information from top physicians and pharmacists in the United States and worldwide.
- Virginia Henderson Global Nursing e-Repository An open-access repository that contains works by nurses and is sponsored by Sigma Theta Tau International, the Honor Society of Nursing. Note: This resource contains both expert opinion and evidence-based practice articles.
© 2023 Walden University LLC. All rights reserved.
Determining the level of evidence: Experimental research appraisal (Nursing2020 Critical Care)
Glasofer, Amy DNP, RN, NE-BC; Townsend, Ann B. DrNP, RN, ANP-C, CNS-C
Amy Glasofer is a nurse scientist at Virtua Center for Learning, Mt. Laurel, N.J., and a member of the Nursing2019 Critical Care Editorial Board.
Ann B. Townsend is an adult NP with The Nurse Practitioner Group, LLC.
The authors have disclosed no financial relationships related to this article.
The first installment in this series provides a basic understanding of research design to appraise the level of evidence of a source. This article reviews appraisal of randomized controlled trials and quasi-experimental research.
Critical care nurses have a responsibility to use evidence-based practices in their patient care. To ensure their actions will produce the desired outcomes, critical care nurses must use the strongest evidence available to support patient care. 1 Determining what qualifies as “strong” evidence can be challenging.
According to the Agency for Healthcare Research and Quality, the evidential strength includes three elements: quality, quantity, and consistency. 2 Quality is the most challenging element nurses must evaluate when assessing the strength of evidence for a topic. Quality refers to the methods used to ensure that results are valid and not influenced by bias or occurring by chance. 2 One component of quality is the level of the evidence. Quantity is evaluated by considering the number of studies on a topic, the size of the studies, and the impact of studied treatments. Consistency is the easiest of these elements to understand; for evidence to be strong, similar findings should be reported across multiple sources. 2
This series will provide basic guidance for appraising evidence. However, this is only one step in the evidence-based practice (EBP) process, which includes complexities that this series will not address. Many resources exist for nurses to develop their critical appraisal skills and strengthen their understanding of the EBP process. For example, the American Journal of Nursing published a 12-article series outlining a step-by-step approach to EBP. 3
A variety of evidence hierarchies exist to evaluate the level of evidence. 1 To apply these hierarchies, nurses must have a working knowledge of research design. This initial Evaluating the Evidence Series installment will provide nurses with a basic understanding of research design to appraise the level of evidence of a source. This article will review appraisal of experimental research, which includes randomized controlled trials (RCTs) (Level 1) and quasi-experimental research (Level 2). Future installments in this series will address nonexperimental research appraisal (Level 3) and finally the leveling of nonresearch evidence (Levels 4 and 5).
The evidence pyramid
One way to understand evidence hierarchies is to consider crime scene evidence. Different types of crime scene evidence are weighed differently when trying to prove an individual's guilt or innocence. For example, DNA evidence is superior to eyewitness testimony because witnesses are susceptible to bias and DNA is more objective. 4 A determination of guilt is more likely if DNA evidence is present or if there are multiple eyewitnesses with consistent reports than if only one eyewitness testimony is presented. DNA might be on the top level of a criminal evidence hierarchy, and eyewitness testimony could be found lower down. 4
The same is true of clinical evidence, but rather than determining guilt or innocence, nurses must determine if cause and effect exists. To objectively arrive at a conclusion, nurses must use the strongest evidence available. Imagine the evidence levels arranged by research design. (See Evidence hierarchy.) The top of the pyramid, Level 1, represents the strongest evidence. As researchers move through the pyramid from Level 1 down, the study designs become less rigorous, which may influence the results through the introduction of bias or conclusion errors. Pyramids vary between organizations and disciplines, but they all follow these basic principles. Some additional levels of evidence hierarchies include the Joanna Briggs Institute levels of evidence and the Oxford Centre for Evidence-Based Medicine hierarchy. 5,6 This article will use the Johns Hopkins hierarchy of evidence. 7
Level 1: RCTs, systematic reviews, and meta-analyses
According to the Johns Hopkins hierarchy of evidence, the highest level of evidence is an RCT, a systematic review of RCTs, or a meta-analysis of RCTs. 7 In an RCT, the study must meet three criteria: random or “by chance” assignment of participants into two or more groups, an intervention or treatment applied to at least one of the groups, and a control group that does not receive the same treatment or intervention. The methodologies used in Level 1 evidence reduce bias and help identify cause-and-effect relationships. 8
Consider the following example research question. What is the effect of caffeine on nursing medication errors? To answer this question using an RCT, first recruit a sample of nurses. The study must have institutional review board approval and informed consent from the participants, and the study should follow the EQUATOR guidelines. 9 Each participating nurse is assigned by chance (like the flip of a coin) to the caffeine (intervention) group, or the no-caffeine (control) group. Ensure that the two groups are the same regarding any other factor that might impact medication errors aside from the intervention (patient acuity, nurse experience), or take these other factors into account in the data analysis and conclusion. In doing so, researchers can conclude that any statistically significant differences in medication errors between the groups are a result of the caffeine and not chance.
Although one DNA sample provides strong evidence, multiple DNA samples confirming the same suspect are even stronger. Systematic reviews and meta-analyses of RCTs follow this reasoning. Both evaluate multiple research studies. When all the studies included are RCTs, the findings are more powerful than any one RCT on its own. A systematic review uses a rigorous process to identify, appraise, and synthesize the evidence on a particular topic. 1 A meta-analysis takes it one step further and conducts a statistical analysis of the synthesized data to obtain a statistic representing the effect of the intervention across multiple studies. 1 So, a systematic review on the effect of caffeine and medication errors would include a rigorous review of every RCT on the topic that met specific inclusion criteria, and a meta-analysis would provide a summary statistic on the size of the effect or the influence of caffeine on medication errors.
Just as DNA evidence can be flawed, RCTs, systematic reviews, and meta-analyses can have limitations. In the example, researchers are seeking volunteers to participate. The voluntary participants could be very different than the nurses who choose not to participate. If so, study findings might not apply to nurses in general. Nurses in both groups might improve practice because they know they are being observed, resulting in decreased medication errors across both groups. The nurses assigned to the control group may perform poorly because they are in withdrawal from their typical caffeine intake. Or, the nurses in the control group could be unhappy that they were assigned to the noncaffeine group and behave differently. There are strategies to eliminate some sources of bias. For example, researchers could “blind” or “mask” the participants to which group they were randomly assigned so they are unaware of caffeine consumption. To achieve this, researchers would not tell the nurses which group they are in and give both groups coffee (caffeinated to the intervention group and decaffeinated to the control group). However, even in a well-designed RCT, the reader must be critical of the findings. The same is true of systematic reviews and meta-analyses, as they are only as strong as the thoroughness of the review and the findings of the weakest study included in the analysis.
Level 2: Quasi-experimental research
Fingerprints remain an important source of crime scene evidence, although they are not as reliable as DNA. 10 Fingerprint comparisons require expert review. Expert judgment introduces greater bias and uncertainty than DNA evidence. 10 So, fingerprints might be considered one level below DNA in the crime scene evidence hierarchy.
In the Johns Hopkins hierarchy, Level 2 contains quasi-experimental research studies as well as systematic reviews of both RCTs and quasi-experimental studies with or without meta-analysis. 7 This group is still experimental because it involves manipulation or an intervention introduced by the research. However, it is termed quasi-experimental because it lacks one or two of the three criteria required for a true experimental design. Examples of quasi-experimental designs used in nursing research are the nonequivalent control group design, the pre-posttest design, and the interrupted time series design. 7
Consider the sample research question. Instead of randomly assigning nurses to the caffeine or noncaffeine groups, researchers could compare two units in a nonequivalent control group design. One could be the caffeine unit, and the other could be the noncaffeine unit. Or researchers could give one group of nurses no caffeine for a time, and then give them caffeine during another period as in an interrupted time series design. Researchers would observe medication errors throughout, comparing one study period to the other. Further still, researchers could only have one group receive caffeine and make no comparison. In these examples, assignment is no longer random. There could be alternative explanations for the difference in medication error rates seen between the groups. When comparing two different units, patient or nursing populations may be dissimilar, fewer medications may be given on one unit than another, processes for medication administration may differ, or any of a multitude of other factors may impact the study outcomes. Similarly, when researchers compare the same group at two different time periods, an unrelated change in practice, patient population, or acuity could explain results. And when there is no comparison group, researchers have no basis for determining if medication errors are associated with caffeine consumption.
No matter how well executed a quasi-experimental study is, nurses must be less certain of its results compared with an RCT. The same is true of systematic reviews with or without meta-analysis that include quasi-experimental studies. A review is only as strong as the weakest study included. Therefore, reviews that include quasi-experimental studies are not as strong as those that include only RCTs. The quasi-experimental design will always fall lower than an RCT in an evidence hierarchy, regardless of the model consulted. Despite this, researchers will continue to use quasi-experimental designs. Quasi-experimental research can be simpler to carry out in practice, and often feasibility trumps rigor.
Critical care nurses endeavoring to provide evidence-based care may find themselves acting as detectives. Although it may be tempting to reach a conclusion when a piece of evidence that matches one's suspicions is identified, the investigation must go deeper. Nurses are required to find a sufficient number of sources that arrive at similar conclusions. Although no magic number indicates sufficient evidence, fewer sources are needed when synthesizing higher-quality evidence.
One element of quality is the level of evidence. The level of evidence is based on how the design minimizes the impact of bias and chance of the conclusions drawn. Many hierarchies exist to weigh different levels of evidence against one another. Regardless of the evidence hierarchy used, RCTs and systematic reviews with or without meta-analysis exist at or near the highest level of evidence, with quasi-experimental research following closely behind. Nurses must use their critical appraisal skills to determine when a study has employed an experimental design, is using a control group, or has assigned participants to groups randomly to support the quest to provide evidence-based patient care. Upcoming installments of this series will discuss levels 3, 4, and 5, which include nonexperimental research, and sources of nonresearch evidence.
Developing NICE guidelines: the manual
Process and methods [PMG20] Published: 31 October 2014 Last updated: 18 January 2022
- Tools and resources
- 1 Introduction
- 2 The scope
- 3 Decision-making committees
- 4 Developing review questions and planning the evidence review
- 5 Identifying the evidence: literature searching and evidence submission
- 6 Reviewing research evidence
- 7 Incorporating economic evaluation
- 8 Linking to other guidance
- 9 Writing the guideline
- 10 The validation process for draft guidelines, and dealing with stakeholder comments
- 11 Finalising and publishing the guideline
- 12 Resources to support putting the guideline into practice
- 13 Ensuring that published guidelines are current and accurate
- 14 Updating guideline recommendations
- 15 Appendices
- Update information
- 6.1 Identifying and selecting relevant evidence
- 6.2 Assessing quality of evidence: critical appraisal, analysis, and certainty in the findings
- 6.3 Equality and diversity considerations
- 6.4 Health inequalities
- 6.5 Summarising evidence
- 6.6 References and further reading
Reviewing evidence is an explicit, systematic and transparent process that can be applied to both quantitative (experimental and observational) and qualitative evidence (see the chapter on developing review questions and planning the evidence review). The key aim of any review is to provide a summary of the relevant evidence to ensure that the committee can make fully informed decisions about its recommendations. This chapter describes how evidence is reviewed in the development of guidelines.
Evidence reviews for NICE guidelines summarise the evidence and its limitations so that the committee can interpret the evidence and make appropriate recommendations, even where there is uncertainty.
Evidence identified during literature searches and from other sources (see the chapter on identifying the evidence: literature searching and evidence submission) should be reviewed against the review protocol to identify the most appropriate information to answer the review questions. The evidence review process used to inform guidelines must be explicit and transparent, and involves the following main steps:
writing the review protocol (see the section on planning the evidence review in the chapter on developing review questions and planning the evidence review )
identifying and selecting relevant evidence (including a list of excluded studies with reasons for exclusion)
critically appraising the included studies
extracting relevant data
synthesising the results (including statistical analyses such as meta-analysis)
assessing quality/certainty in the evidence
interpreting the results.
Any substantial deviations from these steps need to be agreed, in advance, with NICE staff with responsibility for quality assurance.
The process of selecting relevant evidence is common to all evidence reviews; the other steps are discussed in relation to the main types of review questions. The same rigour should be applied to reviewing all data, whether fully or partially published studies or unpublished data supplied by stakeholders. Care should be taken to ensure that multiple reports of the same study are identified and ordered in full text, so that data extraction is as complete as possible but study participants are not double counted in the analysis.
Titles and abstracts of the retrieved citations should be screened against the inclusion criteria defined in the review protocol, and those that do not meet these should be excluded. A percentage (at least 10%, but possibly more depending on the review question) should be screened independently by 2 reviewers (that is, titles and abstracts should be double-screened). The percentage of records to be double-screened for each review should be specified in the review protocol.
If reviewers disagree about a study's relevance, this should be resolved by discussion or by recourse to a third reviewer. If, after discussion, there is still doubt about whether or not the study meets the inclusion criteria, it should be retained. If double-screening is only done on a sample of the retrieved citations (for example, 10% of references), inter-rater reliability should be assessed against a pre-specified threshold (usually 90% agreement, unless another threshold has been agreed and documented). If agreement is lower than the pre-specified threshold, the reason should be explored and a course of action agreed to ensure a rigorous selection process. A further proportion of studies should then be double-screened to validate this new process until appropriate agreement is achieved.
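The inter-rater check described above can be sketched as follows. The reviewer decisions and sample size are invented for illustration, and the 90% threshold is only the manual's usual default; the actual threshold should come from the review protocol.

```python
# Sketch of checking inter-rater agreement on a double-screened sample
# against a pre-specified threshold. All decisions shown are invented.

def percent_agreement(reviewer_a: list[bool], reviewer_b: list[bool]) -> float:
    """Proportion of records where both reviewers made the same include/exclude call."""
    if len(reviewer_a) != len(reviewer_b):
        raise ValueError("Both reviewers must screen the same records")
    matches = sum(x == y for x, y in zip(reviewer_a, reviewer_b))
    return matches / len(reviewer_a)

# Include/exclude decisions for the same 10 sampled titles and abstracts
a = [True, True, False, False, True, False, True, True, False, False]
b = [True, True, False, True,  True, False, True, True, False, False]

agreement = percent_agreement(a, b)
THRESHOLD = 0.90  # as specified in the review protocol
if agreement < THRESHOLD:
    print(f"Agreement {agreement:.0%} below threshold: explore the reason and double-screen a further sample")
else:
    print(f"Agreement {agreement:.0%} meets the pre-specified threshold")
```

Note that simple percent agreement does not correct for chance agreement; chance-corrected statistics such as Cohen's kappa are sometimes preferred, but the manual expresses its threshold as percent agreement.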
Once the screening of titles and abstracts is complete, full versions of the selected studies should be obtained for assessment. As with title and abstract screening, a percentage of full studies should be checked independently by 2 reviewers, with any differences being resolved and additional studies being assessed by multiple reviewers if sufficient agreement is not achieved. Studies that fail to meet the inclusion criteria once the full version has been checked should be excluded at this stage.
The study selection process should be clearly documented and include full details of the inclusion and exclusion criteria. A flow chart should be used to summarise the number of papers included and excluded at each stage, and this should be presented in the evidence review (see the PRISMA statement). Each study excluded after checking the full version should be listed, along with the reason for its exclusion. Reasons for study exclusion need to be sufficiently detailed (for example, 'editorial/review' or 'study population did not meet that specified in the review protocol').
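The stage counts behind a PRISMA-style flow chart must reconcile from stage to stage; a minimal sketch of that bookkeeping, with all numbers invented for illustration:

```python
# Hypothetical stage counts for a PRISMA-style selection flow chart;
# the numbers are invented for illustration only.
counts = {
    "records identified": 1200,
    "excluded on title/abstract": 1050,
    "full texts assessed": 150,
    "excluded at full text": 120,  # each listed with a reason in the review
    "studies included": 30,
}

# Counts at each stage must reconcile with the stage before
assert counts["records identified"] - counts["excluded on title/abstract"] == counts["full texts assessed"]
assert counts["full texts assessed"] - counts["excluded at full text"] == counts["studies included"]
```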
Priority screening refers to any technique that uses a machine learning algorithm to enhance the efficiency of screening. Usually, this involves taking information on previously included or excluded papers, and using this to order the unscreened papers from those most likely to be included to those least likely. This can be used to identify a higher proportion of relevant papers earlier in the screening process, or to set a cut‑off for manual screening, beyond which it is unlikely that additional relevant studies will be identified.
There is currently no published guidance on setting thresholds for stopping screening where priority screening has been used. Any methods used should be documented in the review protocol and agreed in advance with NICE staff with responsibility for quality assurance. Any thresholds set should, at minimum, consider the following:
the number of references identified so far through the search, and how this identification rate has changed over the review (for example, how many candidate papers were found in each 1,000 screened)
the overall number of studies expected, which may be based on a previous version of the guideline (if it is an update), published systematic reviews, or the experience of the guideline committee
the ratio of relevant/irrelevant records found at the random sampling stage (if undertaken) before priority screening.
The actual thresholds used for each review question should be clearly documented, either in the guideline methods chapter or in the evidence reviews.
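As a rough illustration of the ordering idea behind priority screening, the sketch below scores unscreened titles by word overlap with titles already included. A real implementation would use a trained machine learning classifier over richer features; the scoring rule and all titles here are invented stand-ins.

```python
# Illustrative stand-in for priority screening: order unscreened records
# from most to least likely to be included, based on decisions so far.

def build_vocabulary(included_titles: list[str]) -> set[str]:
    """Words that have appeared in titles of records included so far."""
    return {w for title in included_titles for w in title.lower().split()}

def score(title: str, vocab: set[str]) -> float:
    """Fraction of a title's words already seen in included titles."""
    words = title.lower().split()
    return sum(w in vocab for w in words) / len(words)

included_so_far = ["Randomised controlled trial of drug A", "RCT of exercise therapy"]
unscreened = [
    "Randomised trial of drug B",
    "Editorial: reflections on policy",
    "Controlled trial of cognitive therapy",
]

vocab = build_vocabulary(included_so_far)
ranked = sorted(unscreened, key=lambda t: score(t, vocab), reverse=True)
# Screen the most promising records first; the identification rate per
# 1,000 screened can then inform any pre-agreed stopping threshold.
```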
Ensuring relevant records are not missed
Regardless of the level of double-screening, and whether or not priority screening was used, additional checks should always be made to reduce the risk that relevant studies are not identified. These should include, at minimum:
checking reference lists of included systematic reviews, even if these reviews are not used as a source of primary data
checking with the guideline committee that they are not aware of any relevant studies that have been missed
looking for published papers associated with key trial registry entries or published protocols.
It may be useful to test the sensitivity of the search by checking that it picks up known studies of relevance.
Conference abstracts seldom contain enough information to allow confident judgements about the quality and results of a study, although they may be useful in interpreting evidence reviews. It can be difficult to trace the original studies or additional data, and the information found may not always be useful. Also, good-quality studies will often publish full-text papers after the conference abstract, and these will be identified via routine searches. Conference abstracts should therefore not routinely be included in the search strategy unless there is a good rationale for doing so; for example, if surveillance has identified a number of ongoing studies and conference abstracts would be a good source for tracking the full studies, or if a call for evidence is planned and conference abstracts can be used to trace unpublished research. If conference abstracts are searched for, the investigators may be contacted if additional information is needed to complete the assessment for inclusion.
Legislation and policy
Relevant legislation or policies may be identified in the literature search and used to inform guidelines (such as drug safety updates from the Medicines and Healthcare products Regulatory Agency [MHRA]). Given the nature of the source, legislation and policy do not need quality assessment in the same way as other evidence. National policy or legislation can be quoted verbatim in the guideline where needed (for example, the Health and Social Care Act).
Unpublished data and studies in progress
Any unpublished data should be quality assessed in the same way as published studies (see the section on assessing quality of evidence: critical appraisal, analysis, and certainty in the findings). Ideally, if additional information is needed to complete the quality assessment, the investigators should be contacted. Similarly, if data from studies in progress are included, they should be quality assessed in the same way as published studies. Confidential information should be kept to a minimum, and a structured abstract of the study must be made available for public disclosure during consultation on the guideline.
Grey literature may be quality assessed in the same way as published literature, although because of its nature, such an assessment may be more difficult. Consideration should therefore be given to the elements of quality that are most likely to be important.
Assessing the quality of the evidence for a review question is critical. It requires a systematic process of assessing potential biases by considering both the appropriateness of the study design and the methods of the study (critical appraisal), as well as the certainty of the findings (using an approach such as GRADE).
Options for assessing the quality of the evidence should be considered by the developer. Where the approach deviates from the standard (described in the section on critical appraisal of individual studies), it should be discussed and agreed with NICE staff with responsibility for quality assurance. The agreed approach should be documented in the review protocol (see the appendix on review protocol template) together with the reasons for the choice. If additional information is needed to complete the data extraction or quality assessment, study investigators may be contacted.
Critical appraisal of individual studies
Every study should be appraised using a checklist appropriate for the study design (see the appendix on appraisal checklists, evidence tables, GRADE and economic profiles for checklists). If a checklist other than those listed is needed, or the one recommended as the preferred option is not used, the planned approach should be discussed and agreed with NICE staff with responsibility for quality assurance and documented in the review protocol.
Before starting the review, agree which criteria from the checklist (if not all of them) are likely to be the most important indicators of bias for the review question. These criteria will be useful in guiding decisions about the overall risk of bias of each individual study.
Sometimes, a decision might be made to exclude certain studies or to explore any impact of bias through sensitivity analysis . If so, the approach should be specified in the review protocol and agreed with NICE staff with responsibility for quality assurance.
Criteria relating to key areas of bias may also be useful when summarising and presenting the evidence (see the section on summarising evidence ). Topic-specific input (for example, from committee members) may be needed to identify the most appropriate criteria to define subgroup analyses, or to define inclusion in a review, for example, the minimum biopsy protocol for identifying the relevant population in cancer studies.
For each criterion that might be explored in sensitivity analysis, the decision on whether it has been met or not, and the information used to arrive at the decision, should be recorded in a standard template for inclusion in an evidence table (see the appendix on appraisal checklists, evidence tables, GRADE and economic profiles for examples of evidence tables).
Each study included in an evidence review should preferably be critically appraised by 1 reviewer and checked by another. Any differences in critical appraisal should be resolved by discussion or recourse to a third reviewer. Different strategies for critical appraisal may be used depending on the topic and the review question.
Characteristics of data should be extracted to a standard template for inclusion in an evidence table (see the appendix on appraisal checklists, evidence tables, GRADE and economic profiles). Care should be taken to ensure that newly identified studies are cross-checked against existing studies to avoid double-counting. This is particularly important where there may be multiple reports of the same study.
If complex analysis is needed for a review question (for example, complex network meta-analyses), data extraction should be checked by 2 reviewers to avoid data errors, which are time-consuming to fix (see the DECiMAL guide to data extraction for complex meta-analyses).
Analysing and presenting results for studies of interventions
Meta-analysis may be appropriate if treatment estimates of the same outcome from more than 1 study are available. Recognised approaches to meta-analysis should be used, as described in the handbook from Cochrane, by the Centre for Reviews and Dissemination (2009), in Higgins et al. (2021), and in documents developed by the NICE Guidelines Technical Support Unit.
There are several ways of summarising and illustrating the strength and direction of quantitative evidence about the effectiveness of an intervention, even if a meta-analysis is not done. Forest plots can be used to show effect estimates and confidence intervals for each study (when available, or when it is possible to calculate them). They can also be used to provide a graphical representation when it is not appropriate to do a meta-analysis and present a pooled estimate. However, the homogeneity of the outcomes and measures in the studies needs to be carefully considered: a forest plot needs data derived from the same (or justifiably similar) outcomes and measures.
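As an illustration of the pooling that sits behind a forest plot's summary estimate, the sketch below applies fixed-effect, inverse-variance weighting to invented study estimates. Real reviews should use the recognised approaches and tools referenced above; this only shows the basic arithmetic.

```python
# Fixed-effect, inverse-variance pooling of invented study estimates.
# Each study contributes (effect estimate, standard error), e.g. log odds ratios.
import math

studies = [(-0.35, 0.12), (-0.20, 0.20), (-0.41, 0.15)]

weights = [1 / se**2 for _, se in studies]          # inverse-variance weights
pooled = sum(w * est for (est, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))
ci_low, ci_high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"Pooled estimate {pooled:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

A random-effects model, which additionally estimates between-study heterogeneity, is often more appropriate in practice; the choice should follow the synthesis methods agreed in the review protocol.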
Head‑to‑head data that compares the effectiveness of interventions is useful for a comparison between 2 active management options. Comparative studies are usually combined in a meta-analysis where appropriate. A network meta-analysis is an analysis that can include trials that compare the interventions of interest head-to-head and also trials that allow an indirect comparison via a common third intervention.
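The simplest indirect comparison via a common third intervention can be sketched as follows (the Bucher adjusted indirect comparison; a full network meta-analysis generalises this idea). All effect estimates here are invented.

```python
# Bucher-style indirect comparison of A vs B via common comparator C.
# Estimates are on a log scale (e.g. log odds ratios) and invented.
import math

d_AC, se_AC = -0.50, 0.15   # effect of A vs C from trials of A vs C
d_BC, se_BC = -0.20, 0.18   # effect of B vs C from trials of B vs C

d_AB = d_AC - d_BC                       # indirect estimate of A vs B
se_AB = math.sqrt(se_AC**2 + se_BC**2)   # uncertainty adds, never cancels
```

Because the variances add, indirect estimates are always less precise than direct head-to-head evidence of the same size, which is one reason consistency between direct and indirect evidence must be checked.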
The same principles of good practice for evidence reviews and meta-analyses should be applied when conducting network meta-analyses. The reasons for identifying and selecting the randomised controlled trials (RCTs) should be explained, including the reasons for selecting the treatment comparisons. The methods of synthesis should be described clearly in the methods section of the evidence review.
When multiple competing options are being appraised, a network meta-analysis should be considered. The data from individual trials should also be documented (usually as an appendix). If there is doubt about the inclusion of particular trials (for example, because of concerns about limitations or applicability), a sensitivity analysis in which these trials are excluded should also be presented. The level of consistency between the direct and indirect evidence on the interventions should be reported, including consideration of model fit and comparison statistics such as the total residual deviance and the deviance information criterion (DIC). Results of further inconsistency tests, such as those based on node-splitting, should also be reported, if available. Results from direct comparisons may also be presented for comparison with the results from a network meta-analysis; this may be the results from the direct evidence within the network meta-analysis, or from direct pairwise comparisons done outside the network meta-analysis, depending on which is considered more informative.
When evidence is combined using network meta-analyses, trial randomisation should typically be preserved. If this is not appropriate, the planned approach should be discussed and agreed with NICE staff with responsibility for quality assurance. A comparison of the results from single treatment arms from different RCTs is not acceptable unless the data are treated as observational and analysed as such.
Further information on complex methods for evidence synthesis is provided in documents developed by the NICE Guidelines Technical Support Unit.
To promote transparency of health research reporting (as endorsed by the EQUATOR network), evidence from a network meta-analysis should usually be reported according to the criteria in the modified PRISMA-NMA checklist in the appendix on network meta-analysis reporting standards.
Evidence from a network meta-analysis can be presented in a variety of ways. The network should be presented diagrammatically, with the direct and indirect treatment comparisons clearly identified and the number of trials in each comparison shown. Further information on how to present the results of network meta-analyses is provided in documents developed by the NICE Guidelines Technical Support Unit.
A number of approaches for assessing the quality of, or confidence in, outputs derived from network meta-analysis have recently been published (Phillippo et al. 2019; Phillippo et al. 2017; Caldwell et al. 2016; Puhan et al. 2014; Salanti et al. 2014). The strengths and limitations of these approaches and their application to guideline development are currently being assessed, but none of these approaches is currently required in NICE guideline development.
Analysing and presenting results of studies of diagnostic test accuracy
Information on methods of presenting and synthesising results from studies of diagnostic test accuracy is being developed by the Cochrane Screening and Diagnostic Tests Methods Group and the GRADE working group. The quality of the evidence should be based on the critical appraisal criteria from QUADAS-2 (see the appendix on appraisal checklists, evidence tables, GRADE and economic profiles). If meta-analysis is not possible or appropriate, there should be a narrative summary of the results that were considered most important for the review question.
Evidence on diagnostic test accuracy may be summarised in tables or presented as receiver operating characteristic (ROC) curves. Meta-analysis of results from a number of diagnostic accuracy studies can be complex. Relevant published technical advice (such as that from Cochrane) should be used to guide reviewers.
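As a reminder of the 2x2 arithmetic underlying these summaries, the sketch below computes sensitivity and specificity from invented counts; each study contributes one point in ROC space.

```python
# 2x2 table of an index test against the reference standard (invented counts).
tp, fp, fn, tn = 90, 30, 10, 170

sensitivity = tp / (tp + fn)   # proportion of true cases the test detects
specificity = tn / (tn + fp)   # proportion of non-cases correctly ruled out
# This study's point in ROC space is (1 - specificity, sensitivity).
roc_point = (1 - specificity, sensitivity)
```

Because sensitivity and specificity are usually negatively correlated across studies (through threshold effects), pooling each separately can mislead, which is why meta-analysis of diagnostic accuracy needs the specialised methods referred to above.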
Analysing and presenting results of studies of prognosis, or prediction models for a diagnosis or prognosis
There is currently no general consensus on approaches for synthesising evidence from studies on prognosis, or prediction models for diagnosis or prognosis. A narrative summary of the quality of the evidence should be given, based on the quality appraisal criteria from the quality assessment tool used (for example, PROBAST for prediction models, or QUIPS for simple correlation/univariate regression analyses in prognostic studies; see the appendix on appraisal checklists, evidence tables, GRADE and economic profiles). Characteristics of data should be extracted to a standard template for inclusion in an evidence table (see the appendix on appraisal checklists, evidence tables, GRADE and economic profiles). Methods for presenting syntheses of evidence on prognosis and prediction models are being developed by the GRADE working group.
Results may be presented as tables. Reviewers should be wary of using meta-analysis to summarise results unless the same prognostic factors or predictors and confounding factors have been examined across all studies and the same outcome measured. It is important to explore whether all likely confounding factors have been accounted for, and whether the metrics used to measure exposure (or outcome) are universal. When studies cannot be pooled, results should be presented consistently across studies (for example, the median and ranges of predictive values). For more information on prognostic reviews, see Collins 2015 and Moons 2015.
Analysing and presenting results of qualitative evidence
Qualitative evidence occurs in many forms and formats, so different methods may be used for synthesis and presentation (such as those described by the Cochrane Qualitative & Implementation Methods Group). As with all data synthesis, it is important that the method used to evaluate the evidence is easy to follow. It should be written up in clear English, and any analytical decisions should be clearly justified. Critical appraisal of qualitative evidence should be based on the criteria from the Critical Appraisal Skills Programme (CASP; see the appendix on appraisal checklists, evidence tables, GRADE and economic profiles).
Qualitative evidence should be synthesised and then summarised using GRADE-CERQual. If synthesis of the evidence is not appropriate, a narrative summary may be adequate; this should be agreed with NICE staff with responsibility for quality assurance. The approach used depends on the volume and consistency of the evidence: if the qualitative evidence is extensive, a recognised method of synthesis is preferable; if the evidence is more disparate and sparse, a narrative summary may be appropriate.
The simplest approach to synthesise qualitative data in a meaningful way is to analyse the themes (or 'meta' themes) in the evidence tables and write second level themes based on them. This 'second level' thematic analysis can be carried out if enough data are found, and the papers and research reports cover the same (or similar) findings or use similar methods. (These should be relevant to the review questions and could, for example, include intervention, age, population or setting.)
Synthesis can be carried out in a number of ways, and each may be appropriate depending on the question type, and the evidence identified. Papers reporting on the same findings can be grouped together to compare and contrast themes, focusing not just on consistency but also on any differences. The narrative should be based on these themes.
A more complex but useful approach is 'conceptual mapping' (see Johnson et al. 2000). This involves identifying the key themes and concepts across all the evidence tables and grouping them into first level (major), second level (associated) and third level (subthemes) themes. Results are presented in schematic form as a conceptual diagram and the narrative is based on the structure of the diagram.
Alternatively, themes can be identified and extracted directly from the data, using a grounded approach (Glaser and Strauss 1967). Other potential techniques include meta-ethnography (Noblit and Hare 1988) and meta-synthesis (Barroso and Powell-Cope 2000), but expertise in their use is needed.
Analysing and presenting results of mixed methods reviews
All qualitative evidence from a mixed methods review should be synthesised and then summarised using GRADE-CERQual. If appropriate, all quantitative data (for example, for intervention studies) should be presented using GRADE. An overall summary of how the quantitative and qualitative evidence are linked should be presented in either simple matrices or simple thematic diagrams.
Certainty or confidence in the findings of analysis
The certainty or confidence in the findings should be presented at outcome level using GRADE or GRADE-CERQual (for individual or synthesised studies). If this is not appropriate, the planned approach should be discussed and agreed with NICE staff with responsibility for quality assurance. It should be documented in the review protocol (see the appendix on review protocol template) together with the reasons for the choice.
Certainty or confidence in the findings by outcome
Before starting an evidence review, the outcomes of interest which are important to people using services and the public for the purpose of decision-making should be identified. The reasons for prioritising outcomes should be documented in the evidence review. This should be done before starting the evidence review and clearly separated from discussion of the evidence, because there is potential to introduce bias if outcomes are selected when the results are known. An example of this would be choosing only outcomes for which there were statistically significant results.
The committee discussion section should also explain how the importance of outcomes was considered when discussing the evidence. For example, the committee may want to categorise prioritised outcomes as 'important' or 'critical'. Alternatively, they may consider all prioritised outcomes crucial for decision-making, in which case no distinction is made between 'critical' and 'important'. The impact of this on the final recommendations should be clear.
GRADE and GRADE-CERQual assess the certainty or confidence in the review findings by looking at features of the evidence found for each 'critical' and 'important' outcome or theme. GRADE is summarised in box 6.1, and GRADE-CERQual in box 6.2.
GRADE assesses the following features for the evidence found for each outcome:
study limitations (risk of bias) – the internal validity of the evidence
inconsistency – the heterogeneity or variability in the estimates of treatment effect across studies
indirectness – the extent of differences between the population, intervention, comparator for the intervention and outcome of interest across studies
imprecision – the extent to which confidence in the effect estimate is adequate to support a particular decision
other considerations – publication bias, the degree of selective publication of studies.
GRADE-CERQual assesses the following features for the evidence found for each finding:
methodological limitations – the internal validity of the evidence
relevance – the extent to which the evidence is applicable to the context in the review question
coherence – the extent of the similarities and differences within the evidence
adequacy of data – the extent of richness and quantity of the evidence.
The certainty or confidence of evidence is classified as high, moderate, low or very low. In the context of NICE guidelines, it can be interpreted as follows:
High – further research is very unlikely to change our recommendation.
Moderate – further research is likely to have an important impact on our confidence in the estimate of effect and may change the strength of our recommendation.
Low – further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the recommendation.
Very low – any estimate of effect is very uncertain and further research will probably change the recommendation.
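The downgrading logic behind these classifications can be sketched as follows. This is a simplified illustration of standard GRADE practice (randomised evidence starts at 'high' and is rated down one level per serious concern across the domains in box 6.1), not NICE-specific tooling; it omits rating down two levels for very serious concerns, the lower starting point for observational evidence, and rating up.

```python
# Simplified illustration of GRADE downgrading for a single outcome.
LEVELS = ["very low", "low", "moderate", "high"]

def grade_certainty(serious_concerns: list[str], start: str = "high") -> str:
    """Rate down one level per serious concern, bottoming out at 'very low'."""
    level = LEVELS.index(start)
    for _ in serious_concerns:
        level = max(level - 1, 0)
    return LEVELS[level]

# e.g. serious risk of bias and serious imprecision for this outcome
print(grade_certainty(["risk of bias", "imprecision"]))  # → low
```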
The approach taken by NICE differs from the standard GRADE and GRADE-CERQual system in 2 ways:
it also integrates a review of the quality of cost-effectiveness studies (see the chapter on incorporating economic evaluation )
it does not use 'overall summary' labels for the quality of the evidence across all outcomes or for the strength of a recommendation, but uses the wording of recommendations to reflect the strength of the evidence (see the chapter on writing the guideline ).
In addition, although GRADE does not yet cover all types of review questions, GRADE principles can be applied and adapted to other types of questions. The GRADE working group continues to refine existing approaches and to develop new approaches. Developers should check the GRADE website for any new guidance or systems when developing the review protocol. Any substantial changes, made by the developer, to GRADE should be agreed with NICE staff with responsibility for quality assurance before use.
GRADE or GRADE-CERQual tables summarise the certainty in the evidence and the data for each critical and each important outcome or theme, and include a limited description of the certainty in the evidence. GRADE or GRADE-CERQual tables should be available (in an appendix) for each review question.
NICE's equality and diversity duties are expressed in a single public sector equality duty ('the equality duty'; see the section on key principles that guide the development of NICE guidance and standards in the introduction chapter). The equality duty supports good decision-making by encouraging public bodies to understand how different people will be affected by their activities. For NICE, much of whose work involves developing advice for others on what to do, this includes thinking about how people will be affected by its recommendations when these are implemented (for example, by health and social care practitioners).
In addition to meeting its legal obligations, NICE is committed to going beyond compliance, particularly in terms of tackling health inequalities. Specifically, NICE considers that it should also take account of socioeconomic factors and the circumstances of certain groups of people (such as looked-after children and people who are homeless). If possible, NICE's guidance aims to reduce, and not increase, identified health inequalities.
Ensuring inclusivity of the evidence review criteria
Any equalities criteria specified in the review protocol should be included in the evidence tables. At the data extraction stage, reviewers should refer to the PROGRESS-Plus criteria (including age, sex, sexual orientation, disability, ethnicity, religion, place of residence, occupation, education, socioeconomic position and social capital; Gough et al. 2012) and any other relevant protected characteristics, and record these where reported, as specified in the review protocol. Review inclusion and exclusion criteria should also take the relevant groups into account, as specified in the review protocol.
Equalities should be considered during the drafting of the reviews. Equality considerations should be included in the data extraction process and should be recorded in the committee discussion section if they were important for decision-making.
The following sections should be included in the evidence review:
an introduction to the evidence review
summary of the evidence identified, in either table or narrative format
evidence tables (usually presented in an appendix)
full GRADE or GRADE-CERQual profiles (in an appendix)
evidence statements (if GRADE [or a modified GRADE approach], or GRADE-CERQual is not used)
an overall summary of merged quantitative and qualitative evidence (either using matrices or thematic diagrams) for mixed methods reviews
- results from other analysis of evidence, such as forest plots, area under the curve graphs, or network meta-analysis (usually presented in an appendix; see the appendix on network meta-analysis reporting standards).
The evidence should usually be presented separately for each review question; however, alternative methods of presentation may be needed for some evidence reviews (for example, where review questions are closely linked and need to be interpreted together). In these cases, the principles of quality assessment, data extraction and presentation should still apply.
Any substantial deviations in presentation need to be agreed, in advance, with NICE staff with responsibility for quality assurance.
Summary of the evidence
A summary of the evidence identified should be produced. The content of this summary will depend on the type of question and the type of evidence. It should also identify and describe any gaps in the evidence.
Short summaries of the evidence should be included with the main findings. These should:
summarise the volume of evidence identified for the review question(s), that is, the number of studies identified, included and excluded (with a link to a PRISMA selection flowchart, in an appendix)
summarise the study types, populations, interventions, settings or outcomes for each study related to a particular review question.
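Before drawing the PRISMA selection flowchart, the screening counts can be tallied and checked for internal consistency. A minimal sketch (all study counts below are hypothetical, for illustration only):

```python
# Hypothetical screening counts for a PRISMA selection flowchart.
records_identified = 412      # from database searches
duplicates_removed = 97
records_screened = records_identified - duplicates_removed
excluded_on_title_abstract = 268
full_texts_assessed = records_screened - excluded_on_title_abstract
full_texts_excluded = 31      # with reasons recorded per study
studies_included = full_texts_assessed - full_texts_excluded

# Sanity check: every identified record must be accounted for.
assert records_identified == (duplicates_removed
                              + excluded_on_title_abstract
                              + full_texts_excluded
                              + studies_included)
print(studies_included)  # 16
```

A check like this catches transcription errors in the flowchart before the review is drafted.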
Evidence tables help to identify the similarities and differences between studies, including the key characteristics of the study population and interventions or outcome measures. This provides a basis for comparison.
Data from identified studies are extracted to standard templates for inclusion in evidence tables. The type of data and study information that should be included depends on the type of study and review question, and should be concise and consistently reported. The appendix on appraisal checklists, evidence tables, GRADE and economic profiles contains examples of evidence tables for quantitative studies (both experimental and observational).
The types of information that could be included are:
bibliography (authors, date)
study aim, study design (for example, RCT, case–control study) and setting (for example, country)
funding details (if known)
population (for example, source and eligibility)
intervention, if applicable (for example, content, who delivers the intervention, duration, method, dose, mode or timing of delivery)
comparator, if applicable (for example, content, who delivers the intervention, duration, method, dose, mode or timing of delivery)
method of allocation to study groups (if applicable)
outcomes (for example, primary and secondary, and whether measures were objective, subjective or otherwise validated)
key findings (for example, effect sizes and confidence intervals for all relevant outcomes, and where appropriate, other information such as numbers needed to treat and considerations of heterogeneity if summarising a systematic review or meta-analysis)
inadequately reported data, missing data or if data have been imputed (include method of imputation or if transformation is used)
overall comments on quality, based on the critical appraisal and what checklist was used to make this assessment.
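The fields above can be captured in a standard extraction template. A minimal sketch in Python (the field names and the example study are illustrative only, not a NICE-specified schema):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class EvidenceTableEntry:
    """One row of a quantitative evidence table (illustrative fields only)."""
    bibliography: str                 # authors, date
    study_design: str                 # e.g. RCT, case-control study
    setting: str                      # e.g. country
    funding: Optional[str]            # if known
    population: str                   # source and eligibility
    intervention: Optional[str]
    comparator: Optional[str]
    allocation_method: Optional[str]
    outcomes: List[str] = field(default_factory=list)
    key_findings: str = ""            # effect sizes with confidence intervals
    missing_data_notes: str = ""      # inadequately reported or imputed data
    quality_comments: str = ""        # based on the critical appraisal checklist

# Hypothetical entry for illustration:
entry = EvidenceTableEntry(
    bibliography="Smith et al. 2015",
    study_design="RCT",
    setting="UK",
    funding=None,
    population="Adults with type 2 diabetes",
    intervention="Structured education programme",
    comparator="Usual care",
    allocation_method="Computer-generated randomisation",
    outcomes=["HbA1c at 12 months"],
    key_findings="Mean difference -0.3% (95% CI -0.5 to -0.1)",
    quality_comments="Low risk of bias (checklist-based assessment)",
)
```

Using one fixed template per review keeps extraction concise and consistently reported across studies.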
If data are not being used in any further statistical analysis, or are not reported in GRADE tables, effect sizes (point estimate) with confidence intervals should be reported, or back calculated from the published evidence where possible. If confidence intervals are not reported, exact p values (whether or not significant), with the test from which they were obtained, should be described. When confidence intervals or p values are inadequately reported or not given, this should be stated. Any descriptive statistics (including any mean values and degree of spread such as ranges) indicating the direction of the difference between intervention and comparator should be presented. If no further statistical information is available, this should be clearly stated.
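As an example of back-calculation: when a study reports only an odds ratio with its 95% confidence interval, the standard error on the log-odds scale can be recovered from the interval's width (a standard technique in evidence synthesis; the numbers below are illustrative):

```python
import math

def log_or_se_from_ci(lower: float, upper: float) -> float:
    """Back-calculate the standard error of ln(OR) from a 95% CI.

    A 95% confidence interval spans 2 * 1.96 standard errors
    on the log-odds scale, so SE = (ln(upper) - ln(lower)) / 3.92.
    """
    return (math.log(upper) - math.log(lower)) / (2 * 1.96)

# Illustrative values: OR 2.67, 95% CI 1.55 to 4.57
se = log_or_se_from_ci(1.55, 4.57)
print(round(se, 3))  # 0.276
```

The recovered standard error can then feed into further analysis (for example, pooling in a meta-analysis) when it was not reported directly.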
The assessment of potential biases should also be presented. When study details are inadequately reported, or absent, this should be clearly stated.
The type of data that should be included in evidence tables for qualitative studies is shown in the example in the appendix on appraisal checklists, evidence tables, GRADE and economic profiles. This could include:
study aim, study design and setting (for example, country)
population or participants
theoretical perspective adopted (such as grounded theory)
key aims, objectives and research questions; methods (including analytical and data collection technique)
key themes/findings (including quotes from participants that illustrate these themes or findings, if appropriate)
gaps and limitations
Full GRADE or GRADE-CERQual tables that present both the results of the analysis and describe the confidence in the evidence should normally be provided (in an appendix).
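The logic behind a GRADE certainty rating can be sketched as a simple downgrading scheme. This is a deliberate simplification for illustration: real GRADE judgements also allow upgrading observational evidence (for example, for large effects) and rest on reviewer judgement, not mechanical counting:

```python
# GRADE starts randomised evidence at 'high' and observational evidence
# at 'low', then downgrades one level per serious concern in five domains.
LEVELS = ["very low", "low", "moderate", "high"]
DOMAINS = ["risk of bias", "inconsistency", "indirectness",
           "imprecision", "publication bias"]

def grade_certainty(randomised: bool, serious_concerns: list) -> str:
    """Return a GRADE certainty level (simplified: downgrading only)."""
    assert all(d in DOMAINS for d in serious_concerns), "unknown domain"
    start = LEVELS.index("high") if randomised else LEVELS.index("low")
    return LEVELS[max(0, start - len(serious_concerns))]

print(grade_certainty(True, ["imprecision"]))                    # moderate
print(grade_certainty(True, ["risk of bias", "inconsistency"]))  # low
```

The full GRADE profile records the judgement for each domain alongside the resulting certainty level, so readers can see why evidence was downgraded.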
If GRADE or GRADE-CERQual is not appropriate for the evidence review, evidence statements should be included. Examples of where evidence statements may be needed are review questions covering prognosis/clinical prediction models (where data cannot be pooled), review questions covering service delivery, or where formal consensus approaches have been taken to answer a review question.
Evidence statements should provide an aggregated summary of all of the relevant studies or analyses, regardless of their findings. They should reflect the balance of the evidence, and its strength (quality, quantity and consistency, and applicability). Evidence statements should summarise key aspects of the evidence but should also highlight where there is a lack of evidence (note that this is different to evidence for a lack of effect).
Evidence statements are structured and written to help committees formulate and prioritise recommendations. They can help committees decide:
whether or not there is sufficient evidence (in terms of strength and applicability) to form a judgement
whether (on balance) the evidence demonstrates that an intervention, approach or programme is effective or ineffective, or is inconclusive
the size of effect and associated measure of uncertainty
whether the evidence is applicable to people affected by the guideline and contexts covered by the guideline.
Structure and content of evidence statements
If evidence statements are presented, one or more evidence statements are prepared for each review question or subsidiary question. (Subsidiary questions may cover a type of intervention, specific population groups, a setting or an outcome.)
Each evidence statement should stand alone as an accessible, clear summary of key information used to support the recommendations (see the section on interpreting the evidence to make recommendations in the chapter on writing the guideline). The guideline should ensure that the relationship between the recommendations and the supporting evidence statements is clear.
Evidence statements should identify the sources of evidence and their quality in brief descriptive terms and not just by symbols. Each statement should also include summary information about the:
content of the intervention, management strategy (for example, what, how, where?) and comparison, or factor of interest
populations, number of people analysed, and settings (for example, country)
outcomes, the direction of effect (or correlation) and the size of effect (or correlation) if applicable
strength of evidence (reflecting the appropriateness of the study design to answer the question and the quality, quantity and consistency of evidence)
applicability to the question, people affected by the guideline and setting (see the section on equality and diversity considerations in the chapter on reviewing research evidence).
Note that the strength of the evidence is reported separately from the direction and size of the effects or correlations observed.
Where important, the evidence statement should also summarise information about:
whether the intervention has been delivered as it should be (fidelity of the intervention)
what affects the intervention achieving the outcome (mechanism of action).
An evidence statement indicating where no evidence is identified for a specific outcome should be included.
'Vote counting' (merely reporting on the number of studies) is not an acceptable summary of the evidence.
If appropriate, the direction of effect or association should be summarised using a consistent set of standard terms.
However, appropriate context/topic-specific terms (for example, 'an increase in HIV incidence', 'a reduction in injecting drug use' and 'smoking cessation') may be used.
These terms should be used consistently in each review and their definitions should be reported in the methods section.
Evidence statements for quantitative evidence
An example of an evidence statement from a prognostic review is given in box 6.3. The example has been adapted from the original and is for illustrative purposes only:
Association between communication and contraceptive use
There is moderate evidence from 3 UK cross-sectional studies (Kettle et al. 2007, Jarrett et al. 2007, Morgan et al. 2000; n=254), about the correlation between young people's communication skills around safer sex and a reduction in the number of teenage pregnancies. The evidence about the strength of this correlation is mixed. One study (Kettle et al. 2007) found that discussing condom use with new partners was associated with an increase in actual condom use at first sex (odds ratio [OR] 2.67 [95% confidence interval 1.55 to 4.57]). Another study (Morgan et al. 2000) found that not talking to a partner about protection before first sexual intercourse was associated with an increase in teenage pregnancy (OR 1.67 [1.03 to 2.72]). And, another study (Jarrett et al. 2007) found small positive correlations between condom use, discussions about safer sex (r=0.072, p<0.01) and communication skills (r=0.204, p<0.01).
References
AGREE Collaboration (2003) Development and validation of an international appraisal instrument for assessing the quality of clinical practice guidelines: the AGREE project. Quality and Safety in Health Care 12: 18–23
Altman DG (2001) Systematic reviews of evaluations of prognostic variables. British Medical Journal 323: 224–8
Barroso J, Powell-Cope GM (2000) Meta-synthesis of qualitative research on living with HIV infection. Qualitative Health Research 10: 340–53
Brouwers M, Kho ME, Browman GP et al. for the AGREE Next Steps Consortium (2010) AGREE II: advancing guideline development, reporting and evaluation in healthcare. Canadian Medical Association Journal 182: E839–42
Caldwell DM, Ades AE, Dias S et al. (2016) A threshold analysis assessed the credibility of conclusions from network meta-analysis. Journal of Clinical Epidemiology 80: 68–76
Centre for Reviews and Dissemination (2009) Systematic reviews: CRD's guidance for undertaking reviews in health care. University of York: Centre for Reviews and Dissemination
Collins GS, Reistma JB, Altman DG et al. (2015) Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. Annals of Internal Medicine 162: 55–63
Egger M, Davey Smith G, Altman DG (2000) Systematic reviews in health care: meta-analysis in context. London: British Medical Journal Books
Glaser BG, Strauss AL (1967) The discovery of grounded theory: strategies for qualitative research. New York: Aldine de Gruyter
Gough D, Oliver S, Thomas J, editors (2012) An introduction to systematic reviews. London: Sage
GRADE working group (2004) Grading quality of evidence and strength of recommendations. British Medical Journal 328: 1490–4
The GRADE series in the Journal of Clinical Epidemiology
Guyatt GH, Oxman AD, Schünemann HJ et al. (2011) GRADE guidelines: a new series of articles in the Journal of Clinical Epidemiology. Journal of Clinical Epidemiology 64: 380–2
Harbord RM, Deeks JJ, Egger M et al. (2007) A unification of models for meta-analysis of diagnostic accuracy studies. Biostatistics 8: 239–51
Higgins JPT, Thomas J, Chandler J et al., editors (2021) Cochrane handbook for systematic reviews of interventions, version 6.2
Johnson JA, Biegel DE, Shafran R (2000) Concept mapping in mental health: uses and adaptations. Evaluation and Programme Planning 23: 67–75
Moons KG, Altman DG, Reistma JB et al. (2015) Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): explanation and elaboration. Annals of Internal Medicine 126: W1–W73
NICE Decision Support Unit Evidence synthesis TSD series [online; accessed 31 August 2018]
Noblit G, Hare RD (1988) Meta-ethnography: synthesising qualitative studies. London: Sage
Pedder H, Sarri G, Keeney E et al. (2016) Data extraction for complex meta-analysis (DECiMAL) guide. BMC Systematic Reviews 5(1): 212
Phillippo DM, Dias S, Ades AE et al. (2017) Sensitivity of treatment recommendations to bias in network meta-analysis. Journal of the Royal Statistical Society, Series A
Phillippo DM, Dias S, Welton NJ et al. (2019) Threshold analysis as an alternative to GRADE for assessing confidence in guideline recommendations based on network meta-analyses. Annals of Internal Medicine 170(8): 538–46
Puhan MA, Schünemann HJ, Murad MH et al. (2014) A GRADE working group approach for rating the quality of treatment effect estimates from network meta-analysis. British Medical Journal 349: g5630
Ring N, Jepson R, Ritchie K (2011) Methods of synthesizing qualitative research studies for health technology assessment. International Journal of Technology Assessment in Health Care 27: 384–90
Salanti G, Del Giovane C, Chaimani A et al. (2014) Evaluating the quality of evidence from a network meta-analysis. PLoS ONE 9(7): e99682
Tugwell P, Pettigrew M, Kristjansson E et al. (2010) Assessing equity in systematic reviews: realising the recommendations of the Commission on the Social Determinants of Health. British Medical Journal 341: 4739
Tugwell P, Knottnerus JA, McGowan J et al. (2018) Systematic Review Qualitative Methods Series reflect the increasing maturity in qualitative methods. Journal of Clinical Epidemiology 97: vii–viii
Turner RM, Spiegelhalter DJ, Smith GC et al. (2009) Bias modelling in evidence synthesis. Journal of the Royal Statistical Society, Series A (Statistics in Society) 172: 21–47
Whiting PF, Rutjes AWS, Westwood ME et al. and the QUADAS‑2 group (2011) QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Annals of Internal Medicine 155: 529–36
Finding Types of Research
- Evidence-Based Research
On This Guide
- Introduction
- Understand Evidence-Based Practice
- Types of Research Studies
- Search the Library
- Quantitative Studies
- Qualitative Studies
- Systematic Reviews
- Randomized Controlled Trials
- Observational Studies
- Literature Reviews
Throughout your schooling, you may need to find different types of evidence and research to support your coursework. This guide provides a high-level overview of evidence-based practice as well as the different types of research and study designs. Each page of this guide offers an overview and search tips for finding articles that fit that study design.
Note! If you need help finding a specific type of study, visit the Get Research Help page to contact the librarians.
What is Evidence-Based Practice?
One of the requirements for your coursework is to find articles that support evidence-based practice. But what exactly is evidence-based practice?
"Evidence-based practice is based on a comprehensive review of research findings, which emphasize intervention, randomized clinical trials (the gold standard), an integration of statistical findings, and the making of critical decisions about the findings based on the strength of the evidence, the tools used in the studies, and the cost" (Godshall, 2010, citing Jennings, 2000, and Jennings & Loan, 2001).
The video below explains all the steps of evidence-based practice in greater detail.
- Video - Evidence-based practice: What it is and what it is not. Medcom (Producer), & Cobb, D. (Director). (2017). Evidence-based practice: What it is and what it is not [Streaming Video]. United States of America: Producer. Retrieved from Alexander Street Press Nursing Education Collection
Determine Your Clinical Question and PICO(T) Elements
One of the steps in evidence-based practice is to come up with a clinical question that you want to study. Often, the clinical question is presented in the PICO(T) format, which will help you come up with keywords to use in your search for articles.
PICO(T) is an acronym for:
- Patient population
- Intervention
- Comparison
- Outcome
- Time (optional)
Note! If you need help constructing your clinical question or finding your PICO(T) elements, please reach out to your professor.
After you have developed your clinical question and have determined the parts of the PICO(T), then you are ready to search for articles in the library. In the next section, there is a description of the different types of studies you will find while searching for articles.
Quantitative and Qualitative Studies
Research is broken down into two different types: quantitative and qualitative. Quantitative studies are all about measurement. They will report statistics of things that can be physically measured like blood pressure, weight, and oxygen saturation. Qualitative studies, on the other hand, are about people's experiences and how they feel about something. This type of information cannot be measured using statistics. Both of these types of studies report original research and are considered single studies.
Some research study types that you will encounter include:
- Case-Control Studies
- Cohort Studies
- Cross-Sectional Studies
Studies that Synthesize Other Studies
Sometimes, a research study will look at the results of many studies and look for trends and draw conclusions. These types of studies include:
- Meta-Analyses
- Systematic Reviews
- Literature Reviews
Tip! How do you determine the research article's study type or level of evidence? First, look at the article abstract. Most of the time the abstract will have a methodology section, which should tell you what type of study design the researchers are using. If it is not in the abstract, look for the methodology section of the article. It should tell you all about what type of study the researcher is doing and the steps they used to carry out the study.
To answer your clinical question, you will have to search for relevant articles that report research. Remember, research is evidence. There are a few ways you can search for evidence in the library.
The best place to start your search is the Search Everything system on the library homepage. Use your (P)atient Population and (I)ntervention as the keywords for your search statement. You can then also try adding your outcome to your search statement to see if you can narrow your search down further. If you need help forming a search statement, review the Learn to Search guide.
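As a sketch of how a search statement comes together, synonyms for each PICO(T) concept are joined with OR, and the concepts are joined with AND (the terms and synonyms below are hypothetical examples, not prescribed vocabulary):

```python
def build_search_statement(*concept_groups) -> str:
    """Join synonyms with OR within a concept, and concepts with AND."""
    return " AND ".join(
        "(" + " OR ".join(terms) + ")" for terms in concept_groups
    )

# Hypothetical (P)opulation and (I)ntervention terms, with synonyms:
statement = build_search_statement(
    ["teenagers", "adolescents"],            # Population
    ['"condom use"', '"safer sex"'],         # Intervention
)
print(statement)
# (teenagers OR adolescents) AND ("condom use" OR "safer sex")
```

Grouping synonyms this way widens recall for each concept while the AND between concepts keeps the results focused on your question.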
You might also try other databases like Trip Pro for additional high-level research. Trip Pro has a PICO search form where you can enter your PICO terms and perform a search. Visit the database below to try it out!
- Database - TRIP Pro Database TripPro is a clinical search engine designed to allow users to quickly and easily find and use high-quality research evidence to support their practice and/or care. Users can also search across other content types including images, videos, patient information leaflets, educational courses and news.
Tip! You will most likely not find one article that will answer every aspect of your clinical question. Instead, you will have to find elements across multiple articles and put them together in your own analysis. If you are not finding enough evidence for your topic, please reach out to your professor.
Godshall, M. (2010). Fast facts for evidence-based practice: Implementing EBP in a nutshell. Springer Publishing Company.