Increasingly, diagnostic laboratories are turning to next-generation sequencing (NGS) to detect heritable pathogenic mutations and somatic mutations that help characterize cancers and identify appropriate treatments. As with any lab-developed test (LDT), NGS tests need to be validated in accordance with oversight by the College of American Pathologists (CAP).
Validation for NGS assays differs from that of many other diagnostic
workflows as there is a significant bioinformatics data analysis
component to NGS. In this post, we outline best practices for NGS test
validation based on the recommendations developed by a working group convened by the Association for Molecular Pathology (AMP).
Every validation process should start with a very specific “statement of
use” that will aid in determining the samples that should be used for
validation. Depending on the intended use, the set of sample types used
will vary in variant types and allele burden. For instance, tests for
somatic mutations will need to incorporate samples with a broad range of
variant allele frequencies, and tests intended to capture
insertions/deletions (indels), single nucleotide polymorphisms (SNPs),
copy number variants (CNVs) and fusions will need to include multiple
samples of each type.
The assortment of sample types tested should be representative of the
sample types that are expected in the clinical service. However,
problematic sample types, such as those with a low percentage of tumor
cells or those that may have incurred damage or instability, should be
characterized especially thoroughly during validation, even if
they will not be encountered frequently in a clinical setting. In
total, no fewer than 59 samples should be used to establish validation.
This number comes from the commonly cited “95/95” criterion: to claim
with 95% confidence that the test’s passing rate is at least 95%,
59 samples must be tested and all 59 must pass.
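The figure of 59 can be checked with a short calculation. The sketch below assumes the usual zero-failure binomial reasoning behind "95/95" (if the true pass rate were only 95%, the chance that every one of n samples passes is 0.95^n, and that chance must not exceed 5%). The function name is ours, chosen for illustration.

```python
import math


def min_samples_zero_failures(pass_rate: float, confidence: float) -> int:
    """Smallest n for which n passes out of n supports the claim
    'true pass rate >= pass_rate' at the given confidence level."""
    # If the true pass rate were exactly `pass_rate`, all n samples would
    # pass with probability pass_rate ** n. We need that probability to be
    # no larger than 1 - confidence, so n >= log(1 - confidence) / log(pass_rate).
    return math.ceil(math.log(1.0 - confidence) / math.log(pass_rate))


n = min_samples_zero_failures(pass_rate=0.95, confidence=0.95)
print(n)            # 59
print(0.95 ** 58)   # just above 0.05, so 58 samples are not enough
print(0.95 ** 59)   # just below 0.05
```

Note that this plan allows no failures. If any sample fails, more samples are needed to support the same claim.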
Samples used for validation should have been previously characterized,
and should also reflect the specimen type (blood, FFPE, etc.) for the
intended use. AMP recommends including at least two samples for which a
consensus sequence across all gene regions of interest has been
Positive percentage agreement and positive predictive value
For each class of variant to be tested (SNP, indel, etc.), a positive
percentage agreement (PPA) and positive predictive value (PPV) will need
to be determined. PPA is the percentage of known variants
detected by the test, reflecting the sensitivity of the test. Similarly,
PPV is the percentage of called variants that are true positives.
PPA and PPV can be determined either using positive reference samples
(samples with known status) or, when fewer than 59 reference samples
are available, using a reference method (an established detection
method applied to unknown samples).
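As a rough illustration of the two definitions above, PPA and PPV can be computed from counts of true positives, false negatives, and false positives for one variant class. The function names and the example counts below are hypothetical, not figures from any real validation.

```python
def positive_percentage_agreement(true_positives: int, false_negatives: int) -> float:
    """Fraction of known variants that the test detected."""
    known = true_positives + false_negatives
    if known == 0:
        raise ValueError("no known variants to compare against")
    return true_positives / known


def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """Fraction of called variants that are true positives."""
    called = true_positives + false_positives
    if called == 0:
        raise ValueError("the test made no calls")
    return true_positives / called


# Hypothetical SNP results: 200 known variants, 190 detected, 5 extra calls.
ppa = positive_percentage_agreement(true_positives=190, false_negatives=10)
ppv = positive_predictive_value(true_positives=190, false_positives=5)
print(f"PPA = {ppa:.1%}, PPV = {ppv:.1%}")
```

In practice these would be reported separately for each variant class (SNP, indel, and so on), since performance often differs between classes.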
Repeatability & reproducibility
There are several potential sources of reproducibility error that can
occur in NGS testing, and validation experiments should be designed to
address each. These sources include variations in instrumentation,
laboratory technicians, and reagent lots. To determine reproducibility,
at least three samples should be tested across each potential
variability source (i.e., performed by multiple laboratory personnel on
multiple instruments with different reagent lots). In addition to these
between-run reproducibility tests, duplicate within-run tests should be
performed without any anticipated source of variability to determine
repeatability.
Variability between test outcomes should be quantified
across multiple steps of the NGS workflow. For example, variability in
nucleic acid yield following extraction, library prep quality control
metrics, sequencing read outcome metrics, and final variant calls should
all be noted.
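One simple way to put a number on this variability is the coefficient of variation (CV) of each metric across replicate runs. The sketch below is only an illustration: the metric names and values are invented, and the choice of CV as the summary statistic is our assumption rather than an AMP requirement.

```python
from statistics import mean, stdev


def coefficient_of_variation(values) -> float:
    """Sample standard deviation as a percentage of the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * stdev(values) / m


# Hypothetical replicate measurements of one sample across three runs.
replicate_metrics = {
    "dna_yield_ng": [100.0, 110.0, 90.0],
    "mean_coverage": [500.0, 520.0, 480.0],
}

for metric, values in replicate_metrics.items():
    print(f"{metric}: CV = {coefficient_of_variation(values):.1f}%")
```

Tracking the same statistic at each workflow step makes it easier to see where variability is introduced.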
Limits of detection
AMP recommends that the lower limit of detection (LLOD) be defined as
the lowest allele fraction at which the allele will be reliably detected
for 95% of samples. In order to achieve 95% confidence, mathematically,
at least 59 samples must be tested. If it is not possible to test 59
samples, additional controls designed to determine sensitivity should be
Interfering substances and carryover
NGS tests are potentially sensitive to interference by reagents or
biological materials that are not effectively removed during the nucleic
acid extraction process. Potential interfering substances might include
fixatives such as heavy metals, cellular components like melanin or
hemoglobin, or interfering nucleic acids—for instance RNA for a
DNA-based NGS assay.
Carryover, in which material from one sample contaminates another,
should also be addressed during validation, particularly for tests
detecting variants that are expected to exhibit a low allele burden.
Carryover testing can be performed by comparing samples to standards
and through the inclusion of no-template controls (NTCs).
Bioinformatic approaches can also be used to identify carryover
issues (for instance, by detecting known human interfering
substances), so bioinformatics is an essential part of the interfering
substances and carryover validation process.
All possible types of interfering substances should be evaluated
systematically throughout the validation process, and carryover
monitored and quantified at each step of the NGS workflow. AMP suggests
including NTCs in every sequencing run, though the controls need not be
evaluated throughout the entire workflow.
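A minimal sketch of an NTC check for a single sequencing run is shown below. It compares the reads observed in the NTC with the average reads per sample in the same run. The function name, the read counts, and the 0.01% acceptance threshold are all assumptions for illustration; each laboratory would need to set its own limit during validation.

```python
from statistics import mean
from typing import Sequence


def ntc_read_fraction(ntc_reads: int, sample_reads: Sequence[int]) -> float:
    """NTC read count as a fraction of the mean per-sample read count."""
    if not sample_reads:
        raise ValueError("at least one sample is required")
    return ntc_reads / mean(sample_reads)


# Hypothetical run: three samples and one no-template control.
sample_reads = [1_000_000, 1_200_000, 800_000]
ntc_reads = 50

ASSUMED_MAX_FRACTION = 1e-4  # illustrative threshold, not an AMP value

fraction = ntc_read_fraction(ntc_reads, sample_reads)
run_passes = fraction <= ASSUMED_MAX_FRACTION
print(f"NTC fraction = {fraction:.2e}, run passes: {run_passes}")
```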
Validation of the bioinformatics pipeline
The bioinformatics pipeline is critical for assay performance and
intimately tied to the entire NGS workflow. For instance, the required
coverage depth for variant detection will vary depending on the
The bioinformatics pipeline should be validated with a
methods-based approach using well-characterized cell lines that reflect
the variant population and variant allele frequency anticipated in the
clinical service. If appropriate physical cell line samples cannot be
obtained for validation, it is also possible to validate the
bioinformatics pipeline in silico using sequence files generated from well-characterized samples.
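For an in silico validation, the pipeline's calls on a well-characterized sample can be compared with that sample's known variants, class by class. The sketch below assumes variants are represented as (chromosome, position, ref, alt) tuples that have already been normalized to the same representation in both sets. The function names and the example variants are hypothetical, and the classifier only separates SNPs from indels.

```python
from collections import defaultdict


def variant_class(ref: str, alt: str) -> str:
    """Very simple classifier: single-base substitutions vs. everything else."""
    return "SNP" if len(ref) == 1 and len(alt) == 1 else "indel"


def compare_calls(truth: set, calls: set) -> dict:
    """Count true positives, false negatives and false positives per class."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0})
    for variant in truth:
        outcome = "tp" if variant in calls else "fn"
        counts[variant_class(variant[2], variant[3])][outcome] += 1
    for variant in calls - truth:
        counts[variant_class(variant[2], variant[3])]["fp"] += 1
    return dict(counts)


# Hypothetical truth set and pipeline output.
truth = {("chr1", 100, "A", "G"), ("chr1", 200, "AT", "A"), ("chr2", 300, "C", "T")}
calls = {("chr1", 100, "A", "G"), ("chr2", 300, "C", "T"), ("chr2", 400, "G", "GA")}

result = compare_calls(truth, calls)
for cls, c in sorted(result.items()):
    print(cls, c)
```

The per-class counts can then be turned into PPA and PPV for each variant type.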
Our role in the validation process
Fabric Genomics provides artificial-intelligence (AI) driven data analysis solutions for clinical NGS workflows. Our industry-leading AI takes the guesswork out of the data analysis process and decreases variability while delivering superior sensitivity. Because bioinformatics analysis is an integral part of the validation process and ongoing testing, we developed this guide to help get you started in planning your own NGS validation project. We are here to support you every step of the way, so if you want help planning your validation experiments, please get in touch.