<1129> NUCLEIC ACID-BASED TECHNIQUES-GENOTYPING

INTRODUCTION

This chapter outlines techniques for detecting single-base DNA differences and other types of polymorphic DNA sequences that occur in the three billion bases that make up the human genome. The most common genetic variation is a single nucleotide polymorphism (SNP), which is a simple change in one base of the gene sequence. SNPs occur on average every 1000 bases and account for a significant amount of inter-individual variability. SNPs can predispose individuals to disease or influence their response to a drug. Approximately 1.8 million human SNP loci have been identified, and more are likely to be discovered in the coming years.¹

Common approaches for detecting SNPs and other types of polymorphic DNA sequences are described in the following sections. These approaches encompass a variety of techniques, such as nucleic acid amplification techniques (NAT), real-time NAT, and microarrays, the principles of which are covered in more detail in related chapters. This chapter focuses on the specific modifications of the techniques that are necessary to enable detection of single base differences.

SNP Genotyping Technologies

Although the usefulness of studying SNPs for gene mapping and disease association studies is apparent, a single standardized procedure for SNP genotyping has not been adopted. Various approaches for performing SNP genotyping have been developed to meet a wide range of needs, including throughput capacity, ease of assay design, accuracy, and reliability. Available procedures can also be divided according to whether they are based on identifying known SNPs or whether they can be used to screen for unknown SNPs. To identify the most appropriate SNP genotyping procedure for a specific application, the throughput requirements in terms of the number of SNPs to be analyzed per sample (multiplexing level) and the sample throughput need to be determined because different approaches may work best depending on these requirements.

Most procedures used for genotyping SNPs depend on polymerase chain reaction (PCR) amplification of the genomic regions that span the SNPs followed by the actual genotyping reaction. PCR provides the required sensitivity and specificity for distinguishing between heterozygous and homozygous genotypes in large, complex genomes. The difficulty of designing and carrying out multiplex PCR reactions limits the throughput of many of the current SNP genotyping assays. The following sections outline several of the major approaches currently in use for SNP genotyping. In many cases the underlying technology can be modified to meet the specific application requirements in terms of sample throughput and number of SNPs detected. In general, real-time PCR-based procedures are better suited to higher sample numbers, and array-based procedures are better suited to the simultaneous detection of many SNPs. Newer technologies based on multiplexed array formats are also emerging and will be suitable for high sample numbers and many SNP applications.

Sequencing

Sequencing is the definitive procedure for DNA analysis, and its use for SNP detection allows unambiguous identification of base changes (see Nucleic Acid-Based Techniques—Extraction, Detection, and Sequencing <1126> for nucleic acid sequencing). The standard technology is expensive, and the procedure is time consuming and labor intensive and suffers from low sample throughput. Sequencing is a useful confirmatory tool, and it has applications in situations when other technologies are not appropriate, but is not the most cost-effective solution for the majority of SNP genotyping applications that require the identification of only one or a few bases.

Restriction Fragment Length Polymorphism Analysis

The first widely used procedure for the detection of polymorphisms exploited alterations in restriction enzyme sites caused by SNPs, leading to the gain or loss of cutting events. PCR–restriction fragment length polymorphism (RFLP) analysis comprises PCR amplification of a fragment of interest and subsequent digestion with a restriction enzyme. The fragments produced are typically analyzed by a size fractionation procedure, usually gel electrophoresis. Because of its simplicity, the procedure has been and still is extensively used, although it entails certain limitations: only a subset of polymorphisms that reside in an endonuclease restriction site can be studied with the conventional procedure; incomplete digestion due to suboptimal processing can produce misleading digestion patterns; and the procedure is less amenable to automation than are other SNP genotyping procedures.
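The gain or loss of a cutting event can be sketched in a few lines of Python. This is a minimal illustration only: the 40-base amplicon sequences and the function name digest are hypothetical, and EcoRI (recognition site GAATTC, cut after the first G on the top strand) is chosen simply as a familiar enzyme, not because this chapter specifies it.

```python
def digest(seq, site="GAATTC", cut_offset=1):
    """Return fragment lengths after cutting seq at every occurrence of site.

    cut_offset is the number of bases of the recognition site that stay on
    the upstream fragment (1 for EcoRI, which cuts G^AATTC on the top strand).
    """
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(cut - start)
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(len(seq) - start)
    return fragments


# Hypothetical amplicon: the A>G SNP destroys the EcoRI site in allele_2.
allele_1 = "ATGCCGTACGTTAGCGAATTCGGCTAAGCTTACGGATCCA"
allele_2 = "ATGCCGTACGTTAGCGAGTTCGGCTAAGCTTACGGATCCA"

# Expected band sizes (in bases) for each genotype after complete digestion.
bands = {
    "1/1": sorted(set(digest(allele_1))),
    "1/2": sorted(set(digest(allele_1) + digest(allele_2))),
    "2/2": sorted(set(digest(allele_2))),
}
```

Under these assumptions a homozygote for allele 1 shows two bands, a homozygote for allele 2 shows the uncut amplicon, and a heterozygote shows all three. Incomplete digestion of an allele 1 homozygote would leave residual uncut amplicon and mimic the heterozygous pattern, which is the misleading result noted above.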

Probe Hybridization

The basis of many SNP genotyping procedures is DNA hybridization, which makes use of the stronger binding of a DNA probe to a perfectly matched complementary target than to a target that contains a single base mismatch. The ability of hybridization with allele-specific oligonucleotides (ASO) to detect a single base mismatch was first shown in the late 1970s and subsequently was used to detect the sickle-cell mutation in the beta-globin gene by Southern blot hybridization. The invention of PCR facilitated the further development of probe-based assays for genotyping SNPs in complex genomes.

The thermal stability of a hybrid between an ASO probe and its SNP-containing target sequence is determined not only by the stringency of the reaction conditions but also by the secondary structure of the target sequence and the nucleotide sequence flanking the SNP. Therefore, it is difficult to predict a priori the reaction conditions or the ASO probe sequence that will allow optimal discrimination between two alleles using ASO hybridization. These parameters should be established empirically and separately for each SNP. Consequently, there is no single set of reaction conditions that would be optimal for genotyping all SNPs, which makes the design of multiplex assays based on hybridization with ASO probes an extremely difficult task.

One approach to counter the problem of assay design is to carry out multiplex ASO hybridization reactions on arrays that carry multiple probes for each SNP that will be analyzed. This involves using probe sets in which the SNP occurs at different positions along the probes. It becomes feasible to include large numbers of ASO probes per SNP when one uses high-density arrays that can carry as many as 10⁶ probes per cm².
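A probe set of this kind could be generated as in the following sketch. The flanking sequences, the 25-base probe length, and the function name tiled_aso_probes are illustrative assumptions. For simplicity, the probes are written in the same sense as the listed sequence; a real probe would be synthesized complementary to whichever strand is detected.

```python
def tiled_aso_probes(flank5, alleles, flank3, probe_len=25,
                     offsets=(-2, -1, 0, 1, 2)):
    """Build one probe per (allele, offset) pair.

    offset 0 places the SNP at the central base of the probe; other
    offsets shift the SNP toward the 5' (negative) or 3' (positive) end.
    """
    center = probe_len // 2
    snp_index = len(flank5)
    probes = {}
    for allele in alleles:
        target = flank5 + allele + flank3
        for off in offsets:
            snp_pos_in_probe = center + off
            start = snp_index - snp_pos_in_probe
            end = start + probe_len
            if start < 0 or end > len(target):
                raise ValueError("flanking sequence too short for this probe")
            probes[(allele, off)] = target[start:end]
    return probes


# Hypothetical 20-base flanks around an A/G SNP.
flank5 = "GATTACAGGCTTAACCGTAG"
flank3 = "CTAGGTCCAATGCGTTACGA"
probe_set = tiled_aso_probes(flank5, ("A", "G"), flank3)
```

With two alleles and five offsets, this yields ten probes for a single SNP, which shows why high probe densities are needed when many SNPs are analyzed this way.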

Another approach is to use base analogues such as locked nucleic acid (LNA), which is described in detail in Nucleic Acid-Based Techniques—Amplification <1127>. For applications that involve few SNPs but many samples, homogeneous real-time PCR approaches have been developed. These include the use of fluorescent probe chemistries such as hydrolysis probes, stem-loop probes, and FRET (fluorescence resonance energy transfer) hybridization probes. The principle of these assays is discussed in more detail in Nucleic Acid-Based Techniques—Amplification <1127>. For SNP detection, the basis of many assays is the selective binding of the ASO probe to its perfectly matched target sequence, resulting in energy transfer and generation of a fluorescence signal. Probes designed with specific secondary structures tend to form a stem–loop structure that destabilizes mismatched hybrids, increasing their power of allele distinction as compared with that of linear ASO probes. Hydrolysis probes modified with minor groove-binder molecules that increase target affinity show improved powers of allele discrimination. The use of two probes, each labeled with a different reporter fluorophore, allows both SNP alleles to be detected in a single tube. Limited multiplexing can be achieved by using probes labeled with different fluorophores. In the fluorescent probe–based assays, the increase in fluorescence due to accumulating PCR product is usually monitored in real time in 96-well or 384-well microtiter plates. Alternatively, the fluorescence generated from the two alleles can be measured after completion of the PCR. In this case the results are expressed as a signal ratio that reflects the hybridization of the two oligonucleotides to the target sequence, and so differences in amplification efficiency between samples do not affect interpretation of the genotyping results.
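The end-point signal ratio can be turned into a genotype call along the following lines. This is a sketch under stated assumptions: the signals are taken to be background-subtracted fluorescence values from the two reporter dyes, and all thresholds (min_total, hom_cut, het_low, het_high) are hypothetical values that would have to be established empirically for each assay.

```python
def call_genotype(signal_a, signal_b, min_total=1000.0,
                  hom_cut=0.8, het_low=0.35, het_high=0.65):
    """Call a genotype from two end-point fluorescence signals.

    The call uses only the fraction of total signal from allele A, so a
    sample that amplified weakly but specifically gives the same call as
    one that amplified strongly. Ambiguous fractions return "no-call".
    """
    total = signal_a + signal_b
    if total < min_total:
        return "no-call"  # too little product to interpret
    frac_a = signal_a / total
    if frac_a >= hom_cut:
        return "A/A"
    if frac_a <= 1 - hom_cut:
        return "B/B"
    if het_low <= frac_a <= het_high:
        return "A/B"
    return "no-call"  # between clusters: repeat rather than guess
```

For example, call_genotype(4200, 3900) and call_genotype(8400, 7800) give the same heterozygous call because the ratio is unchanged, whereas a sample falling between the heterozygous and homozygous ranges is left uncalled.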

A third approach involves heating the reaction after PCR has been completed in order to dissociate the probe from the target. Each duplex has its own specific melting temperature (Tm), which is defined as the temperature at which 50% of the DNA becomes single stranded. The Tm depends on the stability of the probe–target duplex. Perfectly matched probe–target duplexes have greater stability, and hence a higher Tm, than the same duplexes containing a single base mismatch. By continuously monitoring the fluorescence during the heating phase, analysts generate a “melt curve” that measures the changes in fluorescence that result when the probe denatures, or “melts,” away from the amplicon. This approach can be used only for systems that do not rely on hydrolysis of the probe to generate a signal and is therefore not suitable for hydrolysis probe assays.
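A common way to read a Tm from a melt curve is to locate the maximum of the negative first derivative of fluorescence with respect to temperature. The sketch below applies this to synthetic data. The sigmoid curve shape, the Tm values of 66 °C and 60 °C, and the function names are assumptions made for illustration, and fluorescence is assumed to fall as the probe melts.

```python
import math


def melt_peak(temps, fluorescence):
    """Return the temperature at which -dF/dT is largest.

    The derivative is estimated with central differences, so the first
    and last readings are not candidates.
    """
    best_t, best_slope = None, float("-inf")
    for i in range(1, len(temps) - 1):
        d_f = fluorescence[i + 1] - fluorescence[i - 1]
        d_t = temps[i + 1] - temps[i - 1]
        slope = -d_f / d_t
        if slope > best_slope:
            best_t, best_slope = temps[i], slope
    return best_t


def synthetic_melt(temps, tm, width=1.0):
    """Idealized melt curve: fraction of probe still bound at each temperature."""
    return [1.0 / (1.0 + math.exp((t - tm) / width)) for t in temps]


temps = [50 + 0.5 * i for i in range(61)]        # 50.0 to 80.0 in 0.5 degree steps
matched = synthetic_melt(temps, tm=66.0)         # perfectly matched duplex
mismatched = synthetic_melt(temps, tm=60.0)      # single base mismatch
```

Here melt_peak recovers a higher Tm for the matched duplex than for the mismatched one. A heterozygous sample would show two peaks, which this single-maximum function does not resolve.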

Because no post-PCR processing or label-separation steps are required, homogeneous real-time PCR assays are simple to perform, making them useful for high-throughput genotyping applications. The optimal probes must be designed individually for each SNP, and the assays are therefore most efficient when a limited number of SNPs is analyzed. The cost of probes modified with fluorescent and quenching moieties may also be a limiting factor in the high-throughput application of the assays.

Primer Extension

In this technique, an oligonucleotide is used to prime DNA synthesis by a polymerase enzyme, as performed in a standard PCR or sequencing reaction. Variations of the technique exist. Allele-specific PCR uses two primers, each fully complementary to one of the SNP alleles, with the SNP position being at the 3′ end of the primer, and with a common reverse PCR primer to selectively amplify the SNP alleles. Because only perfectly matched oligonucleotides will prime DNA polymerase extension, product will be detected only from the reaction containing the perfectly matched primer.
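The primer layout for allele-specific PCR can be sketched as follows. The flanking sequence, the 20-base primer length, and the function name are hypothetical. The primers are written in the sense of the listed strand, so they would anneal to its complement.

```python
def allele_specific_primers(flank5, alleles, primer_len=20):
    """Return one forward primer per allele, with the allele as the 3' base.

    Each primer consists of the (primer_len - 1) bases immediately
    upstream of the SNP followed by the allele-specific base.
    """
    if len(flank5) < primer_len - 1:
        raise ValueError("upstream flank is shorter than the primer")
    stem = flank5[-(primer_len - 1):]
    return {allele: stem + allele for allele in alleles}


# Hypothetical 24-base upstream flank of an A/G SNP.
flank5 = "TTCGGATTACAGGCTTAACCGTAG"
primers = allele_specific_primers(flank5, ("A", "G"))
```

The two primers differ only in their final base, which is why discrimination depends on the polymerase failing to extend a 3′ mismatch and why conditions must be optimized for each SNP.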

Agarose gel electrophoresis is used to detect the amplified products. Homogeneous, real-time, allele-specific PCR approaches have also been developed; these use primers labeled with different fluorophores, a fluorescent dye that intercalates with the double-stranded PCR products, or amplicon detection with probes such as hydrolysis and hairpin (stem–loop) probes. When intercalating dyes or labeled allele-specific PCR primers are used without a subsequent target-specific detection reaction or size-separation step, the specificity of the procedure may be compromised because primer–dimers and other spurious amplification products cannot be distinguished from the actual PCR product. A limitation of all variants of allele-specific PCR is that the reaction conditions or primer design for selective allele amplification must be optimized empirically for each SNP. Like the hydrolysis and hairpin probe assays, the homogeneous allele-specific PCR procedures are best suited for the analysis of a limited number of SNPs in large sample collections. Array-based approaches for greater SNP multiplexing have also been developed.

In procedures based on single nucleotide primer extension (sometimes known as minisequencing), allele discrimination is based on the high accuracy of nucleotide incorporation by DNA polymerase. A primer is used, and its 3′ end is positioned on the base just preceding the SNP to be tested. The DNA polymerase is then used to incorporate ddNTPs, each labeled with a different fluorescent dye. After the labeled oligonucleotides are separated from the nonincorporated ddNTPs, the results can be scored on a fluorescence plate reader. In addition to fluorescent tags, ddNTPs may be labeled with biotin or haptens and then detected indirectly through antibodies conjugated to alkaline phosphatase or peroxidase using colorimetric or chemiluminescent markers in ELISA formats.
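The logic of single nucleotide primer extension can be illustrated with a short sketch. The sequences and function names are hypothetical, and the primer is written in the sense of the listed strand, so the ddNTP incorporated corresponds to the allele base on that strand.

```python
def minisequencing_primer(flank5, primer_len=20):
    """Primer whose 3' end is the base immediately preceding the SNP."""
    if len(flank5) < primer_len:
        raise ValueError("upstream flank is shorter than the primer")
    return flank5[-primer_len:]


def genotype_from_incorporation(incorporated, alleles):
    """Interpret which labeled ddNTPs were incorporated.

    incorporated: set of bases ("A", "C", "G", "T") whose label gave signal.
    alleles: the two bases expected at the SNP.
    Any unexpected base, or no signal at all, is reported as a no-call.
    """
    unexpected = set(incorporated) - set(alleles)
    found = sorted(set(incorporated) & set(alleles))
    if unexpected or not found:
        return "no-call"
    if len(found) == 1:
        return found[0] + "/" + found[0]
    return "/".join(found)


flank5 = "TTCGGATTACAGGCTTAACCGTAG"   # hypothetical flank upstream of an A/G SNP
primer = minisequencing_primer(flank5)
```

Unlike allele-specific PCR, a single primer serves both alleles, and the genotype is read from which terminator or terminators were added.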

Multiplexing of this procedure has also been described to reduce costs and improve throughput. In these procedures, the different loci genotyped simultaneously are separated either by gel electrophoresis or by hybridization to arrayed tags. Primer extension directly on a solid support such as a microarray is also possible. The single-stranded primers may be immobilized on the solid support through a biotin–avidin or biotin–streptavidin interaction or covalently via 5′ disulfide groups.

Mass spectrometry using techniques such as matrix-assisted laser desorption–ionization time-of-flight mass spectrometry (MALDI-TOF MS) can also be used to determine the identity of the ddNTP incorporated based on mass. A difficulty with MALDI-TOF MS is that the primer extension products must be rigorously purified before measurement to avoid background from biological material present in the sample. Such enzyme-assisted procedures have proven to be more robust and to provide more specific allele discrimination than does ASO hybridization at similar reaction conditions. These features are advantageous for high-throughput applications because the effort required for assay design and optimization is minimized.

Ligation

In the oligonucleotide ligation assay (OLA), oligonucleotides are designed so that they meet at the position of the SNP to be tested. Enzymatic joining, using a DNA ligase, occurs only when the match is perfect. The test is usually performed by designing two oligonucleotides specific for each allele and labeled differently on one side of the SNP, and one common oligonucleotide on the other. Detection of the alleles can be performed directly in the microplate wells by colorimetric approaches. Multiplexing and the use of gel separation have also been described.
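The three-oligonucleotide design can be sketched as follows. The sequences, the 18-base probe length, and the function name are assumptions for illustration, and the probes are written in the sense of the listed strand for simplicity.

```python
def ola_probes(flank5, alleles, flank3, arm_len=18):
    """Return (allele_probes, common_probe) that abut at the SNP.

    Each allele-specific probe ends with the allele base at its 3' end;
    the common probe begins with the base immediately after the SNP.
    """
    if len(flank5) < arm_len - 1 or len(flank3) < arm_len:
        raise ValueError("flanking sequence too short for the probes")
    stem = flank5[-(arm_len - 1):]
    allele_probes = {allele: stem + allele for allele in alleles}
    common_probe = flank3[:arm_len]
    return allele_probes, common_probe


# Hypothetical 20-base flanks around an A/G SNP.
flank5 = "GATTACAGGCTTAACCGTAG"
flank3 = "CTAGGTCCAATGCGTTACGA"
allele_probes, common_probe = ola_probes(flank5, ("A", "G"), flank3)

# A ligation product forms only from the probe that matches the sample allele.
product_for_a = allele_probes["A"] + common_probe
```

In this layout the ligated product for allele A is a contiguous 36-base stretch of the allele A target, whereas the allele G probe would leave a mismatch at the junction and would not be joined.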

OLA has also been used in microarray formats with one of the ligation probes immobilized or with immobilized single stem–loop probes. Alternatively, ligation can be carried out in solution followed by capture of the ligation products on microarrays or on microparticles that carry a generic set of oligonucleotides that are complementary to a “tag” sequence on one of the ligation probes. In practice, thermostable ligases are frequently used for genotyping SNPs in combination with PCR before allele-specific ligase detection reactions. Because the reaction mechanisms for PCR and ligation are different, the reagents for both reactions can be combined. This feature is used in a homogeneous, real-time PCR assay with ligase-mediated genotyping and detection by FRET. Compared with DNA-polymerase-assisted primer extension procedures, a drawback of the OLAs is that detection of each SNP requires three oligonucleotides, which increases the costs of these assays.

Padlock probes are linear oligonucleotides, the ends of which are complementary to the target and have a central stretch of random sequence. When perfectly hybridized to their target sequence, padlock probes can be circularized by ligation, whereas a mismatch with the target sequence prevents ligation. Circularized oligonucleotides can act as templates for DNA-polymerase-assisted rolling circle amplification (RCA). RCA can be used to amplify the ligated circularized padlock probes to a level required for detecting single-copy sequences. A homogeneous, isothermal assay for genotyping individual SNPs in a microtiter plate format has been devised by combining exponential amplification of ligated padlock probes using a branched rolling circle amplification reaction with detection by energy-transfer-labeled hairpin primers.

Displacement

The invader assay uses the property of flap endonucleases (FENs) for removing redundant portions (flap) from the 5′ end of a downstream DNA fragment overlapping an upstream (invader) DNA fragment. An invader oligonucleotide is designed with its 3′ end on the SNP to be tested. Two oligonucleotide signal probes are also designed, overlapping the polymorphic site and each corresponding to one of the alleles. After displacement of the signal probes by the invader probe, FEN-mediated cleavage occurs only for the perfectly matched allele-specific signal probe. Generation of the cleaved fragment is monitored by using it in a second reaction as an invader probe to cleave a FRET probe. This assay does not require PCR amplification of the locus to be tested, and scoring can be done using a simple fluorescence plate reader.

Pyrosequencing

In the pyrosequencing procedure, primer extension is monitored by enzyme-mediated luminometric detection of pyrophosphate (PPi), which is released on incorporation of deoxynucleotide triphosphates. The genotype of an SNP is deduced by sequential addition and degradation of the four nucleotides using apyrase in a dedicated instrument that operates in a 96-well or 384-well microtiter plate format. Using pyrosequencing, the apparatus can determine short 30 to 50 bp sequences of DNA that flank an SNP. A limitation of the procedure is that the sequential identification of bases prevents genotyping of several SNPs per reaction in diploid genomes. An advantage of the procedure is that any new polymorphism will be detected. However, specific equipment is needed for the injection of the nucleotides.
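The way a genotype is deduced from sequential nucleotide addition can be illustrated with an idealized simulation. In this sketch the light signal is assumed to be exactly proportional to the number of bases incorporated, unincorporated nucleotide is assumed to be fully degraded before the next dispensation, and the sequences and dispensation order are hypothetical.

```python
def pyrogram(templates, dispensation):
    """Simulate idealized pyrosequencing signals.

    templates: sequences to be synthesized downstream of the primer,
        one per allele copy (two for a diploid sample).
    dispensation: order in which single nucleotides are added.
    Returns the number of bases incorporated at each dispensation,
    summed over all templates (homopolymers add more than one base).
    """
    positions = [0] * len(templates)
    signals = []
    for nucleotide in dispensation:
        incorporated = 0
        for k, seq in enumerate(templates):
            while positions[k] < len(seq) and seq[positions[k]] == nucleotide:
                positions[k] += 1
                incorporated += 1
        signals.append(incorporated)
    return signals


# Hypothetical A/G SNP at the third base downstream of the primer.
allele_a = "TCAGTA"
allele_g = "TCGGTA"
order = "TCAGTA"

hom_a = pyrogram([allele_a, allele_a], order)
het = pyrogram([allele_a, allele_g], order)
hom_g = pyrogram([allele_g, allele_g], order)
```

Each genotype gives a distinct peak pattern at the A and G dispensations. The G peak also absorbs the neighboring G of the flanking sequence, which shows how the sequence context around the SNP shapes the expected pattern.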

Single-Strand Conformation Polymorphism and Heteroduplex Analysis

Single-strand conformation polymorphism (SSCP) and heteroduplex analysis were among the first procedures established for the detection of SNPs. Conventional SSCP analysis involves denaturing PCR-amplified fragments and subsequent formation of sequence-specific secondary and tertiary structures of the single strands during nondenaturing gel electrophoresis. The electrophoretic mobility then depends on the 3-D shape of the single-stranded molecules. One single base difference in DNA fragments of up to 300 bp will usually change the conformation in a way that can be detected by nondenaturing PAGE.

The traditional polyacrylamide gels and ³²P-labeled fragments are frequently being replaced by fluorescently labeled fragments and automated capillary electrophoresis. The simplicity of the procedure, combined with automation and short analysis time, contributes to high-throughput analysis at relatively low cost. If the denatured PCR products are allowed to slowly re-nature, they form DNA duplexes. Duplexes in which the two strands are perfectly complementary (homoduplexes) and duplexes with a single base mismatch between the strands (heteroduplexes) have different electrophoretic mobility in a native gel. In the case of a single base pair substitution, the heteroduplex can easily be separated from a homoduplex.

In other versions of the technique, denaturing high-performance liquid chromatography (DHPLC) is used for the separation of the heteroduplex and homoduplex strands. The mutation analysis with DHPLC can be almost totally automated with an autosampler on one end and a fraction collector on the other. Analysis is rapid (about 5 minutes per sample), and simple evaluation of data distinguishes between simple and multiple peaks in the elution profiles, allowing lengths as large as 1.5 kb of DNA to be analyzed. A disadvantage may be the recommended use of Pfu DNA polymerase, which, as a high-fidelity enzyme, allows sharper peaks but may be less successful in amplifying some regions.

Short Tandem Repeat Profiling

A short tandem repeat (STR) is a type of DNA polymorphism that occurs when a pattern of two or more nucleotides is repeated and the repeated sequences are directly adjacent to each other. The pattern can range in length from 2 to 10 bp (e.g., (CATG)n in a genomic region) and is typically located in noncoding regions, such as introns or upstream/downstream regions. By examining several STR loci and counting how many repeats of a specific STR sequence there are at each locus, one can create a unique genetic profile of an individual. Currently more than 10,000 STR sequences in the human genome have been published. STR analysis became popular in forensics in the mid to late 1990s and is now the prevalent procedure for determining genetic profiles in forensic cases. The STRs in use for forensic analysis are tetra- or penta-nucleotide repeats (repeat units of 4 or 5 nucleotides) because these give a high degree of error-free data while being robust enough to survive degradation in nonideal conditions. Shorter repeat sequences tend to suffer from artifacts such as stutter and preferential amplification; several genetic diseases are associated with tri-nucleotide repeats, including Huntington's disease. Longer repeat sequences are more susceptible to environmental degradation and do not amplify by PCR as well as shorter sequences do.
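Counting repeats at a locus can be sketched as follows for the case in which the sequence of the amplified region is known. The motif GATA, the flanking sequences, and the function name are hypothetical. In routine practice the repeat number is usually inferred from fragment size rather than read from sequence.

```python
import re


def count_tandem_repeats(sequence, motif):
    """Return the largest number of uninterrupted tandem copies of motif."""
    pattern = "(?:" + re.escape(motif) + ")+"
    runs = re.findall(pattern, sequence)
    return max((len(run) // len(motif) for run in runs), default=0)


# Hypothetical locus: two alleles that differ only in the number of GATA repeats.
left_flank, right_flank = "CCTGAAGC", "TCATTGCA"
allele_1 = left_flank + "GATA" * 9 + right_flank
allele_2 = left_flank + "GATA" * 12 + right_flank

# One entry of a profile: the repeat numbers of the two alleles at this locus.
locus_genotype = tuple(sorted(count_tandem_repeats(a, "GATA")
                              for a in (allele_1, allele_2)))
```

A profile is built by recording such a pair of repeat numbers for each locus examined. Because every added repeat lengthens the fragment by the motif length, the two alleles here differ by 12 bases and can be resolved by electrophoresis.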

The analysis is performed by extracting nuclear DNA from the cells of a forensic sample of interest and then PCR amplifying specific polymorphic regions of the extracted sample. Once these sequences have been amplified, they are resolved either by gel electrophoresis or capillary electrophoresis, which allow the analyst to enumerate the repeats of the STR sequence in question. If the DNA is resolved by gel electrophoresis, the DNA can be visualized either by silver staining or an intercalating dye such as ethidium bromide or, as in most modern forensics labs, by fluorescent dyes. Instruments built to resolve STR fragments by capillary electrophoresis also use fluorescent dyes. In the United States, 13 core STR loci have been selected as the basis by which an individual genetic profile can be generated. These profiles are stored in local, state, and national DNA databanks such as the Combined DNA Index System (CODIS).

Forensic reference materials are available. The DNA Profiling Standard is composed of well-characterized human DNA in two forms: genomic DNA and DNA to be extracted from cells spotted onto filter paper.

ASSAY VALIDATION CONSIDERATIONS

Existing and emerging SNP genotyping assays can be difficult to reproduce and validate because of factors such as variation in the performance of PCR thermal cyclers, the efficiency of different enzymes, personnel, and the presence of PCR inhibitors in the sample matrix (discussed in more detail in Nucleic Acid-Based Techniques—Amplification <1127> for general NAT assays). This difficulty can hamper appropriate implementation of the technologies. Also, in the clinical laboratory the use of in-house assay formats often makes comparisons between laboratories difficult. Incorrect diagnosis of a genetic mutation can have significant consequences, so accuracy of 99.99% or higher is essential for such assays. To determine the accuracy of a technology, the new procedure should be validated on multiple samples in which the genotype has been previously determined with a gold standard procedure, such as sequencing. Even with the most accurate procedure of analysis, sample preparation and amplification and detection procedures must be optimized to eliminate any potential inaccuracies.
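Comparison against a gold standard can be summarized as a concordance rate, and a simple binomial argument indicates how many comparisons are needed to support a given accuracy claim. The sketch below is illustrative only: the function names, the treatment of no-calls, and the 95% confidence level are assumptions, and the sample-size formula assumes independent comparisons with no discordant results observed.

```python
import math


def concordance(test_calls, reference_calls, no_call="no-call"):
    """Return (fraction of called samples that match the reference, no-call count)."""
    if len(test_calls) != len(reference_calls):
        raise ValueError("call lists must have the same length")
    called = [(t, r) for t, r in zip(test_calls, reference_calls) if t != no_call]
    agree = sum(1 for t, r in called if t == r)
    rate = agree / len(called) if called else float("nan")
    return rate, len(test_calls) - len(called)


def min_samples_for_accuracy(target_accuracy, confidence=0.95):
    """Smallest n for which n concordant results out of n would be unlikely
    (probability below 1 - confidence) if the true accuracy were lower
    than target_accuracy."""
    return math.ceil(math.log(1 - confidence) / math.log(target_accuracy))


test_calls = ["A/A", "A/G", "G/G", "no-call", "A/G"]
reference = ["A/A", "A/G", "A/G", "A/A", "A/G"]
rate, no_calls = concordance(test_calls, reference)
n_needed = min_samples_for_accuracy(0.9999)
```

Under these assumptions, supporting an accuracy of 99.99% requires on the order of 30,000 concordant genotype comparisons, which indicates the scale of validation implied by such a requirement.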

Some genotyping errors can be minimized by careful planning of the laboratory procedures, the inclusion of well-defined controls, and increased automation. However, errors due to the processes used for genotyping are sometimes difficult to overcome and need to be taken into account. The types of errors and the frequency with which they occur differ between approaches. Preferential amplification of one allele and nonspecific probe hybridization can both result in SNP miscalls. Additional unanticipated polymorphisms present within the primer/probe sequences can lead to amplification bias, highlighting the need for careful assay design and validation using alternative techniques. Limited and degraded samples can also result in preferential allelic amplification due to chance PCR priming events at low copy number.

A no-call result, which requires the test to be repeated, is preferable to a miscall that is subsequently reported. Performance of replicate assays may also help to ensure accuracy. Data interpretation can also affect accuracy. Wild-type, heterozygous, and homozygous mutant results should be clearly distinguished from one another, and a well-defined measure of uncertainty should be attributed to them. Proficiency testing schemes and ring-trials go some way toward ensuring that individual assays are fit for their intended purpose and that the staff performing them are competent. Sharing of technical information for assay design and sample preparation will also help. The availability of reference panels of well-characterized samples aids assay design and evaluation and allows sound interlaboratory comparisons to be made.
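One conservative way to combine replicate assays is to report a genotype only when the replicates agree and otherwise return a no-call. The rule below, including the requirement of at least two concordant replicates, is a hypothetical example rather than a prescribed procedure.

```python
from collections import Counter


def consensus_call(replicates, min_agree=2, no_call="no-call"):
    """Combine replicate genotype calls into one reportable result.

    Any disagreement between called replicates, or fewer than min_agree
    concordant calls, yields a no-call so that the test is repeated.
    """
    calls = [c for c in replicates if c != no_call]
    if not calls:
        return no_call
    counts = Counter(calls)
    if len(counts) > 1:
        return no_call  # replicates disagree: do not guess
    call, n = counts.most_common(1)[0]
    return call if n >= min_agree else no_call
```

For example, consensus_call(["A/G", "A/G", "no-call"]) reports A/G, whereas consensus_call(["A/G", "A/A", "A/G"]) returns a no-call even though two of the three replicates agree.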

¹ Database of Single Nucleotide Polymorphism (dbSNP) Build 128 is available from the National Center for Biotechnology Information (NCBI), http://www.ncbi.nlm.nih.gov/projects/SNP/index.html.

Auxiliary Information

Topic/Question	Contact	Expert Committee
General Chapter	Anita Y. Szajek, Ph.D. Senior Scientist 1-301-816-8325	(BBVV05) Biologics and Biotechnology - Vaccines and Virology

USP32–NF27 Page 651

Pharmacopeial Forum: Volume No. 33(5) Page 1019