The preferred source of DNA in human genetics research is blood, or cell lines derived from blood, as these sources yield large quantities of high quality DNA. However, DNA extracted from saliva can also be of high quality, with little to no degradation or fragmentation, and is suitable for a variety of DNA assays. Saliva can be collected without the expense of a phlebotomist and can even be acquired through the mail. At present, no saliva DNA collection/extraction protocols for next generation sequencing have been presented in the literature. This protocol optimizes the parameters of saliva collection/storage and DNA extraction so that the resulting DNA is of sufficient quality and quantity for the most demanding DNA assays, including microarray genotyping and next generation sequencing.
Keywords: Medicine, Issue 90, DNA collection, saliva, DNA extraction, Next generation sequencing, DNA purification, DNA
Obtaining high quality DNA for human genetic studies is essential in the disease gene discovery process. Blood, though it requires an invasive procedure and is more expensive to collect than saliva, is favored for creating immortalized cell lines as an infinite source of DNA, or iPSCs for functional studies, and blood DNA is sometimes used when cell lines are not available. However, obtaining blood requires a trained phlebotomist and blood has a shorter half-life than saliva 1 . DNA from saliva is less expensive and easier to obtain, since it can be collected and sent through the mail without the need for a phlebotomist, thereby increasing potential subject pools well beyond the catchment area of hospitals and laboratories 2 . Study enrollment may be improved when subjects have the option of giving a saliva sample instead of blood 3, 4 . Concerns about the quantity and quality of DNA from saliva may have limited its widespread use, despite numerous recent studies showing the suitability of whole saliva, with an average of 4.3 x 10^5 cells per milliliter, for DNA testing over the older buccal swab methods that did not obtain significant amounts of saliva 2, 3, 4, 5, 6 . While a modest literature exists showing the suitability of whole saliva derived DNA for genotyping applications including microarray-based methods 8, 9, 10 , no studies have examined next generation sequencing (NGS). The goal for optimizing this whole saliva DNA extraction protocol was to maximize quantity and quality for genetics applications in a cost effective way that is easily implemented in laboratories with common reagents and consumables.
DNA extraction from saliva requires several procedures: 1) collection and storage, 2) cell lysis, 3) RNase treatment, 4) protein precipitation, 5) ethanol precipitation, 6) DNA rehydration. The DNA Stabilization Buffer solution, described previously 2 , functions adequately without alteration. No attempt to optimize the RNase treatment and DNA rehydration steps was made. For each remaining step, several variables that could affect yield were identified. Each variable was manipulated individually and improvement in yield and quality was assessed statistically. For variables that were shown to improve yield and/or DNA quality, the optimal values were included in the final protocol.
NOTE: Prior to providing saliva samples all subjects gave informed consent conforming to the guidelines for treatment of human subjects at Nationwide Children’s Hospital.
Prior to saliva collection, ensure that the subject’s mouth is free of food or other foreign substances by having the subject rinse their mouth with water and avoiding eating or drinking for 30 min before collecting the sample.
Open a 15 ml centrifuge tube containing 2.5 ml of DNA stabilization buffer 2 , making sure to avoid touching the inside of the cap or tube, and have the subject spit 2.5 ml of saliva into the buffer solution. Note: Collecting more than 2.5 ml of saliva can lead to sample degradation from an insufficient ratio of DNA stabilization buffer to sample. Collecting too little saliva will reduce expected yields from the protocol. To evaluate collection volumes, use the numbered graduations on the side of the tube.
Replace cap and mix by inversion until the mixture is homogenized. Vigorous shaking is not necessary. Store the samples at RT for short-term storage or 4 °C for long-term storage (>3 months).
Prior to starting the extraction, heat a water bath to 37 °C, and prepare an ice bucket. Three 15 ml conical centrifuge tubes will be needed for each extracted sample. The three tubes will be used to hold the cell and protein pellet, the final extracted gDNA, and the isopropanol and ethanol supernatants.
Retrieve samples from storage, and invert samples several times then vortex at medium speed for 15 sec.
Dispense 2.5 ml of sample into a clean 15 ml centrifuge tube, and add 5 ml of Cell Lysis Solution. Mix the sample 50 times by inversion, and incubate at RT for 30 min.
After the RNase A incubation, increase the temperature of the water bath to 65 °C for the DNA rehydration step of the protocol.
Add 50 μl of Proteinase K Solution at 20 mg/ml, mix several times by inversion, and incubate at RT for a minimum of 30 min. Note: This is a possible pausing point for the protocol. After the addition of the Proteinase K Solution, the sample can be stored at 4 °C until the extraction can be completed. Storage at 4 °C for up to 24 hr was not shown to have a significant effect on the extraction yields or DNA quality. Long-term storage at this stage has not been evaluated.
Add 1.7 ml of Protein Precipitation Solution, vortex vigorously for 20 sec at high speed, and place on ice for 10 min.
Once the samples have cooled on ice for 10 min, centrifuge for 10 min at 3,000 x g and 4 °C. The precipitated proteins must form a tight pellet to continue. If the pellet is not tight or the solution is still cloudy, the samples can be cooled on ice for 5 min more and centrifugation repeated. The samples must be kept on ice to ensure a tight pellet.
Into a clean 15 ml centrifuge tube, pipet 5 ml of Isopropanol and 8 μl of pure Glycogen Solution at 20 mg/ml.
Pour the supernatant containing the gDNA from step 4.3 into the tube containing the Isopropanol and Glycogen Solution, leaving behind the precipitated protein pellet. Once the supernatant has been added, gently mix the sample 50 times by inversion and centrifuge for 30 min at 3,000 x g and 4 °C.
Pour the supernatant slowly into a clean 15 ml tube. After removal of the supernatant, add 1 ml of 70% ethanol to wash the pellet by slowly rocking and gently moving the ethanol over the precipitated pellet several times. Retain the ethanol in the tube.
After the initial wash, centrifuge the sample for 1 min at 2,000 x g and 20 °C. This centrifugation step can be done at either 4 °C or 20 °C. No significant effect of temperature has been shown for this step.
Following the initial wash and centrifugation of the pellet, slowly pour the ethanol wash from the tube and discard, then perform a second wash by repeating steps 5.3 and 5.4.
If the sample has not completely dried, air dry for another 15 min.
Remove the samples from the water bath and incubate O/N at RT.
NOTE: All products and reagents used are listed in the Materials Table, as well as Table 4.
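When several samples are extracted in one session, the per-sample volumes given in the steps above can be scaled up before starting. The following is a minimal sketch, not part of the published protocol: the `overage` allowance and the function name `reagent_totals` are assumptions introduced for illustration, and the RNase A volume is left out because it is not stated in this section.

```python
# Per-sample reagent volumes as stated in the protocol steps above (ml).
# The RNase A volume is not given in this section, so it is omitted.
PER_SAMPLE_ML = {
    "Cell Lysis Solution": 5.0,
    "Proteinase K Solution (20 mg/ml)": 0.050,
    "Protein Precipitation Solution": 1.7,
    "Isopropanol": 5.0,
    "Glycogen Solution (20 mg/ml)": 0.008,
    "70% ethanol (two 1 ml washes)": 2.0,
}


def reagent_totals(n_samples, overage=0.10):
    """Return the total volume (ml) of each reagent for n_samples.

    overage is a hypothetical pipetting allowance (10% by default);
    it is an assumption and not part of the published protocol.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    factor = n_samples * (1.0 + overage)
    return {name: round(vol * factor, 3) for name, vol in PER_SAMPLE_ML.items()}


if __name__ == "__main__":
    # Print a preparation list for a batch of eight samples.
    for name, total in reagent_totals(8).items():
        print(f"{name}: {total} ml")
```

Setting `overage=0.0` gives the exact protocol volumes multiplied by the number of samples.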
To determine optimal parameters for DNA extraction, a series of paired DNA extractions was performed. A single saliva sample was split and each portion tested with one of two possible values for a given variable. At least eight replicates of each paired test were performed (e.g., a single saliva sample was aliquoted to test extraction both with and without initial 50 °C incubation). Optimization was based on four standard metrics: total DNA yield, the 260/280 value, the 260/230 value, and visual inspection of electrophoresed DNA to assess fragmentation. Statistical interactions were not assessed across all possible combinations of the variables (N=169 combinations); instead, the marginal effect of each variable was assessed individually. Effects were tested using a multi-way repeated-measures ANOVA and estimated effects were derived from the equivalent regression equation. All significant effects are summarized in Table 1, shown as average change in yield (ng/µl per ml of saliva input) or DNA quality (260/280 and 260/230).
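The logic of the paired design can be illustrated with a much simpler calculation than the one reported here. The sketch below is a stand-in, not the authors' analysis: it computes a paired t statistic for a single variable rather than the multi-way repeated-measures ANOVA used in the study, and the function name `paired_summary` and the example yields are hypothetical placeholders.

```python
import math
import statistics


def paired_summary(condition_a, condition_b):
    """Summarize paired measurements from split saliva samples.

    Returns the mean difference (b - a), the mean percent change
    relative to a, and the paired t statistic with its degrees of freedom.
    """
    if len(condition_a) != len(condition_b):
        raise ValueError("paired inputs must have equal length")
    if len(condition_a) < 2:
        raise ValueError("need at least two pairs")
    diffs = [b - a for a, b in zip(condition_a, condition_b)]
    pct = [100.0 * (b - a) / a for a, b in zip(condition_a, condition_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation of differences
    n = len(diffs)
    # The t statistic is undefined when every pair differs by the same amount.
    t_stat = mean_diff / (sd_diff / math.sqrt(n)) if sd_diff > 0 else math.nan
    return {
        "mean_diff": mean_diff,
        "mean_pct_change": statistics.mean(pct),
        "t": t_stat,
        "df": n - 1,
    }


if __name__ == "__main__":
    # Hypothetical yields (ng/µl) for eight split samples; not study data.
    short_lysis = [10.0, 12.0, 9.0, 11.0, 10.5, 9.5, 12.5, 11.5]
    long_lysis = [10.4, 12.3, 9.4, 11.5, 10.8, 9.9, 12.9, 11.8]
    print(paired_summary(short_lysis, long_lysis))
```

Because each saliva sample serves as its own control, the test operates on within-sample differences, which removes the large between-subject variation in saliva DNA content.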
Cell lysis (step 2) was optimized by assessing: 1) the presence/absence of a 50 °C incubation (1 hr) prior to cell lysis to ensure that Proteinase K degradation and cell lysis mediated by the storage buffer went to completion, 2) presence/absence of a homogenization by vortexing step (medium speed, 15 sec), and 3) lysis solution incubation time (5 versus 30 min). The 30 min cell lysis incubation increased yield by an average of 3.5% (p<.01) but no other cell lysis variable had a significant effect on yield. Vortexing decreased the 260/280 ratio by a statistically significant (p<.001) but practically small 0.03.
Protein precipitation (step 4) is preceded by Proteinase K digestion to disrupt amino acid chains, improving protein precipitation efficiency and releasing captured DNA. The amount of Proteinase K was varied ten-fold. Centrifugation temperature was reduced from 20 °C to 4 °C. Increasing the amount of Proteinase K caused a statistically significant decrease in yield (8.7%) and also slightly improved both the 260/280 and 260/230 ratios.
Ethanol precipitation (step 5) was the last stage of the protocol examined. The amount of glycogen carrier (0, 8 µl) was varied, as was the total centrifugation time (5 vs. 30 min 11 ). Only centrifugation time significantly affected yield, with an average increase of 290%. The longer spin also decreased the 260/280 ratio slightly (0.05). No significant effect of glycogen on yield was observed during the experiments, although the total quantity of DNA in these extractions was sufficiently large that glycogen would not typically be used. Despite the lack of effect in these samples, it is still recommended to use glycogen to minimize the risk of reduced yields whenever saliva input volume is lower than given here or if there is any other reason to believe yield will be low.
Visual inspection of the representative DNA samples (Figure 1) indicated that the extracted DNA was not greatly fragmented for any saliva DNA extraction procedure, but rather showed an appropriate high molecular weight band without the smearing indicative of degraded DNA. After RNase A digestion, the protocol produced an average 260/280 of 1.74.
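The 260/280 and 260/230 values used as quality metrics are simple ratios of absorbance readings. A minimal sketch of the calculation follows; the function names and the default cutoffs in `flag_sample` are assumptions chosen for illustration and are not acceptance criteria stated in this protocol.

```python
def purity_ratios(a230, a260, a280):
    """Return the (260/280, 260/230) absorbance ratios."""
    if a230 <= 0 or a280 <= 0:
        raise ValueError("A230 and A280 must be positive")
    return a260 / a280, a260 / a230


def flag_sample(a230, a260, a280, min_260_280=1.7, min_260_230=1.8):
    """Compute both ratios and list any that fall below a cutoff.

    The default cutoffs are hypothetical; set them to whatever the
    downstream assay or sequencing provider requires.
    """
    r280, r230 = purity_ratios(a230, a260, a280)
    warnings = []
    if r280 < min_260_280:
        warnings.append("low 260/280")
    if r230 < min_260_230:
        warnings.append("low 260/230")
    return {"260/280": r280, "260/230": r230, "warnings": warnings}


if __name__ == "__main__":
    # Hypothetical absorbance readings for one sample.
    print(flag_sample(a230=0.50, a260=1.00, a280=0.55))
```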
Figure 1. Quality of saliva derived DNA. Four extraction procedures were applied to the same saliva collection. (A) Samples were electrophoresed on a 0.8% agarose gel (250 ng DNA). All variations of the saliva DNA extraction protocols result in high molecular weight (>20 kb) DNA, with no evidence of degradation. Lane: 1 DNA ladder, 2 & 3 Oragene prepIT L2P Protocol samples, 4 & 5 Gentra Puregene Body Fluids Protocol, 6 & 7 the optimized protocol without the RNA removal step, 8 & 9 the optimized protocol with the RNA removal step. Lanes 2-7 are directly analogous protocols on the same saliva samples. Lanes 8 & 9 show that the RNA removal step does not introduce DNA degradation. (B) Samples were electrophoresed on a 2% agarose gel (150 ng DNA). A slight RNA peak is observable near the bottom of the gel in lanes 2 through 7 (conventions as above). Lanes 8 & 9 show the effectiveness of the RNA removal step.
The RNA Removal Step (step 3 with RNase A) is critical for accurate quantification of DNA. During testing, consistently high RNA content was observed, as determined by the ratio of double stranded DNA to RNA measured by a Qubit 2.0 Fluorometer. On average, nucleic acid content from samples without RNase A treatment consisted of 46.6% (±0.4) RNA. Samples that underwent the RNA Removal Step read as “
The DNA obtained through this optimized protocol was of sufficient quality for high throughput sequencing when the additional RNase A step was applied. To attain targeted resequencing data, a custom Agilent SureSelect Target Enrichment kit was applied to 24 samples, targeting 2.6 Mb of sequence. High throughput sequencing was conducted on 12 barcoded (indexed) samples per lane. Sequence reads were aligned to the hg19 reference genome with BWA 12 . GATK 13 was then used for base quality score recalibration, indel realignment, duplicate removal, and SNP discovery and genotyping simultaneously across all 24 samples, using the best practice hard filtering parameter values 14 . All 24 samples yielded high quality NGS data (Table 2). Of reads that passed Illumina’s standard filters and had Q>20, 91.4% aligned to the sequence enrichment target regions, providing an average on-target coverage depth of >30x at Q>100, well within the necessary limits for rare SNP discovery in each sample. The average strand balance was 49.9%. Comparing variant calls with Illumina microarray genotypes yielded a concordance of 98.9%.
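Concordance between sequencing calls and microarray genotypes is the fraction of sites called on both platforms at which the genotypes agree. The sketch below shows one way such a figure could be computed; it is not the authors' pipeline, and the function name, the dictionary layout and the site identifiers are assumptions made for illustration.

```python
def genotype_concordance(ngs_calls, array_calls):
    """Fraction of shared, called sites where the two genotype sets agree.

    Both inputs map a site identifier to an unordered genotype string,
    e.g. {"site1": "AG"}. Sites missing from either set, or with a
    no-call (None) in either set, are skipped.
    """
    compared = 0
    matched = 0
    for site, array_gt in array_calls.items():
        ngs_gt = ngs_calls.get(site)
        if ngs_gt is None or array_gt is None:
            continue
        compared += 1
        # Sort alleles so that "AG" and "GA" count as the same genotype.
        if sorted(ngs_gt) == sorted(array_gt):
            matched += 1
    if compared == 0:
        raise ValueError("no overlapping called sites")
    return matched / compared


if __name__ == "__main__":
    # Hypothetical genotypes for illustration only.
    ngs = {"site1": "AG", "site2": "CC", "site3": "TT", "site4": None}
    array = {"site1": "GA", "site2": "CT", "site3": "TT", "site4": "AA"}
    print(f"concordance: {genotype_concordance(ngs, array):.3f}")
```

Restricting the comparison to sites called on both platforms keeps missing data from being counted as disagreement.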
Candidate Variable | ng/µl | 260/280 | 260/230 |
Vortex | n.s. | -0.03*** | n.s. |
30 min Cell Lysis Incubation | 3.5%** | n.s. | n.s. |
Proteinase K x10 | -8.7%* | -0.05*** | -0.03*** |
30 min spin | 290.2%*** | -0.05*** | n.s. |
Glycogen | n.s. | n.s. | -0.37*** |
Table 1. Effect Size of Optimized Variable on Quantity/Quality Metrics. All effect sizes are in the units listed in the column header. p-values from ANOVA: *p