
High-molecular weight DNA extraction, clean-up and size selection for long-read sequencing

  • Ashley Jones,

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    ashley.jones@anu.edu.au

    Affiliation Research School of Biology, Australian National University, Canberra, Australian Capital Territory, Australia

  • Cynthia Torkel,

    Roles Data curation, Methodology, Project administration, Validation

    Affiliation Research School of Biology, Australian National University, Canberra, Australian Capital Territory, Australia

  • David Stanley,

    Roles Data curation, Methodology, Project administration, Validation

    Affiliations Research School of Biology, Australian National University, Canberra, Australian Capital Territory, Australia; Diversity Arrays Technology, Bruce, Australian Capital Territory, Australia

  • Jamila Nasim,

    Roles Data curation, Methodology, Project administration, Validation

    Affiliations Research School of Biology, Australian National University, Canberra, Australian Capital Territory, Australia; Soil Carbon Co., Orange, New South Wales, Australia

  • Justin Borevitz,

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Visualization

    Affiliation Research School of Biology, Australian National University, Canberra, Australian Capital Territory, Australia

  • Benjamin Schwessinger

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Validation, Visualization, Writing – review & editing

    Affiliation Research School of Biology, Australian National University, Canberra, Australian Capital Territory, Australia

Abstract

Rapid advancements in long-read sequencing technologies have extended read lengths from the bp to the Mbp scale, which has enabled chromosome-scale genome assemblies. However, read lengths are now becoming limited by the extraction of pure high-molecular weight DNA suitable for long-read sequencing, which is particularly challenging in plants and fungi. To overcome this, we present a protocol collection: high-molecular weight DNA extraction, clean-up and size selection for long-read sequencing. We optimised a gentle magnetic-bead-based high-molecular weight DNA extraction, which is presented here in detail. The protocol avoids spin columns and high-speed centrifugation to limit DNA fragmentation. It scales with tissue input and can be used on many species of plants, fungi, reptiles and bacteria. It is also cost-effective compared with kit-based protocols and hence applicable at scale in low-resource settings. An optional sorbitol wash is included and is highly recommended for plant and fungal tissues. To further remove any remaining contaminants such as phenols and polysaccharides, optional DNA clean-up and size selection strategies are given. This protocol collection is suitable for all common long-read sequencing platforms, such as those offered by PacBio and Oxford Nanopore. Using these protocols, sequencing on the Oxford Nanopore MinION can achieve read length N50 values of 30–50 kb, with reads exceeding 200 kb and outputs of 15–30 Gbp. This has been routinely achieved with various plant, fungal, animal and bacterial samples.

Introduction

DNA sequencing technologies have transformed genomics through rapid advances in read length, throughput and application, combined with increasingly competitive prices. Short-read sequencing platforms provide billions of reads 100–250 bp in length at unrivalled accuracy, while long-read platforms can provide millions of reads from 1 kbp to 1 Mbp in length, at the cost of accuracy [1]. Long-read platforms have been at the forefront of recent advancements, as they offer unprecedented opportunities for de novo assembly of full-length chromosomes and phasing of haplotypes [2,3]. With these platforms, read length is no longer limited mainly by the technology but by the quality and length of the DNA input. This has given rise to a new challenge: the extraction of pure high-molecular weight DNA suitable for long-read sequencing, which is particularly troublesome in plants and fungi. The difficulty is often caused by secondary metabolites such as polyphenols and polysaccharides. Polyphenols within the cytosol are exposed to DNA after cell lysis and can interact with it irreversibly [4]. Polysaccharides can co-precipitate with DNA in the presence of alcohol and can inhibit many downstream molecular biology techniques [5]. Isolating nuclei can help resolve these issues and yield high-molecular weight DNA [6]. Indeed, nuclei preps have been further developed for long-read sequencing but remain laborious and low throughput [7]. An approach that is becoming widely used for long-read sequencing is carboxylated magnetic beads, to which DNA binds in the presence of polyethylene glycol and sodium chloride [8]. This method does not isolate nuclei but still avoids binding columns and high-speed centrifugation, both of which can fragment DNA. Here we present a modified version of the protocol of Mayjonade et al. [8] that has been used on samples from a wide variety of genera, including recalcitrant plants.
For plants containing high levels of phenols and polysaccharides, an optional sorbitol wash of the homogenate is included, which helps remove these contaminants [9]. Lastly, DNA clean-up and size selection options are presented, which can greatly enhance the success of long-read sequencing. This protocol is part of a larger repository hosted on Protocols.io, in the public workspace ‘High-molecular weight DNA extraction from all kingdoms’ (https://www.protocols.io/workspaces/high-molecular-weight-dna-extraction-from-all-kingdoms).

Methods

The protocol described in this article is published on protocols.io, https://dx.doi.org/10.17504/protocols.io.bss7nehn.

Expected results

Using the protocol described, we have obtained large yields of high-molecular weight DNA (Table 1, Fig 1). DNA fragment sizes range from 20 to 200 kb, which is ideal for long-read sequencing (Fig 1). Size selection for fragments of 20 kb and above with a PippinHT (Sage Science) has been very efficient at removing small DNA fragments and cleaning plant DNA preps, which can be somewhat crude (Table 1). The other DNA clean-up options presented in the protocol achieve similar results but are more labour intensive. During sequencing, we can reproducibly obtain 15–30 Gbp of reads from a single Oxford Nanopore MinION flow cell, with read length N50s of 30–50 kb (Table 2, Fig 2). This includes reads over 200 kb in length with quality above Q7 (Phred scale). Smaller fragments are likely favoured during sequencing, because for a given mass of DNA they are present at higher molarity, and the library prep is likely to cause some DNA shearing. Sequencing with PacBio Sequel II in circular consensus sequencing mode (HiFi reads) can achieve yields over 20 Gbp at very high accuracy (> Q30), but with shorter reads, as this technology is optimised for 10–20 kb fragments. High-performing sequencing results have been achieved with various plant, fungal, animal and bacterial samples (Table 2). The sequencing data are being used for de novo genome assemblies and, in some instances, haplotype phasing. Sequencing data and the resulting reference genomes generated in this project are being made publicly available through the Sequence Read Archive (SRA, NCBI). Multiple Eucalyptus genomic datasets are available under BioProject PRJNA509734 and Acacia datasets under BioProject PRJNA510265. Supporting publications and data for other genera are soon to follow.
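The molarity argument can be made concrete. The Python sketch below converts a mass of DNA into moles of molecules, assuming the common approximation that one double-stranded base pair weighs about 650 g/mol (some protocols use 660) and hypothetical 1 µg inputs made of fragments of a single length. With the mass held constant, the number of molecules scales inversely with fragment length, so 5 kb fragments outnumber 50 kb fragments ten to one.

```python
# Average molar mass of one double-stranded base pair, in g/mol.
# 650 is a common approximation (an assumption here); some protocols use 660.
BP_MASS_G_PER_MOL = 650.0


def dsdna_pmol(mass_ng: float, length_bp: float) -> float:
    """Convert a mass (ng) of linear dsDNA of uniform length to picomoles of molecules."""
    if length_bp <= 0:
        raise ValueError("fragment length must be positive")
    grams = mass_ng * 1e-9
    moles = grams / (length_bp * BP_MASS_G_PER_MOL)
    return moles * 1e12


if __name__ == "__main__":
    # Hypothetical 1 µg (1000 ng) inputs, each made of fragments of one length.
    for length_kb in (5, 20, 50, 200):
        fmol = dsdna_pmol(1000, length_kb * 1000) * 1000
        print(f"{length_kb:>4} kb fragments: {fmol:6.1f} fmol per µg")
```

Real preps contain a distribution of fragment lengths rather than a single length, so this only illustrates the direction of the effect, not the read length distribution a flow cell will return.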

Fig 1. DNA size distribution of Eucalyptus caleyi based on a pulsed-field capillary electrophoresis system, a Femto Pulse (Agilent).

The peak at 200 kb represents all fragments > 200 kb, as these cannot be resolved with the technology. The sample was crude DNA prior to any size selection or further DNA clean-up.

https://doi.org/10.1371/journal.pone.0253830.g001

Fig 2. Read length by average read quality for Eucalyptus caleyi long-read sequencing with an Oxford Nanopore MinION flow cell.

Image generated with NanoPlot 1.28.2 [10].

https://doi.org/10.1371/journal.pone.0253830.g002
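Fig 2 plots read length against average read quality on the Phred scale, where a quality score Q corresponds to a per-base error probability p = 10^(-Q/10). Q7 is therefore roughly a 20% error rate and Q30 a 0.1% error rate. The Python sketch below converts between the two and computes a per-read average by averaging error probabilities rather than Q values, a common convention; we assume, rather than assert, that it matches the calculation in NanoPlot 1.28.2. FASTQ decoding assumes the standard Phred+33 encoding, and the quality string in the example is hypothetical.

```python
import math
from typing import Sequence


def phred_to_error(q: float) -> float:
    """Per-base error probability for a Phred score: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred score for an error probability: Q = -10 * log10(p)."""
    return -10 * math.log10(p)


def mean_read_quality(quals: Sequence[int]) -> float:
    """Average per-base qualities in error-probability space, then convert back to Q."""
    if not quals:
        raise ValueError("read has no quality values")
    mean_error = sum(phred_to_error(q) for q in quals) / len(quals)
    return error_to_phred(mean_error)


def decode_fastq_quals(qual_line: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in qual_line]


if __name__ == "__main__":
    print(f"Q7  -> error probability {phred_to_error(7):.3f}")
    print(f"Q30 -> error probability {phred_to_error(30):.4f}")
    # Hypothetical read: five bases at Q10 ('+') and five at Q30 ('?').
    quals = decode_fastq_quals("+" * 5 + "?" * 5)
    # The naive mean of the Q values is 20; the probability-space mean is lower.
    print(quals, round(mean_read_quality(quals), 2))
```

Averaging in probability space lets a few low-quality bases pull the read average down sharply, which makes a threshold such as Q7 more conservative than a naive mean of Q values would be.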

Table 1. Fluorometer and spectrophotometer results of a DNA extraction for Eucalyptus caleyi.

Crude DNA was first extracted and then size selected for fragments of 20 kb and above with a PippinHT (Sage Science).

https://doi.org/10.1371/journal.pone.0253830.t001
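The values in Table 1 are not reproduced here, but the kind of comparison such a table supports can be sketched. The Python example below derives A260/280 and A260/230 ratios and a spectrophotometric dsDNA concentration (using the conventional 50 ng/µL per A260 unit for a 10 mm path), and compares that concentration with a fluorometer reading. The guideline ranges (about 1.8–2.0 for A260/280 and 2.0 or more for A260/230) are commonly cited rules of thumb rather than values from this article, the `max_spec_to_fluor` cut-off of 1.5 is a hypothetical choice, and all absorbance readings are invented for illustration.

```python
from dataclasses import dataclass

# Conventional conversion: an A260 of 1.0 (10 mm path) is about 50 ng/µL of dsDNA.
DSDNA_NG_PER_UL_PER_A260 = 50.0


@dataclass
class PurityReport:
    a260_280: float
    a260_230: float
    spec_ng_per_ul: float
    fluor_ng_per_ul: float

    @property
    def spec_to_fluor(self) -> float:
        # Ratios well above 1 suggest absorbance from material a dsDNA dye ignores,
        # such as RNA, phenols or degraded DNA.
        return self.spec_ng_per_ul / self.fluor_ng_per_ul


def purity_report(a230: float, a260: float, a280: float,
                  fluor_ng_per_ul: float) -> PurityReport:
    """Summarise spectrophotometer absorbances alongside a fluorometer reading."""
    return PurityReport(
        a260_280=a260 / a280,
        a260_230=a260 / a230,
        spec_ng_per_ul=a260 * DSDNA_NG_PER_UL_PER_A260,
        fluor_ng_per_ul=fluor_ng_per_ul,
    )


def flags(r: PurityReport, max_spec_to_fluor: float = 1.5) -> list[str]:
    """Compare against commonly cited guideline values (assumptions, not from the source)."""
    issues = []
    if not 1.8 <= r.a260_280 <= 2.0:
        issues.append("A260/280 outside ~1.8-2.0 (protein or phenol carry-over?)")
    if r.a260_230 < 2.0:
        issues.append("A260/230 below ~2.0 (polysaccharide, phenol or salt carry-over?)")
    if r.spec_to_fluor > max_spec_to_fluor:
        issues.append("spectrophotometer/fluorometer concentration ratio is high")
    return issues


if __name__ == "__main__":
    # Hypothetical readings: a clean prep, then a crude one.
    print(flags(purity_report(a230=0.5, a260=1.0, a280=0.53, fluor_ng_per_ul=40.0)))
    print(flags(purity_report(a230=0.8, a260=1.0, a280=0.53, fluor_ng_per_ul=20.0)))
```

Reporting both instruments side by side is useful because a fluorometric dye measures dsDNA specifically, whereas absorbance at 260 nm also picks up contaminants, so a large gap between the two concentrations points to a crude prep that may benefit from further clean-up.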

Table 2. Example long-read sequencing results for multiple samples using different sequencing platforms.

https://doi.org/10.1371/journal.pone.0253830.t002
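Table 2 summarises runs by yield and read length N50. The N50 is the read length L such that reads of length L or longer together contain at least half of all sequenced bases. A minimal Python sketch of both summaries, applied to a hypothetical list of read lengths in bp (in practice these would be parsed from FASTQ files or taken from a tool such as NanoPlot):

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the length L such that reads >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no reads supplied")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return ordered[-1]  # not reached for non-empty, non-negative input


def run_summary(lengths: list[int]) -> dict:
    """Summarise read count, yield (Gbp), N50 (kb) and longest read (kb)."""
    return {
        "reads": len(lengths),
        "yield_gbp": sum(lengths) / 1e9,
        "n50_kb": n50(lengths) / 1e3,
        "longest_kb": max(lengths) / 1e3,
    }


if __name__ == "__main__":
    # Hypothetical read lengths (bp); the total is 20 kb, so half is 10 kb.
    reads = [2_000, 3_000, 4_000, 5_000, 6_000]
    # 6 kb + 5 kb = 11 kb >= 10 kb, so the N50 is 5 kb.
    print(run_summary(reads))
```

Because the N50 is weighted by bases rather than by read count, a run with many short reads can still report a high N50 if a minority of long reads carries most of the sequence.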

Supporting information

S1 Protocol collection. Step-by-step protocol, also available on protocols.io.

https://doi.org/10.1371/journal.pone.0253830.s001

(PDF)

References

  1. Logsdon GA, Vollger MR, Eichler EE. Long-read human genome sequencing and its applications. Nature Reviews Genetics. 2020;21(10):597–614. pmid:32504078
  2. Miga KH, Koren S, Rhie A, Vollger MR, Gershman A, Bzikadze A, et al. Telomere-to-telomere assembly of a complete human X chromosome. Nature. 2020;585(7823):79–84. pmid:32663838
  3. Shafin K, Pesout T, Lorig-Roach R, Haukness M, Olsen HE, Bosworth C, et al. Nanopore sequencing and the Shasta toolkit enable efficient de novo assembly of eleven human genomes. Nature Biotechnology. 2020;38(9):1044–53. pmid:32686750
  4. Leng M, Drocourt J-L, Helene C, Ramstein J. Interactions between phenol and nucleic acids. Biochimie. 1974;56(6):887–91. pmid:4447802
  5. Do N, Adams RP. A simple technique for removing plant polysaccharide contaminants from DNA. BioTechniques. 1991;10(2):162, 164, 166. pmid:2059438
  6. Bolger A, Scossa F, Bolger ME, Lanz C, Maumus F, Tohge T, et al. The genome of the stress-tolerant wild tomato species Solanum pennellii. Nature Genetics. 2014;46:1034. pmid:25064008
  7. Workman R, Fedak R, Kilburn D, Hao S, Liu K, Timp W. High Molecular Weight DNA Extraction from Recalcitrant Plant Species for Third Generation Sequencing. 2018.
  8. Mayjonade B, Gouzy J, Donnadieu C, Pouilly N, Marande W, Callot C, et al. Extraction of high-molecular-weight genomic DNA for long-read sequencing of single molecules. BioTechniques. 2016;61(4):203–5. pmid:27712583
  9. Inglis PW, Pappas MdCR, Resende LV, Grattapaglia D. Fast and inexpensive protocols for consistent extraction of high quality DNA and RNA from challenging plant and fungal samples for high-throughput SNP genotyping and sequencing applications. PLOS ONE. 2018;13(10):e0206085. pmid:30335843
  10. De Coster W, D’Hert S, Schultz DT, Cruts M, Van Broeckhoven C. NanoPack: visualizing and processing long-read sequencing data. Bioinformatics. 2018;34(15):2666–9. pmid:29547981