PacBio - Whole Genome Sequencing
With whole-genome SMRT sequencing, high-molecular-weight DNA is first sheared uniformly and then used to prepare the libraries without an amplification step. An optional size-selection step allows shorter fragments to be removed before sequencing.
Depending on the research question of the project, two different types of reads can be generated during sequencing. The first option is to generate reads that are as long as possible, with each DNA fragment read only once (CLR, Continuous Long Reads). This method is particularly suitable for detecting structural variants. The raw error rate is ~15 %. One Sequel II SMRT Cell (8M) run with the CLR workflow should generate a minimum of 100 Gb of CLR data.
The second option is to read individual DNA fragments with an average size of 15-20 kb several times. A highly accurate consensus sequence, a so-called HiFi (high fidelity) or CCS (circular consensus sequence) read, can then be formed from the individual subreads. With this method, the error rate is reduced to ~0.1 %. This method is particularly suitable for the detection of SNVs (Single Nucleotide Variants) or for de novo genome assemblies, for which no additional short-read data is then required. One Sequel II SMRT Cell (8M) run with the CCS workflow should yield approx. 15 Gb of HiFi data.
A SMRT Cell on the original Sequel delivers approx. 15 Gb of output. The Sequel II has approximately 10 times the output of the previous model. Thanks to this higher output, it is now also possible to use HiFi reads for the detection of SNVs and InDels or to generate de novo genome assemblies from 15-20 kb HiFi reads. The number of SMRT Cells required, and thus a key factor in calculating the sequencing costs, depends on the genome size of the sequenced organism and the desired coverage.
- Gap filling: The PacBio assembly is used only to bridge gaps or resolve ambiguities in an existing short-read scaffold genome.
- Structural variant (SV) detection: The de novo assembled PacBio genome is mapped against a known reference genome to detect large structural differences (e.g., insertions, deletions, inversions).
- Hybrid assembly: Long-read and short-read data are combined in one assembly in order to minimize the errors of both approaches.
- De novo assembly: A completely new, high-quality genome of the organism is created, ideally with one sequence per chromosome or plasmid.
Small genomes (e.g. bacteria) can be multiplexed on a single SMRT Cell. Samples with large genomes may have to be sequenced in parallel on several SMRT Cells in order to obtain sufficiently high coverage.
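The relationship between genome size, desired coverage and the number of SMRT Cells can be sketched as a rough calculation. The following is a minimal Python illustration, not an official sizing tool: the per-cell yields are the approximate figures quoted above (100 Gb CLR, 15 Gb HiFi), the function names are hypothetical, and the multiplexing estimate assumes that the yield of a cell is split evenly between samples, which real runs will only approximate.

```python
import math

# Approximate per-cell yields in gigabases (Gb), taken from the text above.
# Real yields vary with library quality and insert size.
YIELD_GB = {"CLR": 100.0, "HiFi": 15.0}


def required_gb(genome_size_mb: float, coverage: float) -> float:
    """Data (Gb) needed to cover a genome of genome_size_mb megabases at the given depth."""
    return genome_size_mb * coverage / 1000.0


def cells_needed(genome_size_mb: float, coverage: float, workflow: str) -> int:
    """Number of SMRT Cells needed for one sample, rounded up to whole cells."""
    return math.ceil(required_gb(genome_size_mb, coverage) / YIELD_GB[workflow])


def samples_per_cell(genome_size_mb: float, coverage: float, workflow: str) -> int:
    """Number of samples that fit on one cell, assuming an even split of the yield."""
    return math.floor(YIELD_GB[workflow] / required_gb(genome_size_mb, coverage))


if __name__ == "__main__":
    # A ~3,100 Mb genome at 30x coverage needs 93 Gb of data.
    print(cells_needed(3100, 30, "HiFi"))
    print(cells_needed(3100, 30, "CLR"))
    # A ~5 Mb bacterial genome at 100x coverage needs only 0.5 Gb per sample.
    print(samples_per_cell(5, 100, "HiFi"))
```

In practice, the number of samples per cell is also limited by the number of available barcodes and by uneven pooling, so the multiplexing figure is an upper bound rather than a recommendation.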