Into the dark and camouflaged – overcoming null and ambiguous sequence mapping
Genomes often contain unknown regions where standard short-read sequencing technologies fall short in assembling or aligning reads. Some regions are “dark by depth”: sequencing yields few or no mappable reads, as is the case for stretches with high GC content.
Others are “camouflaged by mapping quality”: duplications, repeats, inversions and other structural variations mean that reads map ambiguously. High-fidelity, long-range sequence information obtained with Xdrop™ can resolve both kinds of region.
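The two categories can be made concrete with a small sketch that labels a single genomic position from the mapping qualities (MAPQ) of the reads covering it. The thresholds and the function name below are illustrative assumptions, not values taken from this note; a real analysis would choose them to suit the aligner and sequencing depth.

```python
# Illustrative thresholds only; these are assumptions, not values from the text.
MAX_DARK_DEPTH = 5            # at or below this many reads, call the position "dark"
LOW_MAPQ = 10                 # MAPQ below this counts as an ambiguous placement
MIN_LOW_MAPQ_FRACTION = 0.9   # share of reads that must be ambiguous


def classify_position(mapqs):
    """Label one position from the MAPQ values of the reads that cover it."""
    depth = len(mapqs)
    if depth <= MAX_DARK_DEPTH:
        return "dark_by_depth"
    low = sum(1 for q in mapqs if q < LOW_MAPQ)
    if low / depth >= MIN_LOW_MAPQ_FRACTION:
        return "camouflaged_by_mapq"
    return "ok"


if __name__ == "__main__":
    print(classify_position([]))                    # no reads at all
    print(classify_position([0] * 28 + [60] * 2))   # many reads, mostly ambiguous
    print(classify_position([60] * 30))             # many reads, confidently placed
```

Depth is checked first, so a position with too few reads is reported as dark regardless of how well those few reads map.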
To underscore the value of the Xdrop™ technology in elucidating unknown regions of a genome, we enriched a ~40 kb region of the Epstein-Barr virus (EBV) genome that includes a 1.5 kb GC-rich tandem repeat (12 times 126 bp). This region has very low coverage depth in existing genome maps because its high GC content makes amplification difficult and because short-read mapping across the repeat is ambiguous. The Detection Sequence used to enrich the long fragments was located roughly 10 kb from the unknown region, in a portion of the genome with good sequence coverage.
Image modified from Lin Z et al. J Virol. 2013;87(2):1172‐1182
The samples we used contained background human DNA at a 2200-fold higher concentration than EBV DNA. We partitioned all the DNA into double emulsion droplets, encapsulating individual molecules together with enrichment primers that amplify the 89 bp Detection Sequence. In a process we call Indirect Sequence Capture, the amplified Detection Sequence emits a clear fluorescent signal. We used this fluorescence to sort out only the droplets containing the targeted EBV DNA fragments. Each resulting molecule was then amplified by droplet multiple displacement amplification (MDA).
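To give a sense of how rare the target is before sorting, the following sketch converts the 2200-fold background into the share of input DNA that is EBV. It assumes the fold value is a ratio of concentrations in the same units, so the result is a share by mass rather than by molecule count; the function name is hypothetical.

```python
def target_fraction(background_fold):
    """Share of total DNA that is target when the background is
    `background_fold` times more concentrated than the target.

    Assumes both concentrations are expressed in the same units.
    """
    if background_fold < 0:
        raise ValueError("background_fold must be non-negative")
    return 1.0 / (1.0 + background_fold)


if __name__ == "__main__":
    fraction = target_fraction(2200)
    print(f"EBV share of input DNA: {fraction:.4%}")
```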
Our enriched long DNA fragments contained the complete 1.5 kb unknown region. Thus, by maintaining the integrity of the sample DNA (i.e., not fragmenting the input DNA, enriching by Indirect Sequence Capture, and amplifying single target molecules), Xdrop™ preserved the sequence information. Analysis on a PacBio RSII showed 12 repeats, each 125 bp long, with 76 to 91% GC content (red box in figure).
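Per-repeat GC content of the kind reported above can be computed in a few lines once the repeat region has been read end to end. The sketch below is a minimal illustration, not the analysis pipeline used here: it assumes the repeat units have exactly equal length, and the input is a short invented sequence, not EBV data.

```python
def gc_content(seq):
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def split_units(region, unit_len):
    """Cut a tandem-repeat region into consecutive units of unit_len bases."""
    return [region[i:i + unit_len] for i in range(0, len(region), unit_len)]


if __name__ == "__main__":
    # Toy input invented for illustration; this is not EBV sequence.
    toy_region = "GGCCGGCCAT" * 3
    for number, unit in enumerate(split_units(toy_region, 10), start=1):
        print(number, f"{gc_content(unit):.0%}")
```

For the region described here, `unit_len` would be the repeat unit length and the loop would report one GC value per repeat. Real repeat units can differ slightly in length, in which case the unit boundaries would need to come from an alignment rather than fixed offsets.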
2019 Samplix ApS. All rights reserved. For Research Use Only. Not for use in diagnostic procedures.