Unknown regions

Into the dark and camouflaged – overcoming null and ambiguous sequence mapping

Xdrop™ reveals the sequence of unknown regions by enriching long fragments of native DNA

  • Examine genomic regions without knowing their sequence in full, even with low input amounts

  • Resolve ambiguous alignment of reads from tandem repeats or structural variants

  • Uncover “blind spots” in genomes resulting from hard-to-amplify chemistry, like high GC content

Background

Genomes often contain unknown regions where standard short-read sequencing technologies fall short in assembling or aligning reads. Some regions are “dark by depth”: sequencing yields few or no mappable reads, as is the case for stretches with high GC content.

Others are “camouflaged by mapping quality”: reads are obtained, but duplications, repeats, inversions and other structural variations make their mapping ambiguous. High-fidelity, long-range sequence information with Xdrop™ can ultimately resolve this shortcoming.
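The distinction between the two kinds of unknown region can be sketched in a few lines of Python. This is only an illustration: the function name `classify_window` and all three thresholds are assumptions chosen for the example, not values from this note or from any particular pipeline.

```python
def classify_window(mapqs, min_depth=5, min_mapq=10, low_mapq_fraction=0.9):
    """Label a genomic window from the mapping qualities of its reads.

    mapqs: one mapping-quality score per read overlapping the window.
    All thresholds are hypothetical defaults for illustration only.
    """
    depth = len(mapqs)
    # Too few reads cover the window at all.
    if depth <= min_depth:
        return "dark by depth"
    # Reads are present, but most of them map ambiguously.
    low_quality = sum(1 for q in mapqs if q < min_mapq)
    if low_quality / depth >= low_mapq_fraction:
        return "camouflaged by mapping quality"
    return "mappable"


# Toy inputs, not real sequencing data.
gc_rich_window = [60, 60]        # only two reads cover the window
repeat_window = [0] * 20         # many reads, all ambiguously placed
ordinary_window = [60] * 20      # many reads, all confidently placed
```

In practice the read-level mapping qualities would come from an alignment file; here they are plain lists so the logic stays visible.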

The hard, the tricky, the unknown revealed.

To underscore the value of the Xdrop™ technology in elucidating unknown regions of a genome, we enriched a ~40 kb region of the Epstein-Barr virus (EBV) that includes a 1.5 kb GC-rich tandem repeat (12 copies of a 125 bp unit). This specific region has very low coverage depth in existing genome maps because its GC-rich chemistry makes amplification difficult and because short-read mapping is riddled with ambiguity. The Detection Sequence used to enrich the long fragments was located roughly 10 kb from the unknown region, in a portion of the genome with good sequence coverage.

EBV seq(1)

Image modified from Linz et al. J Virol. 2013;87(2):1172‐1182

High enrichment despite low viral copy numbers

The samples we used contained background human DNA at a 2200-fold higher concentration than EBV DNA. We partitioned all the DNA into double emulsion droplets, encapsulating individual molecules together with enrichment primers that amplify the 89 bp Detection Sequence. In a process we call Indirect Sequence Capture, the amplified Detection Sequence emits a clear fluorescent signal. We used this fluorescence to sort only those droplets containing the targeted EBV DNA fragments. Each resulting molecule was then amplified by droplet multiple displacement amplification (MDA).

EBV facs(2)

A GC-rich stretch of 12 repeats, each 125 bp long

The complete length of the 1.5 kb unknown region was contained in our enriched long DNA fragments. Thus, by maintaining the integrity of the sample DNA (i.e., not fragmenting the input DNA, enriching by Indirect Sequence Capture, and amplifying single target molecules), Xdrop™ maintained the integrity of the sequence information. Analysis on a PacBio RSII showed 12 repeats, each 125 bp long, with 76 to 91% GC content (red box in figure).
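As a rough sketch of how per-repeat GC content can be tabulated once the full region has been read, the Python below splits a sequence into fixed-length units and reports the GC fraction of each. The helper names and the toy sequence are hypothetical; the toy unit is a placeholder built to be 125 bp long and is not the EBV repeat sequence.

```python
def gc_content(seq):
    """Return the fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def per_unit_gc(region, unit_len=125):
    """Split region into consecutive unit_len-bp units and return each unit's GC fraction.

    A trailing partial unit, if any, is ignored.
    """
    return [
        gc_content(region[start:start + unit_len])
        for start in range(0, len(region) - unit_len + 1, unit_len)
    ]


# Placeholder unit: 100 G/C bases plus 25 A/T bases = 125 bp (not real EBV sequence).
toy_unit = "GC" * 50 + "AT" * 12 + "A"
toy_region = toy_unit * 12           # 12 x 125 bp = 1500 bp, i.e. 1.5 kb
unit_gc = per_unit_gc(toy_region)    # one GC fraction per repeat unit
```

Because real repeat units differ slightly from one another, their GC fractions would vary across the array, as in the 76 to 91% range reported above; the identical toy units here all give the same value.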

EBV cov

Do you want to know more? 

Characterize genomic regions obscured by the limits of short-read sequencing. Xdrop™ enriches long fragments of native DNA to maintain the integrity of long-range sequence information.