Uncover repeat regions - from plants to humans

Xdrop™ enrichment lets you capture large fragments from complex genomes and genomic regions

Xdrop™ enrichment of repeat-containing genomic regions is ideal for downstream long-read sequencing.

Repetitive elements can be challenging to uncover by sequencing. GC-rich repeats in particular, sometimes termed “difficult-to-sequence” regions, are problematic for several reasons. Taq DNA polymerases applied during enrichment and/or library preparation will often dissociate from the template DNA or introduce errors when passing through such regions. Consequently, few or no reads are generated. In addition, repeat elements make correct bioinformatic mapping difficult in short-read sequencing.

Single-molecule long-read sequencing technologies are essential when long repetitive elements are to be sequenced.

The cost of whole-genome long-read sequencing and the large amount of data generated are barriers to analyzing and understanding repetitive regions, so targeted sequencing is preferable. Until now, only very labour-intensive solutions have been available, making the process expensive and time-consuming. In addition, these solutions have only allowed reads across molecules a few kb in length.

The Xdrop™ platform is an automated droplet-generation microfluidics technology that allows enrichment of DNA molecules up to 100 kb in length and requires only one standard PCR primer set.

To show how easily and quickly GC-rich repetitive regions can be captured, Samplix has applied Xdrop™ to enrich a 41 kb genomic region of the Epstein-Barr virus (EBV) in a background of human DNA. The EBV genome includes a 1,500 bp repetitive region consisting of 12 repeats, each 125 bp in length. The repeat region has a GC content of 76-91% and is known to be highly challenging to sequence because it combines high GC content with repetitive sequence.

In the current experiment, EBV DNA was enriched from EBV-infected human samples using the Xdrop™ platform. Total DNA from the samples was subjected to Xdrop™ enrichment. Droplets containing the EBV target DNA were identified by amplification of a single 89 bp region on the EBV genome (Figure A). The repeat region of interest is positioned in the EBV-IR2 region, approximately 10 kb from this enrichment primer set. The enriched and amplified DNA from the Xdrop™ droplets was subjected to library preparation and sequenced with single-molecule sequencing (SMRT system from Pacific Biosciences). The sequencing result showed excellent coverage across the entire 10 kb repetitive GC-rich region (in total, approximately 20 kb on each side of the primer site was captured and sequenced) as well as high coverage through positions 40,000 to 70,000 on the EBV genome.


Figure A. Enrichment of the challenging EBV genome by Xdrop™ and downstream targeted single molecule sequencing.

The top panel shows the good coverage achieved across 41 kb of the EBV genome, including a 1,500 bp repetitive region with high GC content (76-91% GC) consisting of 12 repeats, each 125 bp in length (indicated with red vertical lines). The enrichment was conducted with a primer set targeting 89 bp around position 50 kb on the EBV genome (blue arrow). The repeats are positioned in the EBV-IR2 region, approximately 10 kb from the enrichment primers.

Lower panel shows the GC content of the sequence with the 1,500 bp GC-rich repeat region indicated between the red vertical lines.
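
A GC-content track like the one in the lower panel can be approximated with a simple sliding-window calculation. The Python sketch below is illustrative only and is not the analysis used for Figure A. The function names, the window size of 125 bp (chosen here to match the repeat length) and the toy sequence are all assumptions; the toy sequence is synthetic and is not EBV data.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def sliding_gc(seq: str, window: int = 125, step: int = 125):
    """Yield (start, gc_fraction) for each full window along seq."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, gc_fraction(seq[start:start + window])


# Synthetic toy data (NOT real EBV sequence): a hypothetical GC-rich
# 125 bp unit repeated 12 times, flanked by lower-GC sequence.
repeat_unit = ("GCGGCCGCAG" * 13)[:125]
toy_repeats = repeat_unit * 12          # 12 x 125 bp = 1,500 bp
flank = "ATGCATTAGC" * 50               # 500 bp of lower-GC flank
toy_seq = flank + toy_repeats + flank

profile = list(sliding_gc(toy_seq))

# Report windows whose GC fraction exceeds an arbitrary 75% threshold.
for start, gc in profile:
    if gc > 0.75:
        print(f"window at {start:>5} bp: GC = {gc:.1%}")
```

On real data, the sequence would be read from a reference file and the window and step sizes tuned to the resolution needed. A smaller step gives a smoother track at the cost of more windows.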

Do you want to know more? 

Download the poster for more details on this project.

Poster 20170313rev2019small EBV