The field of high throughput sequencing (HTS) is evolving rapidly, so it is almost never worth it for a university to buy an HTS device such as an Illumina sequencer. While this might change in the future when prices decline further, sending your samples to third-party sequencing services usually gives you more flexibility and a better price. However, once you send your “ready to load” libraries away, quality control and the sequencing process are out of your hands, and keeping your fingers crossed is all you can do. In this blog post I will discuss our first 8 DNA metabarcoding runs on the Illumina HiSeq, NextSeq and MiSeq systems with 4 different sequencing facilities.


Figure 1: Overview of number of sequences generated in each run relative to the expected amount of sequences according to Illumina specifications. Sequencing facilities are indicated by color in the timeline. Read number is for PE reads, 1 forward + 1 reverse read = 1 read. The individual projects (A-E) are discussed in detail below.
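If you want to check the read numbers a facility reports, the counting convention from the caption (1 forward + 1 reverse read = 1 read) can be sketched roughly as follows. This is only a minimal sketch: the function names are made up for illustration, and it assumes plain four-line FASTQ records (no wrapped sequence lines), optionally gzip-compressed.

```python
import gzip


def count_fastq_records(path):
    """Count records in a FASTQ file, assuming exactly 4 lines per record."""
    # Assumption: files ending in .gz are gzip-compressed, everything else is plain text.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: line count {n_lines} is not a multiple of 4")
    return n_lines // 4


def count_read_pairs(forward_path, reverse_path):
    """Return the number of PE reads, where 1 forward + 1 reverse read = 1 read."""
    forward = count_fastq_records(forward_path)
    reverse = count_fastq_records(reverse_path)
    if forward != reverse:
        raise ValueError(f"unpaired files: {forward} forward vs {reverse} reverse records")
    return forward
```

Calling `count_read_pairs` on the forward and reverse files of one library gives a number that can be compared directly with the expected output of the run.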

A) Testing biomass abundance relationships and primer bias

In mid-2014 we put our first ready-to-load libraries onto the MiSeq system with 300 bp PE sequencing and v3 chemistry. Sequencing of the mock communities was carried out by GATC Biotech and yielded ~8 million reads fewer than expected in the first run. After discussion with GATC, the amount of library loaded onto the sequencer was increased, and the second run yielded the expected number of sequences.

Overall, GATC did a good job and the sequencing quality was good as well. They quickly adjusted the amount of library used in the second run, and I can recommend this sequencing service, even though it is a bit pricier than other services. The results and sequencing data from this study are published in PLOS ONE.

B) Testing primer bias with a 16S marker

In the next project we tested the primer bias of 16S, an alternative to the COI barcoding marker. This run was carried out by our project partners Uwe John and Nancy Kühne at the AWI, who have a MiSeq and a NextSeq available at their institute. They did a great job and things worked right out of the box without any issues. Commercial sequencing services also can't compete on price, as we only had to pay for the sequencing kit. The comparison of COI and 16S primer bias will be published soon as a preprint, but first results are already available on Twitter.

C) Primer testing and specimen sorting (HiSeq rapid run)

For the next test we developed a set of 4 primer pairs targeting freshwater invertebrates. As in projects A and B, we used the mock communities of 52 freshwater specimens from Elbrecht & Leese 2015 to evaluate primer performance. We decided to use the HiSeq 2500 system as it gives 150 million reads per lane with 250 bp PE sequencing.

Sequencing was done by Edward Wilcox from the DNA Sequencing Center (Brigham Young University, USA). The run was really cheap, as the university offered free sequencing capacity via the Science Exchange website. Ed was very nice, and when there was an issue with the sequencing run he contacted Illumina and obtained a new sequencing kit to rerun the samples. Even though the runs were underclustered and did not yield the expected number of reads, the sequencing quality was good, and I can highly recommend Science Exchange as an alternative to commercial sequencing providers.

Figure 2: In lane 2, air bubbles in 2 cycles caused “N” base calls in almost all sequences. The N abundance is shown as only ~15 / ~8 % because FastQC summarizes the data in 5 bp bins (see x axis), so the share of sequences containing an N at this position is <75%.
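The arithmetic behind the caption can be sketched as follows. This is a minimal sketch that assumes FastQC plots the mean N percentage over all positions in a bin; the function name is made up for illustration. If only one cycle in a 5 bp bin is affected, the plotted value is roughly one fifth of the true value at that cycle, so multiplying by the bin width gives an upper bound.

```python
def max_single_position_n_percent(binned_percent, bin_width=5):
    """Upper bound on the N percentage at a single cycle within one bin.

    Assumes the plotted value is the mean over `bin_width` positions, so the
    bound is reached when all Ns of the bin sit at one position.
    """
    if bin_width < 1:
        raise ValueError("bin_width must be at least 1")
    return min(100.0, binned_percent * bin_width)


# The two peak heights read from Figure 2 (~15 % and ~8 %).
for binned in (15.0, 8.0):
    bound = max_single_position_n_percent(binned)
    print(f"binned N content {binned:.0f} % -> at most {bound:.0f} % at one cycle")
```

If the Ns are spread over more than one cycle in the same bin, the true value at each cycle is lower than this bound.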

We are currently working on the analysis of the data, and the first results are available on Twitter (primer test and sample sorting).

D) Testing on real world kick samples!

After a lot of primer testing and protocol verification, we took our protocols into the real world and tested them on 20 sampling sites on a river called “Schmalenau” in western Germany. The specimens collected were first identified morphologically, so that we can evaluate the increase in taxonomic resolution with DNA metabarcoding and also how many taxa our method misses.

Of the two libraries, one was sequenced by our colleagues at the AWI and the other by Macrogen. The sequence output from the AWI was as expected, but unfortunately Macrogen did not deliver the expected output and additionally forgot to add the 10% PhiX spike-in. We are still investigating the PhiX problems with Macrogen, but the first impression is not great.

E) Testing the effect of BTI on Diptera

Together with colleagues from the University of Koblenz-Landau we also used our DNA metabarcoding protocols to test the effect of an insecticide on Diptera. This run was also carried out by Macrogen, unfortunately with the same issues as described in D.

Discussion / Conclusions

Many sequencing runs did not generate the expected number of sequences. However, it is difficult to pin down the reason for that. It is likely to depend on QC and library quantification at the sequencing facilities. Projects D and E are especially interesting, as our colleagues at the AWI generated the expected number of sequences on the MiSeq system, while Macrogen generated only half of the expected number despite identical fusion primers and the same sequencing system.
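The comparison between runs comes down to one ratio: reads obtained divided by reads expected for the system. A minimal sketch is given below; the run names and read numbers are placeholders for illustration only, not the actual values from our runs or from the Illumina specifications.

```python
def relative_yield(observed_read_pairs, expected_read_pairs):
    """Fraction of the expected output that a run actually delivered."""
    if expected_read_pairs <= 0:
        raise ValueError("expected_read_pairs must be positive")
    return observed_read_pairs / expected_read_pairs


# Placeholder numbers (observed, expected), NOT real data from this post.
example_runs = {
    "facility_1": (25_000_000, 25_000_000),
    "facility_2": (12_500_000, 25_000_000),
}

for name, (observed, expected) in example_runs.items():
    print(f"{name}: {relative_yield(observed, expected):.0%} of expected output")
```

Expressing every run this way makes runs on different systems (MiSeq, NextSeq, HiSeq) comparable, even though their absolute read numbers differ a lot.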

I will contact all sequencing facilities involved again, ask about the details and methods of their library quantification, and write a follow-up post. We were lucky that so far every sequencing run worked out in the end and yielded usable data, even though sometimes less than expected. We are sequencing at a higher depth than our samples need, but in the long term we have to figure out the cause of the reduced data yield. I will keep you updated once I get this black box cracked open!

PS: All results of these projects will be published in the next few months! I’m working hard on getting the analysis done and the publications ready. Bioinformatics is the real bottleneck of HTS = ) However, if you are interested in details / results from a particular project, feel free to contact me right away. I am happy to share the information I have and assist with your DNA metabarcoding projects!

The high throughput sequencing black box
