Introduction


Next-generation sequencing technologies produce short reads at high coverage. The de Bruijn graph approach is prevalent in current de novo assemblers: it builds a graph from all possible substrings of length k (termed k-mers) in the short reads, which allows the huge volume of sequencing data to be processed efficiently.
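As an illustration of the k-mer decomposition described above, the following minimal Python sketch extracts k-mers from reads and links each k-mer's (k-1)-length prefix to its (k-1)-length suffix. This is one common textbook formulation of a de Bruijn graph, not Clover's actual data structure. The function names and the toy reads are assumptions made for this example only.

```python
from collections import defaultdict


def kmers(read, k):
    """Return every substring of length k in `read`, in left-to-right order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def de_bruijn_graph(reads, k):
    """Build a node-centric de Bruijn graph (hypothetical helper).

    Nodes are (k-1)-mers; each k-mer contributes one edge from its
    prefix (first k-1 bases) to its suffix (last k-1 bases).
    """
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]].add(kmer[1:])
    return graph


# Toy reads, invented for illustration; they overlap on "GTAC".
reads = ["ACGTAC", "GTACGG"]
graph = de_bruijn_graph(reads, k=4)

for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

In this toy graph the node ACG has two successors, which is the kind of branching that repeats (or errors) introduce and that an assembler must resolve.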

Smaller values of k increase the connectivity of the graph, whereas larger values reduce the number of ambiguous repeats in the graph. The choice of k therefore trades sensitivity against specificity. Larger k-mers, however, lose further sensitivity to sequencing errors, because a single erroneous base corrupts every k-mer that overlaps it.
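The effect of sequencing errors on larger k-mers can be made concrete with a small sketch. It counts how many k-mers of a read with one substituted base are absent from the error-free read. The sequence, the error position, and the function names are invented for this example and do not come from the Clover study.

```python
def kmer_set(seq, k):
    """Return the set of distinct k-mers in `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def corrupted_kmers(true_read, observed_read, k):
    """Count k-mers of the observed read that never occur in the true read."""
    return len(kmer_set(observed_read, k) - kmer_set(true_read, k))


# Hypothetical 20-base read with a single substitution (C -> A) at index 10.
true_read = "ACGTTGCAAGCTTAGGCATC"
observed_read = true_read[:10] + "A" + true_read[11:]

for k in (3, 7):
    print(k, corrupted_kmers(true_read, observed_read, k))
```

A base in the interior of a read is covered by up to k k-mers, so one error can remove up to k correct k-mers from the graph and replace them with spurious ones. In this toy read, the count equals k for both values tested.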

In this study, we develop a de Bruijn graph-based assembler called Clover (clustering-oriented de novo assembler), which applies a novel k-mer clustering approach derived from the overlap-based concept to handle the sequencing errors generated by the Illumina platform. Furthermore, Clover iteratively reconstructs the de Bruijn graph with shorter k-mers to compensate for the reduced sensitivity of large k-mers.

Clover proceeds through the following phases:
• Construction and clustering of k-mers.
• De Bruijn graph construction.
• Graph cleaning and reconstruction with shorter k-mers.
• Scaffolding.

Clover: a clustering-oriented de novo assembler for Illumina sequences
Contact: jmfhsieh@gmail.com