Clover: a clustering-oriented de novo assembler for Illumina sequences
Contact: jmfhsieh@gmail.com
A simple example
Once Clover is installed, you can try it on the following test case.
1. Download the test data set
Download the testdata tar file here, or download Data.allpathsCor.gz from the GAGE website here.
(The dataset consists of four FASTQ files: frag_1.fastq, frag_2.fastq, shortjump_1.fastq, and shortjump_2.fastq, containing 101 single-end and 2,722,066 paired-end reads from a Rhodobacter sphaeroides strain, generated by an Illumina Genome Analyzer.)
Unpack it:
$ tar -zxvf testdata.tar.gz
Then copy the four FASTQ files into /clover-x.x:
$ cp ./testdata/*.* ./clover-x.x
$ cd clover-x.x
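Before running the assembler, it can be worth checking that the two files of each pair hold the same number of reads. The following is a minimal sketch and not part of Clover: fastq_count and check_pair are hypothetical helper names, and the code assumes the common four-lines-per-record FASTQ layout (no wrapped sequence lines).

```sh
# Hypothetical helpers, not shipped with Clover.
# Assumption: every FASTQ record occupies exactly four lines.

# Print the number of records in a FASTQ file;
# fail if the line count is not a multiple of four.
fastq_count() {
    awk 'END { if (NR % 4) exit 1; print NR / 4 }' "$1"
}

# Succeed only if both mate files contain the same number of records.
check_pair() {
    [ "$(fastq_count "$1")" -eq "$(fastq_count "$2")" ] 2>/dev/null
}
```

With the functions defined in the current shell, the two pairs from the test data could be checked like this:
$ check_pair frag_1.fastq frag_2.fastq && echo "frag pair OK"
$ check_pair shortjump_1.fastq shortjump_2.fastq && echo "shortjump pair OK"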
2. Run Clover assembler
$ clover -k 46 -p 0 -i1 frag_1.fastq,shortjump_1.fastq -i2 frag_2.fastq,shortjump_2.fastq -cs 7 -ss 3 -is 180,3500 -hp 0.6 -pm -ml 200
3. Results
. Clover prints the assembly statistics to the screen:
start:
Input reads, reads: 2722066
K-mers in reads: 152435625
K-mers in k-mer set: 4554148
Nodes to build the de Bruijn graph: 4554148
Finish the first step.
nodes: 4554148
paths: 2344
Trim low-frequency edges.
nodes: 4552253
paths: 1603
Prune tips, bubbles, and erroneous connections.
nodes: 4528817
paths: 832
Iteratively relink graph with shorter k-mer.
nodes: 4525151
paths: 487
Finish clean graph.
Contig set:
nodes: 4523157
total length: 4543542
contigs: 453
max length: 88519
average length 10029
n50: 20413
Run a scaffolding.
Scaffold set: 1
nodes: 4523157
total length: 4542417
contigs: 428
max length: 88519
average length 10613
n50: 21217
Run a scaffolding.
Scaffold set: 2
nodes: 4636413
total length: 4639068
contigs: 59
max length: 2482925
average length 78628
n50: 2482925
end:
. It also creates two assembly output files in /clover-x.x:
- On contig level: out_contig.fasta
Contig information: id, length, k-mer coverage, whether it is a tip, and the sequence.
- On scaffold level: out_scaffold.fasta
Super contig information: id, length, k-mer coverage, whether it is a tip, and the sequence.
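The summary figures in the log (number of contigs, total length, max length, n50) can be cross-checked against the output files. The sketch below is an illustration, not Clover's own calculation: fasta_stats is a hypothetical helper, it assumes a standard FASTA layout in which header lines start with ">", and it uses the common N50 definition (the length of the contig at which the running sum of lengths, sorted from longest to shortest, first reaches half of the total). Clover may count or filter sequences differently, so small discrepancies with the log are possible.

```sh
# Hypothetical helper, not shipped with Clover.
# Prints: <number of sequences> <total length> <max length> <N50>
fasta_stats() {
    awk '
        # Header line: emit the length of the previous record, then reset.
        /^>/ { if (len) print len; len = 0; next }
        # Sequence line: strip whitespace and accumulate (records may be wrapped).
        { gsub(/[ \t\r]/, ""); len += length($0) }
        END { if (len) print len }
    ' "$1" | sort -rn | awk '
        { lens[NR] = $1; total += $1 }
        END {
            if (NR == 0) { print "0 0 0 0"; exit }
            half = total / 2
            # Walk from the longest sequence down until half the total is covered.
            for (i = 1; i <= NR; i++) {
                run += lens[i]
                if (run >= half) { n50 = lens[i]; break }
            }
            print NR, total, lens[1], n50
        }'
}
```

With the function defined in the current shell, it could be applied to both output files:
$ fasta_stats out_contig.fasta
$ fasta_stats out_scaffold.fasta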