Clover: a clustering-oriented de novo assembler for Illumina sequences
Contact: jmfhsieh@gmail.com
How to use it
Clover is a command-line tool. Run it with the following parameters:
$ clover -k <Length of k-mer> [options] -i1 <Input file1> [-i2 <Input file2>]
1. A simple Clover example:
To assemble one pair of paired read files, type:
$ clover -k 40 -i1 frag1.fq -i2 frag2.fq
It produces the following files:
out_contig.fasta - contig prediction using read information.
out_scaffold.fasta - supercontig prediction after scaffolding with mate-pair information.
systemfile... - intermediate files that can be removed after execution. If the intermediate files are kept, Clover can resume from the intermediate results and run faster when rerun on the same inputs with the same -k and -p.
First, we define a parameter type, the list:
A list is a series of two or more parameters separated by ‘,’. For example, a list of three integers is 3,5,7 and a list of two filenames is frag1.fastq,short1.fastq.
-k [integer] (default 40)
Length of k-mer
-i1 [filename or list of filenames]
Input file1
-i2 [filename or list of filenames]
Input file2
If paired read files are used, the file names given to -i2 must correspond, in the same order, to those given to -i1.
For example, suppose two libraries of paired read files, frag1.fq, frag2.fq, short1.fq and short2.fq, are used, where frag1.fq corresponds to frag2.fq and short1.fq corresponds to short2.fq:
$ clover -k 40 -i1 frag1.fq,short1.fq -i2 frag2.fq,short2.fq
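Because the two lists are matched by position, one quick check before starting a long run is to confirm that -i1 and -i2 name the same number of files. The sketch below does only that. count_items and lists_match are hypothetical shell helpers, not part of Clover, and they cannot tell whether the files really are mates of each other.

```shell
# Hypothetical helper (not part of Clover): count the entries of a
# comma-separated list such as "frag1.fq,short1.fq".
count_items() {
    # Split on ',' and print the number of fields.
    printf '%s\n' "$1" | awk -F',' '{ print NF }'
}

# Hypothetical helper: succeed only if both lists have the same length.
lists_match() {
    [ "$(count_items "$1")" -eq "$(count_items "$2")" ]
}

i1="frag1.fq,short1.fq"
i2="frag2.fq,short2.fq"

if lists_match "$i1" "$i2"; then
    echo "OK: $(count_items "$i1") libraries, -i1 and -i2 line up"
else
    echo "Mismatch: -i1 and -i2 have different numbers of files" >&2
fi
```

If the check passes, the same two variables can be passed on unchanged, for example as -i1 "$i1" -i2 "$i2".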
If only a single read file without mate pairs is used, the parameter -i2 can be omitted.
For example, if one library with the single read file frag.fastq is used:
$ clover -k 40 -i1 frag.fastq
The file formats accepted by Clover are ‘fasta’ and ‘fastq’, which can be distinguished by their filename extensions (.fa, .fasta, .fq, .fastq, .fatq).
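Since the format is inferred from the extension, a misnamed file is an easy mistake to make. The following sketch shows one way to preview how a file name would be classified, using only the extensions listed above. read_format is a hypothetical helper, not part of Clover. Case-sensitive matching is an assumption, and because the text does not say which format .fatq denotes, the sketch reports it separately rather than guessing.

```shell
# Hypothetical helper (not part of Clover): classify a read file by its
# extension, using only the extensions named in this document.
read_format() {
    case "$1" in
        *.fa|*.fasta) echo "fasta" ;;
        *.fq|*.fastq) echo "fastq" ;;
        # .fatq is listed as accepted, but its format is not stated.
        *.fatq)       echo "unstated" ;;
        *)            echo "unrecognized" ;;
    esac
}

# Preview a few example names before passing them to -i1 / -i2.
for f in frag1.fq frag.fastq contigs.fa reads.txt; do
    printf '%s: %s\n' "$f" "$(read_format "$f")"
done
```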
2. Important and frequently used options:
-p [integer] (default 1; constraint: p < k)
Edit distance when clustering k-mers
-o [filename] (default out)
Prefix of the output files
-is [integer or list of integers]
Insert sizes of the fragment libraries; the order must correspond to the input files.
For example, if two libraries of paired read files with insert sizes 180 and 3500 are used (frag1.fq, frag2.fq, short1.fq and short2.fq):
$ clover -k 40 -p 1 -is 180,3500 -i1 frag1.fq,short1.fq -i2 frag2.fq,short2.fq
If -is is omitted, Clover estimates the insert sizes automatically.
For example, if two libraries of paired read files with unknown insert sizes are used (frag1.fq, frag2.fq, short1.fq and short2.fq):
$ clover -k 40 -p 1 -i1 frag1.fq,short1.fq -i2 frag2.fq,short2.fq
-ss [integer or list of integers] (default 5)
Sufficient support for scaffolding. If there are multiple libraries, set it to a list of numbers whose order corresponds to the input files.
If there are multiple libraries and a single integer is given, Clover applies the same integer to all libraries.
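To make the two forms of -ss concrete, the sketch below spells out what a single value means for a given number of libraries, and rejects a list whose length does not match. expand_ss is a hypothetical helper written for illustration. It mirrors the behaviour described above and is not taken from Clover's own code.

```shell
# Hypothetical helper (not part of Clover).
# Usage: expand_ss <ss value or list> <number of libraries>
expand_ss() {
    ss=$1
    nlibs=$2
    case "$ss" in
        *,*)
            # A list was given: it needs one value per library.
            n=$(printf '%s\n' "$ss" | awk -F',' '{ print NF }')
            if [ "$n" -ne "$nlibs" ]; then
                echo "error: -ss has $n values for $nlibs libraries" >&2
                return 1
            fi
            echo "$ss"
            ;;
        *)
            # A single integer was given: repeat it for every library.
            awk -v v="$ss" -v n="$nlibs" 'BEGIN {
                out = v
                for (i = 1; i < n; i++) out = out "," v
                print out
            }'
            ;;
    esac
}

expand_ss 5 2      # prints 5,5
expand_ss 5,7 2    # prints 5,7
```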
-cs [integer] (default 5)
Sufficient support for contig linking by a shorter k-mer
3. Advanced options:
-m [integer] (default (k/2)+1; constraint: m < k)
Minimum length of k-mer for contig linking by a shorter k-mer
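As a worked example of the default, the sketch below computes (k/2)+1 for a given k and checks the constraint m < k. default_m and below_k are hypothetical helpers, not part of Clover. Treating k/2 as integer division (so that an odd k rounds down) is an assumption, since the rounding is not stated here.

```shell
# Hypothetical helper (not part of Clover): default m for a given k.
# Shell arithmetic truncates, so an odd k rounds down (assumption).
default_m() {
    echo $(( $1 / 2 + 1 ))
}

# Hypothetical helper: succeed only if the value is strictly below k.
# Usage: below_k <value> <k>
below_k() {
    [ "$1" -lt "$2" ]
}

k=40
m=$(default_m "$k")
echo "k=$k gives default m=$m"    # prints: k=40 gives default m=21

if below_k "$m" "$k"; then
    echo "m=$m satisfies m < k"
fi
```

The same below_k check applies to -p, which has the constraint p < k.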
-ml [integer] (default 200)
Minimum length of a contig to be output
-sp [fraction] (default 0.3; constraint: 0 <= sp <= 1)
Split coefficient used to split a node that contains several major consensus sequences
-hp [fraction] (default 0.8; constraint: 0 <= hp <= 1)
Homogeneity coefficient of the distribution of input reads
If the input reads are considered fairly homogeneous, set it to 1.0; if they are considered fairly heterogeneous, set it to 0.6 or less.
-rp [fraction] (default 0.0; constraint: 0 <= rp <= 1)
Repeat coefficient
If it is set to 0.0, Clover does not run the repeat-resolution step.
If it is set greater than 0, Clover resolves repeats according to the condition given by rp; we usually set it to 0.8 when needed.
Like hp, rp relates to the homogeneity of the reads in the repeat region. A higher rp gives a tighter condition for resolving repeats; a lower rp gives a looser condition, which would produce more errors.
-ie [fraction] (default 0.01333333; constraint: 0 <= ie <= 1)
Background probability of a sequencing error to a particular nucleotide
If the input reads are considered very accurate, set it to 0.0.
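The fraction options -sp, -hp, -rp and -ie all share the same closed range from 0 to 1, so one range check can cover them before a run is started. The sketch below is one way to do that in plain shell. is_fraction is a hypothetical helper, not part of Clover, and it deliberately accepts only plain decimal notation.

```shell
# Hypothetical helper (not part of Clover): succeed only if the argument
# is a plain decimal number in the closed range [0, 1].
is_fraction() {
    awk -v x="$1" 'BEGIN {
        # Reject anything that is not an unsigned decimal number.
        if (x !~ /^[0-9]*\.?[0-9]+$/) exit 1
        # Reject values outside the closed range.
        if (x + 0 < 0 || x + 0 > 1) exit 1
        exit 0
    }'
}

# Check the documented defaults of the four fraction options.
for pair in sp=0.3 hp=0.8 rp=0.0 ie=0.01333333; do
    name=${pair%%=*}
    value=${pair#*=}
    if is_fraction "$value"; then
        echo "-$name $value: within [0, 1]"
    else
        echo "-$name $value: out of range" >&2
    fi
done
```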
4. Flag arguments:
-t [ ]
Prunes tips and erroneous connections.
-f [ ]
Runs scaffolding on the fragment read set earlier, before contigs shorter than ml are removed.
-pm [ ]
Runs contig linking by a shorter k-mer earlier, before low-frequency edges are trimmed.
-pr [ ]
Runs repeat resolution earlier (if rp is greater than 0.0), before contigs shorter than ml are removed.
For example, to use the -f flag:
$ clover -k 40 -p 1 -i1 frag1.fq,short1.fq -i2 frag2.fq,short2.fq -ss 5 -cs 7 -f
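With this many options on one line, it can help to print the command for review before running it. The sketch below assembles the same command from separate values and only echoes it. clover_cmd is a hypothetical dry-run helper, not part of Clover, and it uses only the options shown in the example above.

```shell
# Hypothetical dry-run helper (not part of Clover): print the command
# line instead of running it, so the options can be reviewed first.
# Usage: clover_cmd <k> <p> <i1 list> <i2 list> <ss> <cs>
clover_cmd() {
    k=$1; p=$2; i1=$3; i2=$4; ss=$5; cs=$6
    echo "clover -k $k -p $p -i1 $i1 -i2 $i2 -ss $ss -cs $cs -f"
}

clover_cmd 40 1 frag1.fq,short1.fq frag2.fq,short2.fq 5 7
```

Once the printed line looks right, it can be copied to the prompt and run as usual.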
See also
For more information, please see the ‘Test Case’ section of our website.
Contact information
We would like to hear your comments and suggestions. Please visit our website or email us.