Clover: a clustering-oriented de novo assembler for Illumina sequences


EXECUTION ENVIRONMENTS

. Linux 64-bit
. Python 2.6 or later 


INSTALLATION

1. Download the Clover tar package from 
  <http://oz.nthu.edu.tw/~d9562563/>

2. Then unpack the downloaded package to any path.
  $ tar -zxvf clover-x.x.tar.gz
  $ cd clover-x.x


HOW TO USE IT

Clover is a command-line tool. Run it with the following parameters:

  $ clover -k <Length of k-mer> [options] -i1 <Input file1> [-i2 <Input file2>]

1. A simple Clover example: 

To assemble one library of paired read files, type:
  $ clover -k 40 -i1 frag1.fq -i2 frag2.fq

It produces the following files:
out_contig.fasta - contigs predicted from read information.
out_scaffold.fasta - super contigs predicted by scaffolding with mate-pair information.
systemfile... - intermediate files that can be removed after execution. If they are kept, Clover can resume from the intermediate results and run faster when rerun on the same inputs with the same -k and -p.

The options below use a parameter type called a list.
A list is a series of two or more values separated by ','. For example, a list of three integers: 3,5,7 and a list of two filenames: frag1.fastq,short1.fastq.
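
The list notation above can be illustrated with a short Python sketch. The helper name parse_list is hypothetical and is not part of Clover; it only shows how such a comma-separated value might be split. Treating a single value as a one-item list is an assumption made for convenience.

```python
def parse_list(value, convert=str):
    """Split a comma-separated list such as '3,5,7' into converted items.

    Hypothetical helper for illustration; not part of Clover.
    """
    items = [item.strip() for item in value.split(",")]
    if any(item == "" for item in items):
        raise ValueError("empty item in list: {0!r}".format(value))
    return [convert(item) for item in items]


print(parse_list("3,5,7", int))
print(parse_list("frag1.fastq,short1.fastq"))
```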

  -k    [integer] (default 40)
        Length of k-mer
  -i1   [filename or list of filenames] 
        Input file1
  -i2   [filename or list of filenames] 
        Input file2

If paired read files are used, each file name given to -i2 must correspond to the file name in the same position of -i1.
For example, if two libraries of paired read files (frag1.fq, frag2.fq, short1.fq and short2.fq) 
are used, where frag1.fq corresponds to frag2.fq and short1.fq 
corresponds to short2.fq:
  $ clover -k 40 -i1 frag1.fq,short1.fq -i2 frag2.fq,short2.fq

If only a single read file without mate pairs is used, the parameter -i2 can be omitted.
For example, if one library consisting of a single read file frag.fastq is used:
  $ clover -k 40 -i1 frag.fastq
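
When Clover is driven from a script, the command lines above can be assembled programmatically. The following is a minimal sketch; the function name build_clover_command is hypothetical, and the check that -i1 and -i2 list the same number of files is an assumption derived from the correspondence rule described above. Only the options -k, -i1 and -i2 from this document are used.

```python
def build_clover_command(k, input1, input2=None):
    """Return an argument list for clover (hypothetical helper).

    input1 and input2 are Python lists of file names; input2 may be
    omitted when the reads have no mate pairs.
    """
    if input2 is not None and len(input1) != len(input2):
        # Assumption: every file in -i1 needs a partner in -i2.
        raise ValueError("-i1 and -i2 must list the same number of files")
    argv = ["clover", "-k", str(k), "-i1", ",".join(input1)]
    if input2 is not None:
        argv += ["-i2", ",".join(input2)]
    return argv


print(" ".join(build_clover_command(
    40, ["frag1.fq", "short1.fq"], ["frag2.fq", "short2.fq"])))
print(" ".join(build_clover_command(40, ["frag.fastq"])))
```

The returned list could be passed to subprocess.call, assuming the clover executable is on the PATH.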

The file formats accepted by Clover are 'fasta' and 'fastq', which can be 
distinguished by their filename extensions (.fa, .fasta, .fq, .fastq, .fatq).
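
The extension rule can be sketched in Python as follows. This is only an illustration and not Clover's own code: the name detect_format is hypothetical, mapping .fatq to 'fastq' is an assumption (the list above does not say which format it denotes), and so is the case-sensitive comparison.

```python
import os

# Extensions listed in this document. Mapping ".fatq" to fastq is an
# assumption; the document does not state which format it denotes.
FORMAT_BY_EXTENSION = {
    ".fa": "fasta",
    ".fasta": "fasta",
    ".fq": "fastq",
    ".fastq": "fastq",
    ".fatq": "fastq",
}


def detect_format(filename):
    """Guess 'fasta' or 'fastq' from the file name extension."""
    extension = os.path.splitext(filename)[1]
    if extension not in FORMAT_BY_EXTENSION:
        raise ValueError("unrecognized extension: {0!r}".format(filename))
    return FORMAT_BY_EXTENSION[extension]


print(detect_format("frag1.fq"))
print(detect_format("reference.fasta"))
```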

2. Important and frequently used options:

  -p    [integer] (default 1, constrained to p < k)
        Edit distance when clustering k-mers 
  -o    [filename] (default out)
        Prefix of the output files 
  -is   [integer or list of integers] 
        Insert sizes of the fragment libraries; the order must 
        correspond to the input files. 
        For example, if two libraries of paired read files (frag1.fq, 
        frag2.fq, short1.fq and short2.fq) with insert sizes 180 and 
        3500 are used:
          $ clover -k 40 -p 1 -is 180,3500 -i1 frag1.fq,short1.fq -i2 \
            frag2.fq,short2.fq
        If -is is omitted, Clover estimates the insert sizes automatically.
        For example, if two libraries of paired read files (frag1.fq, 
        frag2.fq, short1.fq and short2.fq) with unknown insert sizes 
        are used:
          $ clover -k 40 -p 1 -i1 frag1.fq,short1.fq -i2 \
            frag2.fq,short2.fq
  -ss   [integer or list of integers] (default 5)
        Sufficient support for scaffolding. If there are multiple 
        libraries, set it to a list of numbers whose order 
        corresponds to the input files.
        If a single integer is given for multiple libraries, 
        Clover applies the same integer to all libraries.
  -cs   [integer] (default 5)
        Sufficient support for contig linking by a shorter k-mer 
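
The per-library behaviour of -ss described above can be sketched as follows. This is an illustration under stated assumptions, not Clover's implementation: the function name expand_support is hypothetical, and rejecting a list whose length differs from the number of libraries is an assumption.

```python
def expand_support(values, n_libraries, default=5):
    """Return one -ss value per library (hypothetical helper).

    values is a list of integers, or None when -ss is not given.
    A single value is applied to every library.
    """
    if values is None:
        values = [default]
    if len(values) == 1:
        return list(values) * n_libraries
    if len(values) != n_libraries:
        # Assumption: a longer list must match the libraries one to one.
        raise ValueError("expected {0} values, got {1}".format(
            n_libraries, len(values)))
    return list(values)


print(expand_support([5], 2))
print(expand_support([3, 7], 2))
```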

3. Advanced options:

  -m    [integer] (default (k/2)+1, constrained to m < k)
        Minimum length of k-mer for contig linking by a shorter k-mer 
  -ml   [integer] (default 200)
        Minimum length of contig before outputting 
  -sp   [fraction] (default 0.3, constrained to 0 <= sp <= 1)
        Split coefficient used to split a node that contains several major consensus sequences
  -hp   [fraction] (default 0.8, constrained to 0 <= hp <= 1)
        Homogeneity coefficient of the distribution of input reads 
        If the input reads are considered fairly homogeneous, set it 
        to 1.0; if they are considered fairly heterogeneous, set it 
        to 0.6 or less.
  -rp   [fraction] (default 0.0, constrained to 0 <= rp <= 1)
        Repeat coefficient 
        If it is set to 0.0, Clover does not run the repeat 
        resolution step. 
        If it is set greater than 0, Clover resolves repeats according 
        to the condition given by rp; we usually set it to 0.8 when 
        needed. 
        Like hp, rp relates to the homogeneity of the repeat 
        region. A higher rp gives a stricter condition for resolving 
        repeats; a lower rp gives a looser condition, which 
        produces more errors.
  -ie   [fraction] (default 0.01333333, constrained to 0 <= ie <= 1)
        Background probability of a sequencing error to a particular 
        nucleotide.
        If the input reads are considered very accurate, set it to 
        0.0.
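
The defaults and constraints listed for -p, -m, -sp, -hp, -rp and -ie can be checked before starting a long run. The sketch below is not part of Clover; the names default_m and check_options are hypothetical, and reading (k/2)+1 as integer division is an assumption.

```python
def default_m(k):
    """Default of -m, assuming (k/2)+1 uses integer division."""
    return k // 2 + 1


def check_options(k, p=1, m=None, sp=0.3, hp=0.8, rp=0.0, ie=0.01333333):
    """Return a list of constraint violations (empty if all are satisfied)."""
    if m is None:
        m = default_m(k)
    errors = []
    if not p < k:
        errors.append("p must be less than k")
    if not m < k:
        errors.append("m must be less than k")
    # All fraction options share the same closed range [0, 1].
    for name, value in (("sp", sp), ("hp", hp), ("rp", rp), ("ie", ie)):
        if not 0.0 <= value <= 1.0:
            errors.append("{0} must be between 0 and 1".format(name))
    return errors


print(default_m(40))
print(check_options(40))
print(check_options(40, m=40, rp=1.5))
```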

4. Flag arguments:

  -t    [ ] 
        Prunes tips and erroneous connections.
  -f    [ ] 
        Runs scaffolding on the fragment read set earlier, 
        before contigs shorter than ml are removed.
  -pm   [ ] 
        Runs contig linking by a shorter k-mer earlier, 
        before low-frequency edges are trimmed.
  -pr   [ ] 
        Runs repeat resolution earlier (if rp is greater 
        than 0.0), before contigs shorter than ml are removed.

For example, to use the -f flag:
  $ clover -k 40 -p 1 -i1 frag1.fq,short1.fq -i2 frag2.fq,short2.fq -ss 5 -cs 7 -f


SEE ALSO

For more information, please see the 'Test Case' page on our website.


CONTACT INFORMATION

We would like to hear your comments and suggestions. Please visit our website or email us at:
  jmfhsieh@gmail.com 

