How to use it

Clover is a command-line tool. Run it with the following parameters:

   $ clover -k <Length of k-mer> [options] -i1 <Input file1> [-i2 <Input file2>]

1. A simple Clover example:

To assemble a pair of paired read files, type:
   $ clover -k 40 -i1 frag1.fq -i2 frag2.fq

It produces the following files:
   out_contig.fasta - contigs predicted from read information.
   out_scaffold.fasta - super contigs predicted after scaffolding with mate-pair information.
   systemfile... - intermediate files that can be removed after execution. If they are kept, Clover can resume from the intermediate results and run faster when rerun with the same inputs, -k, and -p.

Several options accept a parameter type called a list:
A list is a series of two or more values separated by ‘,’. For example, 3,5,7 is a list of three integers and frag1.fastq,short1.fastq is a list of two filenames.
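
The list type can be illustrated with a minimal Python sketch that splits such a value into its items. The helper name parse_list is hypothetical and is not part of Clover; it only mirrors the comma-separated convention described above.

```python
def parse_list(value, convert=str):
    """Split a comma-separated list value into its items.

    Hypothetical helper for illustration; Clover parses lists itself.
    """
    items = value.split(",")
    if any(item == "" for item in items):
        raise ValueError(f"empty item in list: {value!r}")
    return [convert(item) for item in items]


# The two examples from the text: three integers and two filenames.
print(parse_list("3,5,7", int))
print(parse_list("frag1.fastq,short1.fastq"))
```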

   -k    [integer] (default 40)
           Length of k-mer
   -i1  [filename or list of filenames]
          Input file1
   -i2  [filename or list of filenames]
          Input file2

If paired read files are used, the file names given to -i2 must correspond, in order, to those given to -i1.
For example, suppose two libraries of paired read files are used, frag1.fq, frag2.fq, short1.fq and short2.fq,
where frag1.fq corresponds to frag2.fq and short1.fq corresponds to short2.fq:
   $ clover -k 40 -i1 frag1.fq,short1.fq -i2 frag2.fq,short2.fq
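
One way to keep the two lists in matching order is to describe each library as a (file1, file2) pair and derive both arguments from the same sequence. The following Python sketch does that; the function name input_args is hypothetical, and the script only prints the command line rather than running Clover.

```python
def input_args(libraries):
    """Build the -i1/-i2 arguments from (file1, file2) pairs, one pair per library.

    Hypothetical helper: deriving both lists from the same pairs keeps
    their order consistent.
    """
    first = ",".join(file1 for file1, _ in libraries)
    second = ",".join(file2 for _, file2 in libraries)
    return ["-i1", first, "-i2", second]


libraries = [("frag1.fq", "frag2.fq"), ("short1.fq", "short2.fq")]
print(" ".join(["clover", "-k", "40"] + input_args(libraries)))
```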

If only a single read file is used, without mate pairs, the parameter -i2 can be omitted.
For example, if one library consisting of the single read file frag.fastq is used:
   $ clover -k 40 -i1 frag.fastq

The file formats accepted by Clover are ‘fasta’ and ‘fastq’, which are distinguished by filename extension (.fa, .fasta, .fq, .fastq, .fatq).
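
A small pre-flight check can catch an unsupported extension before a long run starts. The Python sketch below is only an approximation of the rule above: the function name read_format is hypothetical, the grouping of .fatq with fastq is an assumption based on its spelling, and the match is assumed to be case-sensitive.

```python
import os

# Extensions taken from the list above. Treating ".fatq" as fastq is an
# assumption; the text only says the extension is recognized.
FORMATS = {
    ".fa": "fasta",
    ".fasta": "fasta",
    ".fq": "fastq",
    ".fastq": "fastq",
    ".fatq": "fastq",
}


def read_format(filename):
    """Return 'fasta' or 'fastq' for a filename, judged by its extension."""
    extension = os.path.splitext(filename)[1]
    if extension not in FORMATS:
        raise ValueError(f"unrecognized extension: {filename!r}")
    return FORMATS[extension]


print(read_format("frag1.fq"))
print(read_format("contigs.fasta"))
```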

2. Important and frequently used options:

   -p    [integer] (default 1; constrained to p < k)
           Edit distance when clustering k-mers
   -o    [filename] (default out)
           Prefix of the output files
   -is   [integer or list of integers]
           Insert sizes of the fragment libraries; the order must correspond to the input files.
           For example, if two libraries of paired read files with insert sizes 180 and 3500 are
           used: frag1.fq, frag2.fq, short1.fq and short2.fq.
              $ clover -k 40 -p 1 -is 180,3500 -i1 frag1.fq,short1.fq -i2 frag2.fq,short2.fq
           If -is is omitted, Clover estimates the insert size automatically.
           For example, if two libraries of paired read files with unknown insert sizes are used:
           frag1.fq, frag2.fq, short1.fq and short2.fq.
              $ clover -k 40 -p 1 -i1 frag1.fq,short1.fq -i2 frag2.fq,short2.fq
   -ss   [integer or list of integers] (default 5)
           Sufficient support for scaffolding. If there are multiple libraries, set it to a list of
           numbers whose order corresponds to the input files.
           If a single integer is given for multiple libraries, Clover applies the same
           integer to all libraries.
   -cs   [integer] (default 5)
          Sufficient support for contig linking by a shorter k-mer
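
The way a single -ss value is spread over several libraries can be sketched in Python as follows. The function name support_per_library is hypothetical, and rejecting a list whose length differs from the number of libraries is an assumption; the text only states that the order must correspond to the input files.

```python
def support_per_library(ss, n_libraries):
    """Expand an -ss value (integer or comma-separated list) to one integer per library.

    Hypothetical helper; the length check is an assumption, not documented behavior.
    """
    values = [int(item) for item in str(ss).split(",")]
    if len(values) == 1:
        # A single integer is applied to every library.
        return values * n_libraries
    if len(values) != n_libraries:
        raise ValueError(
            f"-ss lists {len(values)} values for {n_libraries} libraries"
        )
    return values


print(support_per_library("5", 2))
print(support_per_library("5,3", 2))
```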

3. Advanced options:

   -m    [integer] (default (k/2)+1; constrained to m < k)
           Minimum length of k-mer for contig linking by a shorter k-mer
   -ml   [integer] (default 200)
           Minimum length of contig before outputting
   -sp   [fraction] (default 0.3; constrained to 0 <= sp <= 1)
           Split coefficient to split node when containing several major consensus sequences
   -hp   [fraction] (default 0.8; constrained to 0 <= hp <= 1)
           Homogeneous coefficient of the distribution of input reads.
           If the input reads are considered fairly homogeneous, set it to 1.0; if they are
           considered fairly heterogeneous, set it to 0.6 or less.
   -rp   [fraction] (default 0.0; constrained to 0 <= rp <= 1)
           Repeating coefficient
           If set to 0.0, Clover does not run the repeat-resolution step.
           If set greater than 0, Clover resolves repeats according to the condition
           given by rp; 0.8 is the usual choice when repeat resolution is needed.
           Like hp, rp relates to how homogeneous the reads are in the repeat region. A higher rp
           gives a tighter condition for resolving repeats; a lower rp gives a looser condition
           that produces more errors.
   -ie   [fraction] (default 0.01333333; constrained to 0 <= ie <= 1)
           Background probability of a sequencing error to a particular nucleic acid.
           If the input reads are considered very accurate, set it to 0.0.
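
The documented defaults and constraints can be collected into a quick sanity check before launching a run. The Python sketch below is illustrative only: the function name check_options is hypothetical, each fraction option (sp, hp, rp, ie) is limited to the range 0 to 1, and the default m = (k/2)+1 is assumed to use integer division.

```python
def check_options(k=40, p=1, m=None, sp=0.3, hp=0.8, rp=0.0, ie=0.01333333):
    """Return (m, problems) for a set of option values.

    Hypothetical helper. Defaults follow the option descriptions; the use of
    integer division for the default m is an assumption.
    """
    if m is None:
        m = k // 2 + 1
    problems = []
    if p >= k:
        problems.append("p must be less than k")
    if m >= k:
        problems.append("m must be less than k")
    for name, value in (("sp", sp), ("hp", hp), ("rp", rp), ("ie", ie)):
        if not 0 <= value <= 1:
            problems.append(f"{name} must be between 0 and 1")
    return m, problems


print(check_options())
print(check_options(k=40, rp=1.5))
```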

4. Flag arguments:

   -t    [ ]
           Prunes tips and erroneous connections.
   -f    [ ]
           Runs scaffolding on the fragment read set earlier, before contigs shorter than ml
           are cleaned.
   -pm   [ ]
           Runs contig linking by a shorter k-mer earlier, before low-frequency edges are
           trimmed.
   -pr   [ ]
           Runs repeat resolution earlier (if rp is greater than 0.0), before contigs shorter
           than ml are cleaned.

For example, to use the -f flag:
   $ clover -k 40 -p 1 -i1 frag1.fq,short1.fq -i2 frag2.fq,short2.fq -ss 5 -cs 7 -f
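
When Clover is driven from a script, the command line can be assembled programmatically. The following Python sketch builds an equivalent of the command above; the function name build_command and its signature are hypothetical, options are placed before the input files as in the usage line, and the script only prints the command instead of executing it.

```python
import shlex


def build_command(k, inputs1, inputs2=None, flags=(), **options):
    """Assemble a Clover command line as a list of arguments.

    Hypothetical helper: value options are passed as keyword arguments
    (for example p=1, ss=5) and flag arguments as names without the dash.
    """
    cmd = ["clover", "-k", str(k)]
    for name, value in options.items():
        cmd += [f"-{name}", str(value)]
    cmd += ["-i1", ",".join(inputs1)]
    if inputs2:
        cmd += ["-i2", ",".join(inputs2)]
    cmd += [f"-{flag}" for flag in flags]
    return cmd


cmd = build_command(
    40,
    ["frag1.fq", "short1.fq"],
    ["frag2.fq", "short2.fq"],
    flags=["f"],
    p=1, ss=5, cs=7,
)
print(shlex.join(cmd))
```

To actually start the assembly, the resulting list could be handed to subprocess.run(cmd, check=True), assuming the clover executable is on the PATH.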

See also

For more information, see ‘Test Case’ on our website.

Contact information

We would like to hear your comments and suggestions. Please visit our website or email us.

Clover: a clustering-oriented de novo assembler for Illumina sequences
Contact: jmfhsieh@gmail.com