--^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^--
---------------- Run and understand VELVET ----------------
-v v v v v v v v v v v v v v v v v v v v v v v v v v v v v-

Program URL: https://github.com/dzerbino/velvet

The program “Velvet” is used to manipulate de Bruijn graphs
for genomic sequence assembly. A de Bruijn graph is a
compact representation based on short words (k-mers) that is
ideal for high-coverage, very short read (25–50 bp) data
sets. Applying Velvet to very short reads and paired-end
information only, one can produce contigs of significant
length, up to 50-kb N50 length in simulations of prokaryotic
data and 3-kb N50 on simulated mammalian BACs.
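
To make the k-mer idea concrete, here is a minimal Python
sketch of a textbook de Bruijn graph built from reads. It is
not Velvet's implementation (Velvet also handles reverse
complements and merges chains of k-mers into nodes); the
function name and the toy reads are hypothetical.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer to the sorted (k-1)-mers that follow it."""
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of width k along the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer is an arc from its prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return {node: sorted(nexts) for node, nexts in graph.items()}


# Two overlapping toy reads (hypothetical data), k = 4.
graph = de_bruijn_edges(["ACGTAC", "GTACGG"], 4)
```

In this toy graph the node ACG has two successors, which is
the kind of branch that an assembler has to resolve.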

Errors are corrected after graph creation to allow for
simultaneous operations over the whole set of reads. In
Velvet's framework, errors can be due either to the
sequencing process or to the biological sample, for example,
polymorphisms. Distinguishing polymorphisms from errors is a
post-assembly task. A naive approach to error removal would
be to use the difference between the expected coverage of
genuine sequences and that of random errors: removing all
the low-coverage nodes (and their corresponding arcs) would
then remove the errors. However, this relies on the low
coverage being due to genuine errors and not to biological
variants present at a reasonable frequency in the sample,
and on errors being randomly distributed in the reads.
Instead, Velvet focuses on topological features. Erroneous
data create three types of structures: “tips” due to errors
at the edges of reads, “bulges” due to internal read errors
or to nearby tips connecting, and erroneous connections due
to cloning errors or to distant merging tips. The three
features are removed consecutively.
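
As an illustration of the topological idea, the following
Python sketch removes "tips" from a toy directed graph: short
dead-end chains that hang off a branch point. It is a
simplified assumption of how tip clipping can work, not
Velvet's actual algorithm; it ignores coverage and reverse
strands, and remove_tips and max_tip_len are hypothetical
names.

```python
def remove_tips(succ, max_tip_len):
    """Return a copy of the graph without short dead-end chains.

    succ maps each node to the set of nodes it points to.
    """
    # Copy the graph and make sure every node has an entry.
    succ = {node: set(targets) for node, targets in succ.items()}
    for targets in list(succ.values()):
        for target in targets:
            succ.setdefault(target, set())

    # Build the reverse adjacency (predecessors of each node).
    pred = {node: set() for node in succ}
    for node, targets in succ.items():
        for target in targets:
            pred[target].add(node)

    doomed = set()
    for end in [node for node in succ if not succ[node]]:
        chain, node = [end], end
        # Walk back along the unbranched chain ending at this dead end.
        while len(pred[node]) == 1:
            (parent,) = pred[node]
            if len(succ[parent]) > 1:
                # Reached a branch point: a short chain is a tip.
                if len(chain) <= max_tip_len:
                    doomed.update(chain)
                break
            chain.append(parent)
            node = parent

    return {
        node: {t for t in targets if t not in doomed}
        for node, targets in succ.items()
        if node not in doomed
    }


# Main path A-B-C-D-E with a one-node tip X hanging off B (toy data).
toy = {"A": {"B"}, "B": {"C", "X"}, "C": {"D"}, "D": {"E"}}
cleaned = remove_tips(toy, max_tip_len=2)
```

Here X is dropped while the longer branch through C, D, and E
is kept, because its chain exceeds the length limit.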

ORDER OF THINGS

	1. VelvetOptimizer.pl
	2. velveth
	3. velvetg

--^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^--
-----------------------------------------------------------
-v v v v v v v v v v v v v v v v v v v v v v v v v v v v v-

~~~~~~~~~~~~~~~~~~~~~~
~                    ~~
~ VelvetOptimizer.pl ~~
~                    ~~
~~~~~~~~~~~~~~~~~~~~~~

The Velvet software includes the script VelvetOptimizer.pl
(spelled VelvetOptimiser.pl in the Velvet distribution),
which uses a heuristic method to find the optimal k-mer
length and coverage cutoff for Velvet.

########################
# Example command line #
########################

$ VelvetOptimizer.pl -s 25 -e 45 -f '-shortPaired -fastq long_1.fastq long_2.fastq'

-s and -e: indicate the minimum and maximum k-mer size to
try

-f: the file section of the velveth command line (read
type, file format, and input files), given as one quoted
string

By default, VelvetOptimizer will choose the optimal k-mer
size based on the N50. However, the -k option enables users
to base the assembly optimization function on other
variables.
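
Since the default criterion is the N50, a short Python sketch
of how an N50 can be computed from a list of contig lengths
may help. This uses the common definition (the contig length
at which the running total of lengths, sorted from longest to
shortest, first reaches half of the assembly size); the
function name n50 is hypothetical and this is not code taken
from VelvetOptimizer.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the total past half is the N50.
        if running >= half:
            return length
    return 0


# Toy contig lengths (hypothetical data): total 54, half 27.
example_n50 = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])
```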

~~~~~~~~~~~
~         ~~
~ velveth ~~
~         ~~
~~~~~~~~~~~

velveth stands for "Velvet hash". It reads the sequence
input files and outputs three files, Sequences, Roadmaps,
and Log.

velveth requires an output directory, the k-mer length (must
be an odd number), the sequence file format, the read type,
and the input filename(s).
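
The requirements above can be sketched as a small Python
helper that assembles the velveth argument list and rejects
even k-mer lengths before anything is run. The helper name
velveth_command and the k value 31 are hypothetical; only the
flags already shown in this document are used.

```python
def velveth_command(out_dir, k, file_format, read_type, files):
    """Build the velveth argument list, checking that k is odd."""
    if k % 2 == 0:
        raise ValueError(f"k-mer length must be odd, got {k}")
    return ["velveth", out_dir, str(k), file_format, read_type, *files]


# Example values are placeholders; adjust them to the data set.
cmd = velveth_command(
    "velvet_output/", 31, "-fastq", "-shortPaired",
    ["long_1.fastq", "long_2.fastq"],
)
```

The returned list can be handed to subprocess.run(cmd,
check=True) once Velvet is installed.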

########################
# Example command line #
########################

$ velveth velvet_output/ <k-mer size> -fastq -shortPaired long_1.fastq long_2.fastq

~~~~~~~~~~~
~         ~~
~ velvetg ~~
~         ~~
~~~~~~~~~~~

velvetg stands for "Velvet graph". It uses velveth outputs
to build the assembly and outputs the files contigs.fa,
UnusedReads.fa, Graph2, LastGraph, PreGraph, stats.txt, and
Log.
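
To inspect an assembly, the contig lengths can be read from
a FASTA file such as contigs.fa. The Python sketch below is a
generic FASTA length counter, assuming plain multi-line FASTA
text; the function name and the record names in the example
are hypothetical and do not reproduce Velvet's own header
format.

```python
def contig_lengths(fasta_text):
    """Return {record name: sequence length} for FASTA-formatted text."""
    lengths = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The record name is the first word after '>'.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


# Toy FASTA content (hypothetical records).
example = ">contig_a\nACGTAC\nGT\n>contig_b\nTTTT\n"
lengths = contig_lengths(example)
```

For a real run, the text would come from reading
velvet_output/contigs.fa.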

velvetg takes a coverage cutoff (-cov_cutoff) in order to
exclude short, low-coverage nodes from the assembly. In
addition, running velvetg on paired-end reads requires the
expected insert length (-ins_length, the average length of
the sequenced fragment) and the expected k-mer coverage
(-exp_cov).
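
K-mer coverage differs from nucleotide coverage. The Velvet
manual relates the two as Ck = C * (L - k + 1) / L, where C
is the nucleotide coverage, L the read length, and k the
k-mer length. The Python sketch below applies that formula,
assuming all reads have the same length; the function name
and the example numbers are hypothetical.

```python
def kmer_coverage(base_coverage, read_length, k):
    """Convert nucleotide coverage C into k-mer coverage Ck."""
    if k > read_length:
        raise ValueError("k cannot exceed the read length")
    # A read of length L contains L - k + 1 k-mers.
    return base_coverage * (read_length - k + 1) / read_length


# 50x nucleotide coverage, 36 bp reads, k = 21 (placeholder values).
ck = kmer_coverage(50, 36, 21)
```

With these numbers the k-mer coverage is a little over 22,
well below the 50x nucleotide coverage.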

########################
# Example command line #
########################

$ velvetg velvet_output/ -cov_cutoff auto
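
For paired-end data, the command above would also carry the
insert length and expected coverage. A hedged Python sketch
that builds such an argument list follows; velvetg_command is
a hypothetical helper, and the values 300 and 22 are
placeholders, not recommendations.

```python
def velvetg_command(out_dir, cov_cutoff="auto", ins_length=None, exp_cov=None):
    """Build the velvetg argument list with optional paired-end settings."""
    cmd = ["velvetg", out_dir, "-cov_cutoff", str(cov_cutoff)]
    if ins_length is not None:
        cmd += ["-ins_length", str(ins_length)]
    if exp_cov is not None:
        cmd += ["-exp_cov", str(exp_cov)]
    return cmd


# Placeholder values; take them from the library preparation and data.
paired_cmd = velvetg_command("velvet_output/", ins_length=300, exp_cov=22)
```

Called without the optional arguments, the helper yields the
same arguments as the example command line above.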