--^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^--
----------------Running and understanding ABySS----------------
-v v v v v v v v v v v v v v v v v v v v v v v v v v v v v-

~~~~~~~~~
~       ~
~ ABySS ~
~       ~
~~~~~~~~~

Program URL: https://github.com/bcgsc/abyss
Article URL: http://genome.cshlp.org/content/19/6/1117.long

ABSTRACT

Widespread adoption of massively parallel deoxyribonucleic acid (DNA) sequencing instruments has prompted the recent development of de novo short read assembly algorithms. A common shortcoming of the available tools is their inability to efficiently assemble vast amounts of data generated from large-scale sequencing projects, such as the sequencing of individual human genomes to catalog natural genetic variation. To address this limitation, we developed ABySS (Assembly By Short Sequences), a parallelized sequence assembler. As a demonstration of the capability of our software, we assembled 3.5 billion paired-end reads from the genome of an African male publicly released by Illumina, Inc. Approximately 2.76 million contigs ≥100 base pairs (bp) in length were created with an N50 size of 1499 bp, representing 68% of the reference human genome. Analysis of these contigs identified polymorphic and novel sequences not present in the human reference assembly, which were validated by alignment to alternate human assemblies and to other primate genomes.

NOTES

ABySS can compute the assembly in a parallel environment, which can be an advantage over Velvet when working with large genomes. An ABySS assembly can be obtained with a single command that requires the k-mer length (k), the input sequence file(s), and a name (prefix) for the output files. For paired-end reads, the minimum number of pairs required to join two contigs (n) must also be specified.
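To make the role of n more concrete, here is a simplified Python sketch. It is not ABySS's actual algorithm, which also estimates the distance between contigs. It assumes we already know, for each read pair, which contig each mate aligned to. The input format and the function names count_links and supported_joins are hypothetical. The sketch counts how many pairs link each pair of distinct contigs and keeps only the links supported by at least n pairs.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical input: one (contig_of_mate1, contig_of_mate2) tuple per read pair.
# ABySS derives this information from its own alignments; here it is assumed.
PairHits = Iterable[Tuple[str, str]]


def count_links(pairs: PairHits) -> Counter:
    """Count read pairs whose two mates land on two different contigs."""
    links: Counter = Counter()
    for contig_a, contig_b in pairs:
        if contig_a == contig_b:
            continue  # both mates on the same contig: no evidence for a join
        # Sort the names so (A, B) and (B, A) are counted as the same link.
        key = tuple(sorted((contig_a, contig_b)))
        links[key] += 1
    return links


def supported_joins(pairs: PairHits, n: int) -> List[Tuple[str, str]]:
    """Keep only contig pairs linked by at least n read pairs (the n threshold)."""
    return sorted(key for key, count in count_links(pairs).items() if count >= n)


if __name__ == "__main__":
    hits = (
        [("c1", "c2")] * 12   # 12 pairs link c1 -> c2
        + [("c2", "c1")] * 3  # 3 more pairs link the same two contigs
        + [("c2", "c3")] * 4  # only 4 pairs link c2 and c3
        + [("c3", "c3")] * 20 # mates on the same contig are ignored
    )
    print(supported_joins(hits, n=10))
```

With n=10, the 15 pairs joining c1 and c2 pass the threshold, while the 4 pairs joining c2 and c3 do not. Raising n produces fewer joins, each backed by more read pairs.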
#############
# Run ABySS #
#############

$ abyss-pe k=31 n=10 in='long_1.fastq long_2.fastq' name=replica01_k31_n10

This command outputs four files:
  *-contigs.fa    the contigs
  *-bubbles.fa    variant sequences of equal length
  *-indel.fa      variant sequences of different lengths
  *-contigs.dot   which contigs overlap, and by how much

################
# Output stats #
################

Unlike Velvet, which writes assembly statistics to its Log file, ABySS does not automatically output a statistics file. One can be generated with the abyss-fac command:

$ abyss-fac replica01_k31_n10-contigs.fa > replica01_k31_n10-stats.txt

abyss-fac computes these statistics for contigs longer than 100 nucleotides. We applied the same cutoff in Velvet with the option "-min_contig_lgth 100".
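To clarify what such a summary contains, here is an independent Python sketch. It is not abyss-fac's code. It computes a few comparable figures for a contigs FASTA file: the number of contigs, N50, the longest contig, and the total length. Only contigs of at least min_length bases are counted. The function names and the "at least" (>=) cutoff are assumptions. abyss-fac reports additional columns, and its threshold handling may differ.

```python
import io
from typing import Dict, Iterator, List, TextIO


def read_fasta_lengths(handle: TextIO) -> Iterator[int]:
    """Yield the length of each sequence in a FASTA stream (multi-line records allowed)."""
    length = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if length is not None:
                yield length  # finish the previous record
            length = 0
        elif length is not None:
            length += len(line)  # sequence lines of the current record
    if length is not None:
        yield length  # last record


def n50(lengths: List[int]) -> int:
    """Length of the contig at which the running total (longest first) reaches half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # no contigs


def contig_stats(handle: TextIO, min_length: int = 100) -> Dict[str, int]:
    """Summarize contigs of at least min_length bases (assumed >= cutoff)."""
    kept = [length for length in read_fasta_lengths(handle) if length >= min_length]
    return {
        "n": len(kept),
        "N50": n50(kept),
        "max": max(kept, default=0),
        "sum": sum(kept),
    }


if __name__ == "__main__":
    # Small in-memory demo; the 4 bp contig c3 falls below the 100 bp cutoff.
    demo = io.StringIO(
        ">c1\n" + "A" * 300 + "\n>c2\n" + "C" * 150 + "\n>c3\nGGGG\n>c4\n" + "T" * 120 + "\n"
    )
    print(contig_stats(demo))
```

To run the sketch on the assembly above, pass an open handle to the real contigs file, for example contig_stats(open("replica01_k31_n10-contigs.fa")).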