The programs on this page may be freely used, modified, and shared under the GNU General Public License version 3.0 (GPL-3.0).
The YBYRÁ project integrates software solutions for data analysis in phylogenetics. It comprises tools for (1) topological distance calculation based on the number of shared splits or clades, (2) sensitivity analysis and automatic generation of sensitivity plots (“Navajo rugs”) and (3) clade diagnoses based on different categories of synapomorphies (using TNT). YBYRÁ also provides (4) a framework to facilitate the search for potential rogue taxa based on how much they affect average matching split distances (using MSdist).
Citation: Machado D.J. 2015. YBYRÁ facilitates comparison of large phylogenetic trees. BMC Bioinformatics 16:204. doi:10.1186/s12859-015-0642-9
Financial support: FAPESP Proc. No. 2009/13561-5, 2012/10000-5 and 2013/05958-8.
The PAckage of TOols with Fast(A/Q) Utilities (PATO-FU) is a collection of homemade Python scripts that can be useful for dealing with raw sequencing data in FASTA or FASTQ format. The programs can also accept gzip-compressed files as input.
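As a rough illustration of what accepting plain or gzip-compressed FASTA/FASTQ input involves, here is a minimal sketch. It is not taken from PATO-FU: the function names `open_maybe_gzip` and `read_sequences` are hypothetical, and the FASTQ branch assumes strict four-line records.

```python
import gzip


def open_maybe_gzip(path):
    """Open a text file, transparently handling gzip compression.

    Hypothetical helper: detects gzip by its two-byte magic number
    rather than by file extension.
    """
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "r")


def read_sequences(path):
    """Yield (header, sequence) pairs from a FASTA or FASTQ file."""
    with open_maybe_gzip(path) as handle:
        first = handle.readline()
        if not first:
            return  # empty file
        if first.startswith(">"):
            # FASTA: a sequence may span several lines
            header, chunks = first[1:].strip(), []
            for line in handle:
                line = line.strip()
                if line.startswith(">"):
                    yield header, "".join(chunks)
                    header, chunks = line[1:], []
                elif line:
                    chunks.append(line)
            yield header, "".join(chunks)
        elif first.startswith("@"):
            # FASTQ: assumes exactly four lines per record
            header = first[1:].strip()
            while True:
                seq = handle.readline().strip()
                handle.readline()  # '+' separator line
                handle.readline()  # quality string (ignored here)
                yield header, seq
                next_header = handle.readline()
                if not next_header.strip():
                    break
                header = next_header[1:].strip()
        else:
            raise ValueError("not a FASTA or FASTQ file: %s" % path)
```

Calling `list(read_sequences("reads.fastq.gz"))` on a hypothetical file would return the records as a list; iterating lazily avoids loading a large file into memory.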
Financial support: FAPESP Proc. No. 2013/05958-8.
Download
Quick reference
Gitlab
Manual
As stated by Yang et al. (2013: 14), “to get reliable result in downstream analysis, it is necessary to remove low quality reads avoiding mismatches in read mapping, and false paths during genome assembly”. Because of its functional versatility and run-time efficiency, we have selected the HTQC toolkit (Yang et al. 2013) to perform read quality assessment and filtering. The complete quality control protocol is described below.
The program “selectTiles.py” automates the selection of tiles to be removed after running "ht-stat", following criteria based on the HTQC guidelines:
(1) more than 50% of the reads have quality score below 10;
(2) less than 10% of the reads have quality greater than 30;
(3) most reads have quality below 20.
Users can change these criteria at will or even employ additional conditions, as explained at the beginning of the program. Whenever necessary, selected tiles were removed with “ht-filter”.
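The three criteria above can be sketched roughly as follows. This is not the code of selectTiles.py and it does not parse "ht-stat" output. It assumes that each read is summarized by a single quality score, that a tile is selected when any one criterion holds, and that "most" means more than 50%. The names `tile_is_bad` and `select_tiles` are hypothetical.

```python
def tile_is_bad(read_qualities):
    """Return True if a tile meets any of the three removal criteria.

    read_qualities: list with one quality score per read (assumption).
    """
    n = len(read_qualities)
    if n == 0:
        return False  # no reads, nothing to judge
    frac_below_10 = sum(q < 10 for q in read_qualities) / n
    frac_above_30 = sum(q > 30 for q in read_qualities) / n
    frac_below_20 = sum(q < 20 for q in read_qualities) / n
    return (
        frac_below_10 > 0.5      # (1) more than 50% of reads below 10
        or frac_above_30 < 0.1   # (2) less than 10% of reads above 30
        or frac_below_20 > 0.5   # (3) most reads below 20 (assumed > 50%)
    )


def select_tiles(tiles):
    """Return the sorted names of tiles flagged for removal.

    tiles: dict mapping a tile name to its list of read qualities.
    """
    return sorted(name for name, quals in tiles.items() if tile_is_bad(quals))
```

Keeping each threshold on its own line makes it simple to adjust a criterion or add a further condition.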
CAF is a text format for describing sequence assemblies. It is acedb-compliant and is an extension of the ace-file format used earlier, but with support for base quality measures and a more extensive description of the Sequence data. The program parseCaf.py parses padded CAF sequence files for DNA data. CAF files must have only one contig per file. Execute "python parseCaf.py --help" to see all arguments available.
sudoParllelGarli.py is a Python script that can be used to run multiple Garli processes using an executable compiled without the MPI option.
The programs in this toolkit allow the user to search for reads in a large sequence file (FASTA or FASTQ) that align against a particular local database. Alignments are parallelized with Python's multiprocessing module.
Financial support: FAPESP Proc. 2012/10000-5 and 2013/05958-8.
The redux program can parallelize the search for duplicate sequences in FASTA- or FASTQ-formatted files.
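For readers unfamiliar with the task, a serial sketch of duplicate detection is shown here. It is not the redux implementation and it omits the parallelization. It assumes that "duplicate" means an identical sequence string, compared without regard to case, and the function name `find_duplicates` is hypothetical.

```python
from collections import defaultdict


def find_duplicates(records):
    """Group record names by sequence and keep groups with more than one name.

    records: iterable of (name, sequence) pairs.
    Returns a dict mapping each duplicated sequence (upper-cased)
    to the list of record names that share it, in input order.
    """
    groups = defaultdict(list)
    for name, seq in records:
        groups[seq.upper()].append(name)
    return {seq: names for seq, names in groups.items() if len(names) > 1}
```

Using the sequence itself as a dictionary key keeps the search linear in the number of reads. For very large files, hashing each sequence to a fixed-size digest would reduce memory use, at the cost of a small collision risk.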
Financial support: FAPESP Proc. 2012/10000-5 and 2013/05958-8.
This zip file contains a variety of Python scripts for multiple purposes. Usage information is provided as comments at the beginning of each file.
Departamento de Zoologia, IBUSP.