pVAAST (the Pedigree Variant Annotation, Analysis, and Search Tool) is a software tool that searches whole-exome and whole-genome sequence data in families to identify genetic variants that directly influence disease risk. pVAAST analyzes the DNA sequences of patients, their relatives, and healthy people in a highly automated fashion to provide probabilistic predictions of the specific genetic variants and genes that increase the risk of developing disease. pVAAST combines the existing variant prioritization and case-control association features in VAAST with a new linkage analysis method specifically designed for sequence data. This method is broadly similar to traditional linkage analysis but is capable of modeling de novo mutations and is more sensitive in scenarios with incomplete penetrance or locus heterogeneity. pVAAST supports dominant, recessive, and de novo inheritance models, and maintains high power across a wide variety of study designs, from monogenic, Mendelian diseases in a single family to highly polygenic, common diseases involving hundreds of families.
Licensing and Download
The release of VAAST 2.1.0 includes pVAAST and is now available for download.
Related Publications
Press Coverage
pVAAST 5-Minute Guide for the Impatient
- Call the variants in case and control genomes to create VCF files. Ideally, case and control samples should be matched in a) ethnicity; b) sequencing platform; and c) variant calling pipeline. For best results, we also recommend jointly calling all case and control genomes with GATK UnifiedGenotyper. However, if no control genomes are available, publicly available exomes can be downloaded from: http://www.yandell-lab.org/software/VAAST/data/hg19/Background_CDR/
- Run the <VAAST>/bin/vaast_tools/vcf2cdr.pl script to convert multi-sample VCF file(s) to the CONDENSER (CDR) file format. (See the command line docs for more information.) This script creates one CDR file for each cohort, which can consist of unrelated cases/controls or of families. An example of this step can be found at: <VAAST>/examples/vcf2cdr_example/vcf2cdr.sh
- Create the pedigree file (".ped" file; see http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#ped). Every family should have a separate pedigree file. For sequenced individuals, the IDs in the ".ped" file should match the IDs in the original VCF file (or the ## FILE-INDEX entries at the bottom of CDR files).
- Prepare the pVAAST parameter file. You can find several template parameter files in the <VAAST>/data/pvaast/ folder, each designed for a different type of family and disease model. At a minimum, the options in the "Basic Options" section should be changed. Other sections are non-essential but can improve performance.
- Run pVAAST. The basic command line is:
  VAAST -m pvaast -pv_control <parameter file> <GFF3 annotation file> <Control CDR file> --gw <max permutations>
  For genome-wide significance, a --gw value of at least 1e6 is recommended. An example bash script for this step can be found at: <VAAST>/examples/pvaast_example/pvaast.sh
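The pedigree step above requires that the IDs of sequenced individuals in the ".ped" file match the sample IDs in the VCF. The following is a minimal sketch of such a check, not part of the VAAST distribution. It assumes the standard six leading PLINK columns (family ID, individual ID, father ID, mother ID, sex, phenotype) and a tab-delimited VCF "#CHROM" header line. The function names are hypothetical.

```python
def read_ped_ids(ped_lines):
    """Return the individual IDs (column 2) from PLINK-style .ped lines."""
    ids = []
    for line in ped_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 6:
            raise ValueError(
                f"expected at least 6 columns, got {len(fields)}: {line!r}"
            )
        ids.append(fields[1])
    return ids


def read_vcf_sample_ids(vcf_lines):
    """Return sample IDs from the #CHROM header line of a VCF."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            # The first 9 columns are fixed (CHROM ... FORMAT); samples follow.
            return line.rstrip("\n").split("\t")[9:]
    raise ValueError("no #CHROM header line found")


def ids_missing_from_vcf(ped_ids, vcf_ids):
    """List pedigree IDs that have no matching VCF sample, in pedigree order."""
    vcf_set = set(vcf_ids)
    return [pid for pid in ped_ids if pid not in vcf_set]


def check_files(ped_path, vcf_path):
    """Convenience wrapper for uncompressed files on disk."""
    with open(ped_path) as ped, open(vcf_path) as vcf:
        return ids_missing_from_vcf(read_ped_ids(ped), read_vcf_sample_ids(vcf))
```

An ID reported as missing is not necessarily an error, because pedigree members who were not sequenced will not appear in the VCF. The list is meant for manual review, for example to catch a sequenced sample whose ID is spelled differently in the two files.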
Notes
- Any required external data files in this pipeline can be downloaded at: http://www.yandell-lab.org/software/VAAST/data/hg19/
- The ".simple" file provides a quick ranked list of protein coding genes. The ".vaast" file is the complete VAAST report.
- By default, pVAAST scores only nonsynonymous and null mutations. To enable support for indels and splice sites, use the --indel and --splice_site options in the pVAAST command line. CAUTION: --indel and --splice_site may significantly inflate the false-positive rate when cases and controls are not matched.
- For more information or for advanced options, please see the command line documentation, download the VAAST documentation at http://www.yandell-lab.org/software/vaast.html, or read a preprint of our recent paper entitled “Identification of damaged genes and disease-causing alleles with VAAST.”
- IF YOU GET STUCK, WE WOULD LOVE TO HEAR FROM YOU AND HELP! Our mailing list is , and my email address is .
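For scripted runs, the basic command line from the guide and the optional --indel and --splice_site flags from the notes can be assembled programmatically. The sketch below only builds the argument list and does not run anything. It rests on several assumptions: the helper name and the example file names are hypothetical, the optional flags are placed at the end of the command, and the permutation count is written as a plain integer rather than in the "1e6" form.

```python
import shlex


def build_pvaast_command(param_file, gff3_file, control_cdr,
                         max_permutations=1_000_000,
                         indel=False, splice_site=False,
                         executable="VAAST"):
    """Assemble the pVAAST argument list described in the guide.

    Only flags named in the guide are used. Where the optional flags
    go and how the --gw value is formatted are assumptions.
    """
    cmd = [
        executable, "-m", "pvaast",
        "-pv_control", param_file,
        gff3_file,
        control_cdr,
        "--gw", str(int(max_permutations)),
    ]
    if indel:
        cmd.append("--indel")        # may inflate false positives if unmatched
    if splice_site:
        cmd.append("--splice_site")  # same caution as --indel
    return cmd


# Hypothetical file names, for illustration only.
example = build_pvaast_command("family.ctl", "genes.gff3", "controls.cdr")
print(shlex.join(example))
```

The returned list can be passed to subprocess.run once the paths point to real files. Building a list rather than a single string avoids shell-quoting problems with paths that contain spaces.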