HADiT
Haplotype Amplification Distortion in Tumors
This page provides links to HADiT, the Java software that implements the Amplification Distortion Test (ADT). The following sections guide you through downloading, building, and running HADiT.
The program was developed by Itsik Pe'er's Lab of Computational Genetics at Columbia University. It is built in Java 1.5 and has been tested on both Windows and Linux. The source code is distributed here in a jar package under the GPL license.
Dependencies
HADiT depends on the following publicly available libraries.
Please download the indicated versions in order to compile HADiT.
- Colt Math Library (version 1.2.0): http://acs.lbl.gov/~hoschek/colt/
- Commons Math Library (version 1.1): http://commons.apache.org/math/
- JFreeChart (version 1.0.6): http://www.jfree.org/jfreechart/
Running HADiT
First download the sample data files. They contain simulated data rather than real data: the SNP and CNA information, the sample information, and the nucleotide map at each SNP marker. These files can be downloaded in this rar file. Extract the files into a directory of your choice (using WinRAR, for example), which we will refer to as $DATA.
Running HADiT on these data applies the ADT to them. The command for doing this is:
java -cp . Hadit -allmulti ascnprefix=$DATA\ascn.chr. ascnsuffix=.txt outputdir=$DATA\Results\ chromrange={1-22} snpmap=$DATA\Simulated.snpMap.txt samplefilter=$DATA\Simulated.uniqueSamples.txt cancermap=$DATA\Simulated.uniqueSamples.txt tasklist=$DATA\TaskList.AmplificationDistortion.txt
The output is written to the $DATA\Results\ directory. Create this directory before running HADiT.
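The command above uses Windows-style backslashes. As a minimal sketch, the following Java class creates the output directory and assembles the same argument list with the platform's own path separator. The class name HaditCommand and its methods are hypothetical helpers, not part of HADiT, and the sketch assumes the dashes before cp and allmulti are plain ASCII hyphens. It prints the command rather than executing it.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HADiT.
class HaditCommand {
    /** Creates the directory (and any missing parents) if needed and returns it. */
    static File ensureDir(File dir) {
        if (!dir.isDirectory() && !dir.mkdirs()) {
            throw new IllegalStateException("Could not create " + dir);
        }
        return dir;
    }

    /** Builds the argument list from the command above, using the platform path separator. */
    static List<String> build(File dataDir, File outputDir, String taskListName) {
        String data = dataDir.getPath() + File.separator;
        List<String> cmd = new ArrayList<String>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(".");
        cmd.add("Hadit");
        cmd.add("-allmulti"); // assumption: ASCII hyphen
        cmd.add("ascnprefix=" + data + "ascn.chr.");
        cmd.add("ascnsuffix=.txt");
        cmd.add("outputdir=" + outputDir.getPath() + File.separator);
        cmd.add("chromrange={1-22}");
        cmd.add("snpmap=" + data + "Simulated.snpMap.txt");
        cmd.add("samplefilter=" + data + "Simulated.uniqueSamples.txt");
        cmd.add("cancermap=" + data + "Simulated.uniqueSamples.txt");
        cmd.add("tasklist=" + data + taskListName);
        return cmd;
    }

    public static void main(String[] args) {
        // First argument: the $DATA directory (defaults to the current directory).
        File dataDir = new File(args.length > 0 ? args[0] : ".");
        File results = ensureDir(new File(dataDir, "Results"));
        List<String> cmd = build(dataDir, results, "TaskList.AmplificationDistortion.txt");
        // Print the command instead of launching it, so the sketch runs without HADiT.
        StringBuilder sb = new StringBuilder();
        for (String part : cmd) {
            sb.append(part).append(' ');
        }
        System.out.println(sb.toString().trim());
    }
}
```

Passing the printed arguments to a shell, or the list to java.lang.ProcessBuilder, would then launch the run from the directory that contains the HADiT classes.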
The most relevant output files end in the .CountsSplit.txt extension, one file per chromosome. These files contain amplification distortion LOD scores for each allele or haplotype starting at each SNP. The columns are:
1. Sliding Window Number
2. Chromosome
3. Position Start
4. Position End
5. rsID (without the “rs” prefix)
6. Number of amplified alleles or haplotypes within that window
7. The allele or haplotype
8. Number of amplified instances of that allele or haplotype. If we are examining a single SNP (sliding window size of 1), this indicates the number of amplified instances of that allele within amplified heterozygous calls only.
9. Number of non-amplified instances of that allele or haplotype. If we are examining a single SNP (sliding window size of 1), this indicates the number of non-amplified instances of that allele within amplified heterozygous calls only.
10. The p-value of the binomial test for testing the number of amplified instances of the allele or haplotype
11. The p-value of the binomial test for testing the number of non-amplified instances of the allele or haplotype
12. A boolean indicator of whether column 10 is nominally significant (p ≤ 0.05).
13. The LOD score, which is -log10(column 10)
Any columns after these can be ignored.
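The relationship between columns 8 to 13 can be sketched in Java as follows. This is an illustration only: the class CountsSplitRow is hypothetical, the file is assumed to be tab-separated with no header row, and the binomial tail assumes a one-sided test with a null probability of 0.5, which may not match what HADiT computes. The row in main is made up for demonstration and is not taken from the sample data.

```java
// Hypothetical reader for one data row of a .CountsSplit.txt file.
class CountsSplitRow {
    final String chromosome;  // column 2
    final String haplotype;   // column 7
    final int amplified;      // column 8
    final int nonAmplified;   // column 9
    final double pAmplified;  // column 10

    CountsSplitRow(String line) {
        // Assumption: tab-separated fields, no header row.
        String[] f = line.split("\t");
        if (f.length < 13) {
            throw new IllegalArgumentException("Expected at least 13 columns, got " + f.length);
        }
        chromosome = f[1].trim();
        haplotype = f[6].trim();
        amplified = Integer.parseInt(f[7].trim());
        nonAmplified = Integer.parseInt(f[8].trim());
        pAmplified = Double.parseDouble(f[9].trim());
    }

    /** Column 13 as described above: -log10 of the column 10 p-value. */
    double lod() {
        return -Math.log10(pAmplified);
    }

    /** Column 12 as described above: nominal significance at p <= 0.05. */
    boolean nominallySignificant() {
        return pAmplified <= 0.05;
    }

    /**
     * Upper-tail probability P(X >= k) for X ~ Binomial(n, 0.5).
     * Assumption: one-sided test, null probability 0.5; HADiT may differ.
     * Suitable for moderate n only, since 0.5^n underflows for very large n.
     */
    static double upperTail(int k, int n) {
        double term = Math.pow(0.5, n); // P(X = 0)
        double sum = 0.0;
        for (int i = 0; i <= n; i++) {
            if (i >= k) {
                sum += term;
            }
            term = term * (n - i) / (i + 1); // P(X = i + 1) from P(X = i)
        }
        return Math.min(1.0, sum);
    }

    public static void main(String[] args) {
        // Made-up row for illustration only.
        String line = "1\t1\t1000\t1000\t12345\t20\tA\t15\t5\t0.0207\t0.9941\ttrue\t1.68";
        CountsSplitRow row = new CountsSplitRow(line);
        System.out.println("LOD = " + row.lod());
        System.out.println("nominally significant = " + row.nominallySignificant());
        System.out.println("upper tail = " + upperTail(row.amplified, row.amplified + row.nonAmplified));
    }
}
```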
Thus, ADT returns LOD scores for every allele or haplotype. However, only a fraction of LOD scores are significant genome-wide. To calculate the genome-wide significance threshold, run the following command:
java -cp . Hadit -allmulti ascnprefix=$DATA\ascn.chr. ascnsuffix=.txt outputdir=$DATA\Results\Perm\ chromrange={1-22} snpmap=$DATA\Simulated.snpMap.txt samplefilter=$DATA\Simulated.uniqueSamples.txt cancermap=$DATA\Simulated.uniqueSamples.txt tasklist=$DATA\TaskList.PermutationTesting.txt
The output is written to the $DATA\Results\Perm\ directory. Create this directory before running HADiT.
The relevant files are those starting with the prefix “Top_”.
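To pick out those files programmatically, a small sketch such as the one below could be used. The class TopFiles is a hypothetical helper, not part of HADiT; it relies only on the "Top_" prefix stated above and makes no assumption about the rest of the file names or their contents.

```java
import java.io.File;
import java.io.FilenameFilter;

// Hypothetical helper, not part of HADiT.
class TopFiles {
    /** Accepts any file whose name starts with the "Top_" prefix. */
    static final FilenameFilter TOP_FILTER = new FilenameFilter() {
        public boolean accept(File dir, String name) {
            return name.startsWith("Top_");
        }
    };

    /** Lists the "Top_" files in the given directory; empty if the directory is missing. */
    static File[] list(File permDir) {
        File[] found = permDir.listFiles(TOP_FILTER);
        return found == null ? new File[0] : found;
    }

    public static void main(String[] args) {
        // First argument: the permutation output directory (defaults to the current directory).
        File permDir = new File(args.length > 0 ? args[0] : ".");
        for (File f : list(permDir)) {
            System.out.println(f.getPath());
        }
    }
}
```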