GermLineUsage

Usage

From the command line, extract germline with tar xzvf germline-X-X-X.zip, enter the extracted directory, and compile germline with make all. A simple test case using shortened HapMap samples can be run with make test. The executable is run as germline <options>; it prompts the user for input/output file information and then runs the algorithm.

Input
GERMLINE accepts as input the following formats:

[ doc ] Plink / ped+map
[ doc ] PHASE / HapMap

NOTE: Although the PLINK format is not intended for haplotypes, GERMLINE expects the two alleles of each marker to appear in phase order; i.e. the first allele always corresponds to one haplotype and the second allele to the other. PLINK arbitrarily re-orders the alleles when it processes files, so we do not recommend handling phased data with PLINK prior to GERMLINE analysis because the haplotypes may not remain intact. To target specific regions, use the -from_snp and -to_snp flags.
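
The allele-order convention in the note above can be illustrated with a short sketch. It assumes the standard PLINK ped layout (six leading columns followed by two allele columns per marker) and whitespace-separated fields. The function name split_ped_haplotypes and the sample row are hypothetical and are not part of GERMLINE.

```python
def split_ped_haplotypes(ped_line):
    """Split one ped row into its ID pair and two haplotype allele lists."""
    fields = ped_line.split()
    # Assumed standard ped layout: family ID, individual ID, father, mother,
    # sex, phenotype, then two allele columns per marker.
    fam_id, ind_id = fields[0], fields[1]
    alleles = fields[6:]
    if len(alleles) % 2 != 0:
        raise ValueError("expected an even number of allele columns")
    hap0 = alleles[0::2]  # first allele of every marker
    hap1 = alleles[1::2]  # second allele of every marker
    return (fam_id, ind_id), hap0, hap1


# Hypothetical row with three markers: A/G, C/C, T/A.
ids, hap0, hap1 = split_ped_haplotypes("FAM1 IND1 0 0 1 -9 A G C C T A")
print(ids, "".join(hap0), "".join(hap1))
```

If a tool swaps the two alleles at any marker, the two lists above no longer describe the original haplotypes, which is the reason for the warning.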

Output

Upon completion, GERMLINE generates a .match and .log file in the specified location. Each line in the .match file corresponds to a pairwise shared segment, with the following fields:

Family ID 1
Individual ID 1
Family ID 2
Individual ID 2
Chromosome
Segment start (bp)
Segment end (bp)
Segment start (SNP)
Segment end (SNP)
Total SNPs in segment
Genetic length of segment
Units for genetic length (cM or MB)
Mismatching SNPs in segment
1 if Individual 1 is homozygous in match; 0 otherwise
1 if Individual 2 is homozygous in match; 0 otherwise
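
As a rough illustration of the field list above, the following sketch reads one .match line into a named record. It assumes the fifteen fields are whitespace-separated, that IDs contain no spaces, and that the SNP start/end fields are marker identifiers. The names MatchSegment and parse_match_line and the sample line are hypothetical, not part of GERMLINE.

```python
from collections import namedtuple

# Field names follow the order of the list above; the names are illustrative.
MatchSegment = namedtuple("MatchSegment", [
    "fam_id_1", "ind_id_1", "fam_id_2", "ind_id_2", "chromosome",
    "start_bp", "end_bp", "start_snp", "end_snp", "total_snps",
    "genetic_length", "length_units", "mismatching_snps",
    "ind1_homozygous", "ind2_homozygous",
])


def parse_match_line(line):
    """Parse one line of a .match file, assuming whitespace-separated fields."""
    parts = line.split()
    if len(parts) != 15:
        raise ValueError("expected 15 fields, got %d" % len(parts))
    return MatchSegment(
        fam_id_1=parts[0], ind_id_1=parts[1],
        fam_id_2=parts[2], ind_id_2=parts[3],
        chromosome=parts[4],
        start_bp=int(parts[5]), end_bp=int(parts[6]),
        start_snp=parts[7], end_snp=parts[8],
        total_snps=int(parts[9]),
        genetic_length=float(parts[10]),
        length_units=parts[11],
        mismatching_snps=int(parts[12]),
        ind1_homozygous=(parts[13] == "1"),
        ind2_homozygous=(parts[14] == "1"),
    )


# Made-up example line, for illustration only.
seg = parse_match_line("FAM1 IND1 FAM2 IND2 1 1000000 2500000 rs100 rs200 150 3.2 cM 0 0 1")
print(seg.ind_id_1, seg.ind_id_2, seg.genetic_length, seg.length_units)
```

Keeping the units field next to the length makes it easy to avoid mixing cM and MB values when filtering segments.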

Binary Output

To save space, GERMLINE can also generate binary output using the -bin_out flag. This flag will generate three files:

*.bsid Two columns per line for each sample: FAM ID,SAMPLE ID.
*.bmid Four columns per line for each marker: CHROMOSOME,RSID,GENETIC DISTANCE,PHYSICAL DISTANCE.
*.bmatch Binary match file containing integer pointers to samples (from bsid file), markers (from bmid file) and boolean meta-data.

The binary files can be converted back to the standard flat format described above by using the parse_bmatch utility provided with the code. Load the three generated files using parse_bmatch [BMATCH FILE] [BSID FILE] [BMID FILE] and the flat match output will be printed to standard out. See the parse_bmatch.cpp code for binary format details.

Options

The program has several command line options to direct the segmental sharing process:

Flag          Default   Description
-map          -         File location for genetic distance map. Uses the PLINK map format.
-min_m        3         Minimum length for match to be used for imputation (in cM or MB).
-err_hom      2         The maximum number of mismatching homozygous markers for a slice to still be considered part of a match.
-err_het      0         The maximum number of mismatching heterozygous markers for a slice to still be considered part of a match.
-from_snp     -         Indicate the ID of the first SNP to start processing from.
-to_snp       -         Indicate the ID of the last SNP to end processing with.
-h_extend     -         Extends from exact seeds using haplotypes rather than genotypes; useful when data is well-phased (e.g. trios).
-homoz        -         Allow self matches (test for homozygosity).
-homoz-only   -         Analyze and report only auto/homo-zygous segments; no IBD is reported, but the analysis is significantly faster.
-haploid      -         Treat each input individual as two distinct and separate haplotypes. Output IDs will have a .0/.1 suffix corresponding to each haplotype. The -err_het flag will have no effect in this analysis.
-bin_out      -         Generate output matches in binary format; creates *.bmatch, *.bsid and *.bmid files. These files can be converted to flat output using the parse_bmatch utility included and compiled in the package.
-bits         128       Size of each slice (in markers) used for exact matching seeds.
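
When GERMLINE is driven from a script, the flags above can be assembled into an argument list. The sketch below only builds that list and does not run anything. It assumes that flags with a numeric default take a value and that flags such as -h_extend are bare switches, and it does not cover the input/output prompts that the program issues after launch. The helper name build_germline_args is hypothetical.

```python
def build_germline_args(options, executable="germline"):
    """Turn a {flag: value} mapping into an argument list.

    A value of True emits the flag alone (assumed to be a switch), False or
    None omits the flag, and any other value is emitted after the flag.
    """
    args = [executable]
    for flag, value in options.items():
        if value is False or value is None:
            continue
        args.append(flag)
        if value is not True:
            args.append(str(value))
    return args


# Example settings; the values are arbitrary.
args = build_germline_args({
    "-min_m": 5,
    "-err_hom": 1,
    "-err_het": 0,
    "-h_extend": True,
    "-bin_out": False,
})
print(" ".join(args))
```

The resulting list could then be passed to a process launcher such as Python's subprocess module, with the prompted file information supplied on standard input.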
