``High Performance Computational Tools for Motif Discovery''
N. E. Baldwin,
R. L. Collins,
M. A. Langston,
M. R. Leuze,
C. T. Symons, and
B. H. Voy
Abstract
The research described in this paper highlights a fruitful interplay
between biology and computation. The sequencing of complete genomes
from multiple organisms has revealed that most differences in organism
complexity are due to elements of gene regulation that reside in the
non-protein-coding portions of genes. Both within and between species,
transcription factor binding sites and the proteins that recognize
them govern the activity of cellular pathways that mediate adaptive
responses and survival. Experimental identification of these regulatory
elements is by nature a slow process. The availability of complete
genomic sequences, however, opens the door for computational methods to
predict binding sites and expedite our understanding of gene regulation
at a genomic level. Just as with traditional experimental approaches,
the computational identification of the molecular factors that control
a gene's expression level has been problematic. As a case in point,
the identification of putative motifs, which is the subject of this
paper, is a challenging combinatorial task. To address it, powerful new
motif-finding algorithms and high-performance implementations are described.
Heavy use is made of graph algorithms, some of which are exceedingly
computationally intensive and rely on emergent mathematical methods.
An approach to fully dynamic load balancing is developed to make
effective use of highly parallel platforms.