Room P3.10, Mathematics Building

A parallel algorithm for the extraction of structured motifs

We present a parallel algorithm for the efficient extraction of binding-site consensus from genomic sequences. This algorithm is based on an existing approach for extracting structured motifs. A structured motif consists of an ordered collection of $p$ boxes, $p$ substitution rates and $p-1$ distances between successive boxes. The contents of the boxes, which represent the extracted motifs, are unknown at the start of the process and are found by the algorithm using a suffix tree as the fundamental data structure. By partitioning the structured motif searching space we divide the most demanding part of the algorithm by a number of processors that can be loosely coupled. In this way we obtain, under conditions that are easily met, a speedup that is linear on the number of available processing units. This speedup is verified by both theoretical and experimental analysis.