Motivation: The choice of probes is an important feature of hybridisation experiments. In this paper we present an algorithm that optimises a set of probes with respect to a training set of sequences, using Shannon entropy as the quality criterion. The practical motivation for our algorithm is oligonucleotide fingerprinting, a method for the simultaneous identification of sequences (cDNA or genomic DNA) by their hybridisation tags with respect to a set of short probes such as octamers. The algorithm is, however, not restricted to that application.
Results: We show that our method is superior both to selecting probes according to their frequencies, which is a widely used strategy, and to randomly chosen probe sets. The quality of probe sets is assessed by a simulation pipeline that takes the set of probes as a simulation parameter. The performance of probe sets trained on sequences from different organisms additionally shows that probes should be chosen with regard to the organism under analysis. Case studies illustrate how constraints (G+C content, complexity of the individual probes) influence the selection process.
Availability: A description of the oligonucleotide fingerprinting pipeline is published on our web page http://www.molgen.mpg.de/~ag_onf/met.htm. An executable of the algorithm and probe lists designed for human and rodent sequences can be downloaded from the ftp site ftp://ftp.molgen.mpg.de/pub/mpimg/probe_design/.
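
Illustration: To make the entropy criterion concrete, the following is a minimal sketch and not the published algorithm. It assumes that a probe hybridises to a sequence exactly when it occurs as a substring, that a sequence's fingerprint is the resulting tuple of binary signals, and that a probe set is scored by the Shannon entropy of the fingerprint distribution over the training set. The function names and the greedy selection loop are hypothetical and chosen for illustration only.

```python
from collections import Counter
from math import log2


def fingerprint(seq, probes):
    # Assumption: idealised binary hybridisation, i.e. exact substring match.
    return tuple(p in seq for p in probes)


def fingerprint_entropy(sequences, probes):
    # Shannon entropy (in bits) of the fingerprint distribution over the
    # training set; higher values mean the probes separate the sequences
    # into more, and more evenly sized, groups.
    counts = Counter(fingerprint(s, probes) for s in sequences)
    n = len(sequences)
    return -sum(c / n * log2(c / n) for c in counts.values())


def greedy_select(sequences, candidates, k):
    # Hypothetical greedy heuristic: repeatedly add the candidate probe
    # that yields the highest entropy together with the probes chosen so far.
    chosen = []
    for _ in range(k):
        remaining = [p for p in candidates if p not in chosen]
        if not remaining:
            break
        best = max(
            remaining,
            key=lambda p: fingerprint_entropy(sequences, chosen + [p]),
        )
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    training = ["ACGT", "ACGG", "TTTT", "TTGA"]
    probes = greedy_select(training, ["AC", "GT", "TT", "CG"], 2)
    print(probes, fingerprint_entropy(training, probes))
```

In this toy example a probe that splits the training set in half contributes one bit, while a second probe is only useful if it subdivides the groups left by the first. This is the intuition behind preferring an entropy criterion over choosing probes by frequency alone.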