Estimating Protein Function From Combinatorial Sequence Data Using Decision
Algorithms and Neural Networks
Goldman, E.R., Füllen, and Youvan, D.C.
Palo Alto Institute of Molecular Medicine, Mountain View, CA 94043, USA.
FSPM-Strukturbildungsprozesse,
Universität Bielefeld, Bielefeld, Germany
Correlations between protein sequences and phenotypes were explored using databases of
combinatorial cassette mutants of pigment-protein complexes. Heuristically
formulated decision algorithms and computer-implemented neural networks were compared to
determine their accuracy in classifying mutants into phenotypic categories. For the
databases examined, decision algorithms employing very simple rules correctly
classified 80-84% of mutants, based only on the amino acid sequence of the
mutagenized region. These decision algorithms needed no rules describing site-to-site
interactions; instead, they performed well by exploiting the stringency of specific
critical sites in the protein that accept only a restricted set of amino acids. In
some cases, neural networks scored almost 10 percentage points higher than decision
algorithms on the same databases, reaching 94%. Nevertheless, the success of the simple
decision algorithms and of perceptrons (single-layer neural networks) at sorting
sequences into categories suggests that linear, site-independent effects predominate in
determining a mutant's phenotypic class. Such
methods should be generally applicable to the broad spectrum of databases that are
currently being generated in combinatorial chemistry and biology experiments.
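The abstract does not spell out the rules or network architectures used, so the following is only a minimal Python sketch, under stated assumptions, of the two kinds of classifier it compares. The decision rule checks each critical site independently against a set of tolerated residues, with no site-to-site terms. The perceptron is a single linear threshold unit trained with the standard Rosenblatt update rule on a one-hot encoding of the cassette. The site positions, tolerated residues, 5-letter alphabet, and synthetic library are all hypothetical and are not data from the study; `CRITICAL_SITES`, `decision_rule`, `train_perceptron`, and the other names are illustrative only.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# HYPOTHETICAL critical sites: cassette position -> residues that site tolerates.
CRITICAL_SITES = {0: set("AG"), 2: set("HN")}


def decision_rule(seq):
    """Return 1 ("functional") only if every critical site holds an allowed residue.

    Each site is checked independently; there are no site-to-site terms.
    """
    return int(all(seq[pos] in allowed for pos, allowed in CRITICAL_SITES.items()))


def one_hot(seq):
    """Encode a sequence as a flat 0/1 vector: 20 entries per position."""
    vec = []
    for residue in seq:
        vec.extend(1.0 if aa == residue else 0.0 for aa in AMINO_ACIDS)
    return vec


def perceptron_predict(model, seq):
    """Single linear threshold unit: 1 if w.x + b > 0, else 0."""
    w, b = model
    x = one_hot(seq)
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def train_perceptron(seqs, labels, epochs=1000):
    """Rosenblatt perceptron learning rule, starting from zero weights."""
    w, b = [0.0] * len(one_hot(seqs[0])), 0.0
    for _ in range(epochs):
        errors = 0
        for seq, y in zip(seqs, labels):
            delta = y - perceptron_predict((w, b), seq)  # -1, 0 or +1
            if delta:
                x = one_hot(seq)
                w = [wi + delta * xi for wi, xi in zip(w, x)]
                b += delta
                errors += 1
        if errors == 0:  # a full pass with no mistakes: training set separated
            break
    return w, b


def accuracy(predict, seqs, labels):
    """Fraction of sequences whose predicted class matches the label."""
    return sum(predict(s) == y for s, y in zip(seqs, labels)) / len(seqs)


# Synthetic library: every 3-residue cassette over a 5-letter alphabet,
# labelled by the hypothetical rule above (not experimental data).
seqs = ["".join(p) for p in product("AGHNL", repeat=3)]
labels = [decision_rule(s) for s in seqs]
model = train_perceptron(seqs, labels)

print("functional variants:", sum(labels), "of", len(seqs))
print("perceptron training accuracy:",
      accuracy(lambda s: perceptron_predict(model, s), seqs, labels))
```

Run as a script, it reports 20 functional variants out of 125 and a perceptron training accuracy of 1.0. Because the synthetic labels come from a site-independent rule, they are linearly separable in the one-hot encoding, so the perceptron is guaranteed to fit them. This mirrors the argument that linear effects can account for much of the classification, but it says nothing about the accuracies obtained on real mutant databases.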