Optimizing Nucleotide Mixtures to Encode Specific Subsets of Amino Acids for Semi-random Mutagenesis
Arkin A. P., Youvan D. C.
Department of Chemistry, Massachusetts Institute of Technology, Cambridge 02139.
In random mutagenesis, synthesis of an NNN triplet (i.e. equiprobable A, C, G, and T at
each of the three positions in the codon) could be considered an optimal nucleotide
mixture because all 20 amino acids are encoded. NN(G,C) might be considered a slightly
more intelligent "dope" because it still encodes the entire set of amino acids while
using only half as many codons (32 rather than 64). Using a general algorithm described
herein, it is possible to formulate more complex doping schemes that encode specific
subsets of the 20 amino acids while excluding the others from the mix. Making the
residues within such a subset as nearly equiprobable as possible is proposed as an
optimal basis for performing semi-random mutagenesis. Such dopes reduce the nucleotide
complexity of combinatorial cassettes, so that "sequence space" can be searched more
efficiently. Computer programs have been developed to provide tables of optimized dopes
compatible with automated DNA synthesizers.
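The idea of a dope and its encoded amino-acid distribution can be illustrated with a short Python sketch. This is not the authors' program. The sketch assumes that each codon position is synthesized independently from its own base mixture, and it translates with the standard genetic code. The names `translate_dope`, `score` and `best_degenerate_codon` are hypothetical. The objective in `score` is also an assumption made for illustration: it equals 1 only when every target residue is equiprobable and nothing else is encoded. The search is a brute-force enumeration over equimolar IUPAC degenerate codons. It stands in for, and does not reproduce, the general algorithm described in the paper.

```python
from itertools import product

# Standard genetic code; codons enumerated with bases in T, C, A, G order
# at each position ('*' marks a stop codon).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
        for i, b1 in enumerate(BASES)
        for j, b2 in enumerate(BASES)
        for k, b3 in enumerate(BASES)}

# IUPAC degenerate-base symbols and the bases each one stands for.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
         "S": "CG", "W": "AT", "K": "GT", "M": "AC", "B": "CGT",
         "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT"}


def equimolar(symbol):
    """Equal fractions of every base covered by an IUPAC symbol."""
    bases = IUPAC[symbol]
    return {b: 1.0 / len(bases) for b in bases}


def translate_dope(dope):
    """Map a three-position dope (list of {base: fraction} dicts) to the
    probability of each amino acid (or '*' for stop) it encodes,
    assuming the three positions are synthesized independently."""
    dist = {}
    for (b1, p1), (b2, p2), (b3, p3) in product(*(d.items() for d in dope)):
        p = p1 * p2 * p3
        if p > 0:
            aa = CODE[b1 + b2 + b3]
            dist[aa] = dist.get(aa, 0.0) + p
    return dist


def score(dist, target):
    """Hypothetical objective: |target| * smallest target probability.
    Equals 1 only if every target residue has probability 1/|target|,
    i.e. the targets are equiprobable and nothing else is encoded."""
    return len(target) * min(dist.get(aa, 0.0) for aa in target)


def best_degenerate_codon(target):
    """Brute-force all 15**3 equimolar IUPAC triplets; return (score, codon)."""
    best = (-1.0, "")
    for symbols in product(IUPAC, repeat=3):
        s = score(translate_dope([equimolar(x) for x in symbols]), target)
        if s > best[0]:  # strict '>' keeps the first triplet found on ties
            best = (s, "".join(symbols))
    return best


nnn = translate_dope([equimolar("N")] * 3)
nns = translate_dope([equimolar("N"), equimolar("N"), equimolar("S")])
print(len(nnn), nnn["*"])  # 21 0.046875  (20 amino acids + stop)
print(len(nns), nns["*"])  # 21 0.03125
print(best_degenerate_codon({"D", "E"}))  # (1.0, 'GAS')
```

The usage lines reproduce the NNN versus NN(G,C) comparison. Both dopes encode all 20 amino acids. NN(G,C) uses 32 codons instead of 64, and it also lowers the stop-codon fraction from 3/64 to 1/32, because TAG is its only remaining stop. For the subset {D, E}, the search returns GAS, which encodes Asp and Glu equally and nothing else. GAN, GAK, GAM and GAW tie with GAS, and the search keeps the first triplet it finds. Exhaustive enumeration is practical here only because equimolar IUPAC mixtures are a small, discrete set. Dopes that use arbitrary base proportions would need a continuous optimizer instead.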