# Construction of substitution matrices

Part III

#### Making the PAM log-odds matrix step by step

We have already made the PAM2 probability matrix and now use it to calculate the PAM2 log-odds scoring matrix according to Equation 1:

\[ S_{ij} = \frac{1}{\lambda} \log_{10} \left( \frac{M_{ij}}{f_{i}f_{j}} \right) \: \: (Equation \: 1), \]

where \( M_{ij} \) is the probability of an amino acid substitution, e.g., A to R or the reverse, R to A, and \( \lambda \) is a scaling factor. The product \( f_{i}f_{j} \) of the background probabilities of the two amino acids is the probability that they are paired by chance.
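As a minimal sketch, Equation 1 can be evaluated for a single amino acid pair as below. The function name `log_odds_score` and all numeric values are made up for illustration; they are not taken from the PAM2 tables.

```python
import math


def log_odds_score(m_ij, f_i, f_j, lam=1.0):
    """Equation 1: S_ij = (1 / lambda) * log10(M_ij / (f_i * f_j))."""
    return math.log10(m_ij / (f_i * f_j)) / lam


# Hypothetical values: a pair seen ten times less often than chance predicts
# (0.0005 observed against 0.05 * 0.1 = 0.005 expected) ...
score_rare = log_odds_score(0.0005, 0.05, 0.1)

# ... and a pair seen ten times more often than chance predicts.
score_frequent = log_odds_score(0.05, 0.05, 0.1)

print(round(score_rare, 6), round(score_frequent, 6))
```

Pairs that occur less often than expected by chance get negative scores, and pairs that occur more often get positive scores.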

The six steps to compute the log-odds matrix are: (1) compute the background probabilities, (2) compute the joint probabilities, (3) make the matrix symmetric, (4) compute the odds for each amino acid, (5) scale the matrix values to convenient numbers, and (6) take a logarithm from each odds value.

###### Step 1. Compute the background probabilities

The background probability of each amino acid is the sum of the probabilities in its column of the PAM2 mutability matrix divided by its mutability. Fortunately, we don't need to do this again, since we already computed the values (Table 2 on the previous page).

###### Step 2. Compute the joint probabilities

Now we aim to get the values corresponding to \( M_{ij} \) in Equation 1. We start by computing the joint probabilities, i.e., the total probability space over all amino acid pairs: we multiply each PAM2 probability entry by the background probability of the corresponding amino acid from Step 1 (Table 1). However, these are not yet the final \( M_{ij} \) values.
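The sketch below illustrates this step on a made-up three-letter alphabet. It assumes that each column of the probability matrix holds the substitution probabilities of one original amino acid (so columns sum to one) and that the weight applied to each column is that amino acid's background probability, as Step 3 implies. The names `cond`, `background`, and `joint_probabilities` and all numbers are hypothetical.

```python
# Toy alphabet and hypothetical background probabilities (sum to one).
amino_acids = ["A", "R", "N"]
background = [0.5, 0.3, 0.2]

# Hypothetical probability matrix: cond[i][j] is the probability that the
# original amino acid j (column) is found as amino acid i (row).
cond = [
    [0.98, 0.02, 0.03],
    [0.01, 0.97, 0.02],
    [0.01, 0.01, 0.95],
]


def joint_probabilities(cond, background):
    """Weight every column j by the background probability of amino acid j."""
    n = len(background)
    return [[cond[i][j] * background[j] for j in range(n)] for i in range(n)]


joint = joint_probabilities(cond, background)
total = sum(sum(row) for row in joint)

for name, row in zip(amino_acids, joint):
    print(name, [round(x, 4) for x in row])
print("total:", round(total, 6))
```

In this toy example the whole matrix sums to one, but the entries for A to R and R to A still differ, which is why the next step is needed.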

###### Step 3. Make the matrix symmetric

Now we want to conclude the calculation of the values corresponding to \( M_{ij} \) in Equation 1. However, the PAM2 probability matrix is not symmetric; thus, we symmetrize it by summing each pair of 'forward' and 'reverse' substitution probabilities and dividing the sum by two to get their mean.

This operation assumes that the 'forward' and 'reverse' substitution probabilities are equal, meaning that substitutions such as A to R and R to A, or D to H and H to D, are equally probable, although they were not equal in the original PAM2 probability matrix. By observing sequence alignments alone, we cannot know in which direction a particular substitution occurred. Therefore, the mean value of each probability pair is the best we can do.

Figure 1 illustrates the summation of the probabilities in a portion of the PAM2 matrix, and Table 2 below shows the results. Note that the table is now symmetric and that these are the \( M_{ij} \) values. Since we already normalized the background probabilities that we used to compute the joint probabilities, we should not have to normalize again. Still, we check that the row sums together add up to one, or in this case to 10,000, since we multiplied each entry by 10,000 for readability.
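The averaging and the final check can be sketched as follows. The joint probabilities are hypothetical values for a three-letter alphabet, not entries from the PAM2 tables, and `symmetrize` is an illustrative helper name.

```python
def symmetrize(joint):
    """Replace each forward/reverse pair of entries by its mean."""
    n = len(joint)
    return [[(joint[i][j] + joint[j][i]) / 2 for j in range(n)] for i in range(n)]


# Hypothetical, non-symmetric joint probabilities that sum to one.
joint = [
    [0.490, 0.006, 0.006],
    [0.005, 0.291, 0.004],
    [0.005, 0.003, 0.190],
]

sym = symmetrize(joint)

# Same check as in the text: with every entry multiplied by 10,000,
# the whole matrix should still add up to 10,000.
total_scaled = round(sum(sum(row) for row in sym) * 10_000)

print([round(x, 4) for x in sym[0]])
print(total_scaled)
```

Averaging moves probability between the two members of a pair without changing their sum, so the total stays the same and no renormalization is needed.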

###### Step 4. Compute the odds for each amino acid

The odds part of Equation 1 is \( \frac {M_{ij}}{f_{i}f_{j}} \). Since we already have all the \( M_{ij} \), \( f_{i} \), and \( f_{j} \) values, we only need to compute the corresponding ratios. Figure 2 illustrates the computation in a portion of the joint probability matrix.
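A minimal sketch of the ratio computation, again with hypothetical numbers for a three-letter alphabet (`odds_matrix`, `sym`, and `background` are illustrative names):

```python
def odds_matrix(sym, background):
    """Divide each symmetric joint probability M_ij by f_i * f_j."""
    n = len(background)
    return [
        [sym[i][j] / (background[i] * background[j]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical background probabilities and symmetric joint probabilities.
background = [0.5, 0.3, 0.2]
sym = [
    [0.4900, 0.0055, 0.0055],
    [0.0055, 0.2910, 0.0035],
    [0.0055, 0.0035, 0.1900],
]

odds = odds_matrix(sym, background)

for row in odds:
    print([round(x, 4) for x in row])
```

In this example the diagonal odds are above one (the amino acid is conserved more often than chance predicts), and the off-diagonal odds are below one.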

###### Step 5. Scale the matrix values to convenient numbers

It is common to scale the matrix values (the factor \( \frac{1}{\lambda} \) in Equation 1) to get scores of a desired magnitude. For simplicity, however, we do not scale and thus set \( \lambda \) to one.

###### Step 6. Take a logarithm from each odds value

By taking the base-10 logarithm of each odds value, we get the following PAM2 log-odds matrix.
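Steps 5 and 6 together can be sketched as below. The odds values are hypothetical, and the alternative setting `lam=0.1` with rounding is only an illustration of how scaling produces convenient integers; this tutorial itself uses a lambda of one.

```python
import math


def log_odds_matrix(odds, lam=1.0):
    """Apply Equation 1 to every odds value: (1 / lambda) * log10(odds)."""
    return [[math.log10(x) / lam for x in row] for row in odds]


# Hypothetical odds values for a three-letter alphabet.
odds = [
    [1.9600, 0.0367, 0.0550],
    [0.0367, 3.2333, 0.0583],
    [0.0550, 0.0583, 4.7500],
]

# Lambda of one, as in this tutorial: plain base-10 logarithms.
scores = log_odds_matrix(odds)

# Illustrative alternative: lambda = 0.1 multiplies every score by ten,
# and rounding then gives integer scores.
scaled = [[round(s) for s in row] for row in log_odds_matrix(odds, lam=0.1)]

for row in scores:
    print([round(s, 3) for s in row])
for row in scaled:
    print(row)
```

Odds above one become positive scores and odds below one become negative scores, regardless of the scaling chosen.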

The Make PAM matrices page has a small program that calculates any PAM scoring matrix up to PAM2000, except PAM0. The PAM0 matrix is all zeros except for the diagonal, where every value is one. This corresponds to a distance of zero PAMs: no mutations have occurred, because the evolutionary distance is zero.
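The sketch below is not the program from the Make PAM matrices page. It only illustrates the idea under the assumption, standard in the Dayhoff model, that the PAMn probability matrix is the PAM1 probability matrix multiplied by itself n times. The two-letter `pam1` matrix and the helper names are hypothetical.

```python
def identity(n):
    """Identity matrix: ones on the diagonal, zeros elsewhere."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def mat_mul(a, b):
    """Plain matrix product of two square matrices."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def pam_probability_matrix(pam1, n):
    """Multiply the PAM1 probability matrix by itself n times."""
    result = identity(len(pam1))
    for _ in range(n):
        result = mat_mul(result, pam1)
    return result


# Hypothetical two-letter PAM1 probability matrix (columns sum to one).
pam1 = [
    [0.99, 0.02],
    [0.01, 0.98],
]

pam0 = pam_probability_matrix(pam1, 0)
pam2 = pam_probability_matrix(pam1, 2)

print(pam0)
print([[round(x, 4) for x in row] for row in pam2])
```

With n set to zero no multiplication happens and the result is the identity matrix described above. Its off-diagonal zeros have no logarithm, which may be why a PAM0 scoring matrix is not offered.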

#### Related tutorials

Pair-wise sequence alignment methods

How to select the right substitution matrix?

#### References

Dayhoff MO, Schwartz RM, Orcutt BC. *A model of evolutionary change in proteins.* In: Atlas of Protein Sequence and Structure (1978). National Biomedical Research Foundation, Washington, DC.

Dayhoff MO, Schwartz RM, Orcutt BC. *Matrices for detecting distant relationships.* In: Atlas of Protein Sequence and Structure (1978). National Biomedical Research Foundation, Washington, DC.

Henikoff S, Henikoff JG. *Amino acid substitution matrices from protein blocks.* Proc Natl Acad Sci U S A. 1992 Nov 15; 89(22): 10915–10919.

Pietrokovski S, Henikoff JG, Henikoff S. *The Blocks database--a system for protein classification.* Nucleic Acids Res. 1996 Jan 1; 24(1): 197–200.