schemarecomb.energy_functions.SCHEMA
- class schemarecomb.energy_functions.SCHEMA(parents)
Calculates the SCHEMA energy of a library, a proxy for the fraction of the library that is functional.
Given a PDB structure and a multiple sequence alignment, the SCHEMA energy of a recombinant sequence is the number of pairwise interactions broken by recombination. Expressed as an equation (Endelman et al. 2004 eq. 7):
\[S(s) = \sum_i \sum_{j>i} C(i, j) * D(s[i], s[j])\]
- where:
\(s\) is the recombinant sequence.
\(C(i, j) = 1\) if residues i and j are within 4.5 Angstroms in the protein structure, else 0.
\(D(s[i], s[j]) = 0\) if the amino acids at positions i and j in the recombinant sequence are also present together in any parent, else 1.
For example, if “AEH” and “GQH” are aligned parents with contacts (0, 1) and (1, 2), chimera “AQH” has a SCHEMA energy of 1.0. Residues ‘A’ and ‘Q’ are not found together in any parent at positions 0 and 1, respectively, and there is a contact at these positions, so \(C(0, 1) * D(s[0], s[1]) = 1\). The pair ‘Q’ and ‘H’, however, does not contribute to the SCHEMA energy. Although \(C(1, 2) = 1\) because residues 1 and 2 are in contact, ‘Q’ and ‘H’ are found together at these positions in the second parent, so \(D(s[1], s[2]) = 0\).
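The worked example above can be checked with a few lines of plain Python. This is a minimal sketch of the definition, not the library's implementation: the function name `schema_energy` and the representation of contacts as a list of index pairs are assumptions made for illustration.

```python
def schema_energy(chimera, parents, contacts):
    """Count contacts whose residue pair appears in no parent (hypothetical helper)."""
    energy = 0
    for i, j in contacts:  # each listed pair has C(i, j) = 1
        pair = (chimera[i], chimera[j])
        # D = 0 when some parent carries the same residue pair at positions (i, j).
        seen_in_parent = any((p[i], p[j]) == pair for p in parents)
        if not seen_in_parent:
            energy += 1
    return energy


parents = ["AEH", "GQH"]
contacts = [(0, 1), (1, 2)]
print(schema_energy("AQH", parents, contacts))  # prints 1
```

Only the (0, 1) contact is counted: ‘A’ and ‘Q’ never occur together in a parent, while ‘Q’ and ‘H’ do.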
See Voigt et al. 2002 and Silberg et al. 2004 for more information about SCHEMA energy.
The E_matrix attribute is a square matrix whose side length equals the length of the alignment. \(E_{rt}\) is the summand of Endelman et al. 2004 eq. 6, that is, eq. 6 with the sums over r and t removed, so it gives the average SCHEMA energy contributed by a specific pair of positions r and t:
\[E_{rt} = \frac{1}{|parents|^2} \sum_{p \in parents} \sum_{q \in parents} C(r, t) * D(p[r], q[t]),\]
where \(|parents|\) is the number of parents.
Eq. 6 can then be evaluated for a new block starting at position \(X_k\) by summing over \(E\), as in the block_energy method:
\[\langle E \rangle_{(X_1, X_2, ..., X_{k-1}, X_k)} - \langle E \rangle_{(X_1, X_2, ..., X_{k-1})} = \sum_{r=X_{k-1}}^{X_k-1} \sum_{t=X_k}^{N-1} E_{rt}.\]
Note that the summation limits are decremented by one compared to eq. 6 as it appears in Endelman et al. 2004, to match Python's zero-based indexing.
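As a rough illustration of the two formulas above, the sketch below builds the energy matrix for the “AEH”/“GQH” example and sums it for a block boundary. It uses plain nested lists rather than the np.ndarray held by E_matrix, and the names `energy_matrix` and `block_energy_sketch` are hypothetical; they are not the library's block_energy method. Contacts are assumed to be given as (r, t) pairs with r < t.

```python
def energy_matrix(parents, contacts):
    """Average broken-contact fraction for each contacting position pair."""
    n = len(parents[0])
    num_parents = len(parents)
    E = [[0.0] * n for _ in range(n)]
    for r, t in contacts:  # only pairs with C(r, t) = 1 can be nonzero
        parental_pairs = {(p[r], p[t]) for p in parents}
        # Count parent combinations (p, q) whose residue pair is in no parent.
        broken = sum(
            (p[r], q[t]) not in parental_pairs
            for p in parents
            for q in parents
        )
        E[r][t] = broken / num_parents ** 2
    return E


def block_energy_sketch(E, prev_start, new_start):
    """Sum E over r in [prev_start, new_start) and t in [new_start, N)."""
    n = len(E)
    return sum(
        E[r][t]
        for r in range(prev_start, new_start)
        for t in range(new_start, n)
    )


parents = ["AEH", "GQH"]
contacts = [(0, 1), (1, 2)]
E = energy_matrix(parents, contacts)
print(E[0][1], E[1][2])              # prints 0.5 0.0
print(block_energy_sketch(E, 0, 1))  # prints 0.5
```

The value 0.5 agrees with enumerating the four chimeras of a library with a single crossover before position 1 (“AEH”, “AQH”, “GEH”, “GQH”), whose SCHEMA energies are 0, 1, 1 and 0.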
Finally, as a computational speedup, the energy of a new block starting at each \(X_k\) can be calculated sequentially from the energy of a new block starting at position \(X_k - 1\). This is implemented in the increment_block_energy method.
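One way such a sequential update could work is sketched below; it is an assumption about the idea, not a description of how increment_block_energy is actually written. Moving the new block's start from b to b + 1 (with the previous block start fixed) removes column b from the double sum and adds row b, so each step costs O(N) instead of O(N^2). The function names and the toy matrix are hypothetical.

```python
def direct_block_energy(E, prev_start, new_start):
    """Reference double sum over r in [prev_start, new_start), t in [new_start, N)."""
    n = len(E)
    return sum(
        E[r][t]
        for r in range(prev_start, new_start)
        for t in range(new_start, n)
    )


def incremented_block_energy(E, prev_start, new_start, energy_at_previous):
    """Update the energy for new_start from the value at new_start - 1."""
    n = len(E)
    b = new_start - 1
    # Position b joins the earlier block, so pairs (r, b) are no longer split.
    lost = sum(E[r][b] for r in range(prev_start, b))
    # Pairs (b, t) with t in the new block are now split across the boundary.
    gained = sum(E[b][t] for t in range(new_start, n))
    return energy_at_previous - lost + gained


# Toy upper-triangular energy matrix (made-up values for illustration only).
E = [
    [0.0, 0.5, 0.25, 0.0],
    [0.0, 0.0, 0.5, 0.125],
    [0.0, 0.0, 0.0, 0.75],
    [0.0, 0.0, 0.0, 0.0],
]

energy = direct_block_energy(E, 0, 1)
for new_start in range(2, len(E)):
    energy = incremented_block_energy(E, 0, new_start, energy)
    # The incremental value should match the direct double sum at every step.
    assert abs(energy - direct_block_energy(E, 0, new_start)) < 1e-12
```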
- Parameters
parents (_ParentSequences) – Parent sequences for recombination. Must be aligned and have a pdb_structure attribute. Note that the SCHEMA object is initialized from the ParentSequences as they are at that point in time, so if parents changes later, the SCHEMA object will NOT change with it.
- Attributes
E_matrix (np.ndarray) – Energy matrix corresponding to input ParentSequences’ alignment and contacts.
parents – Parent sequences for recombination.
- Raises
ValueError – If input ParentSequences is not aligned or does not have the pdb_structure attribute on initialization.