schemarecomb.pdb_structure

Parsing and manipulation of Protein Data Bank structure files.

This module defines schemarecomb.PDBStructure and the accessory classes AminoAcid and Atom, which represent the corresponding entities within a PDB structure.

A PDBStructure contains a list of AminoAcid objects, each of which acts as a container for the Atom objects read from “ATOM” lines in a PDB structure file.

Parsing and modifying protein structure data is necessary for recombinant library design because SCHEMA energy calculations require the distances between amino acids within the protein.
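To make the distance requirement concrete, here is a minimal sketch of one way to measure how far apart two residues are: take the smallest distance between any atom of one residue and any atom of the other. The function name min_residue_distance, the variable names, and the 4.5 angstrom cutoff are assumptions made for illustration, not part of schemarecomb. The coordinates are copied from the 1GNX example listing in this section (all five ALA 15 atoms and the four backbone atoms of LEU 16).

```python
import math
from itertools import product


def min_residue_distance(coords_a, coords_b):
    """Return the smallest atom-to-atom distance between two residues.

    Each argument is a list of (x, y, z) tuples in angstroms.
    """
    return min(math.dist(a, b) for a, b in product(coords_a, coords_b))


# ALA 15: N, CA, C, O, CB
ala_15 = [
    (-1.611, 17.176, 10.792),
    (-1.871, 18.610, 11.107),
    (-2.021, 18.795, 12.611),
    (-2.983, 18.321, 13.215),
    (-3.131, 19.081, 10.392),
]

# LEU 16: N, CA, C, O (side chain omitted to keep the sketch short)
leu_16 = [
    (-1.064, 19.496, 13.206),
    (-1.061, 19.738, 14.642),
    (-1.711, 21.073, 14.992),
    (-1.462, 22.089, 14.341),
]

# Illustrative cutoff only; the value this package uses is not stated here.
CONTACT_CUTOFF = 4.5

distance = min_residue_distance(ala_15, leu_16)
in_contact = distance <= CONTACT_CUTOFF
print(f"minimum distance: {distance:.2f} angstroms, in contact: {in_contact}")
```

For neighboring residues such as these, the minimum is the peptide bond between the C atom of one residue and the N atom of the next, so the value is small. Residues far apart in sequence are the interesting case for contact calculations.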

These structures are obtained from the Protein Data Bank at https://www.rcsb.org and loaded as Python objects. Nearly all information other than the ATOM lines, which describe the atoms within the structure, is discarded.

The most confusing part of PDB structure manipulation is the renumbering of atoms to match the SCHEMA-RASPP parent alignment. Relieving user confusion about this process is a primary goal of this module.

Note that atom and residue numbering in PDB files begins at 1, while Python’s indexing starts at 0. For consistency with the rest of the package, these classes number residues starting from 0. As a result, reading and writing PDB files must convert between the two conventions. For example, the “ALA” lines in the PDB file below indicate that the 15th amino acid in the structure is alanine. When read with this module, this alanine is labeled with an index of 14. This matches sequence indexing: if pdb_seq is the Python string holding the amino acid sequence of the structure, this residue is pdb_seq[14].
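The index conversion can be sketched in a few lines. The helper names below are hypothetical and are not functions provided by this module, and the sequence string is a made-up placeholder chosen so that position 14 holds alanine ("A").

```python
def pdb_number_to_index(res_seq_num):
    """Convert a 1-based PDB residue number to a 0-based Python index."""
    return res_seq_num - 1


def index_to_pdb_number(index):
    """Convert a 0-based Python index back to a 1-based PDB residue number."""
    return index + 1


# Placeholder sequence: 14 filler residues, then A, L, T to mirror the
# ALA, LEU, THR residues numbered 15 to 17 in the example file.
pdb_seq = "G" * 14 + "ALT"

ala_index = pdb_number_to_index(15)
print(ala_index, pdb_seq[ala_index])
print(index_to_pdb_number(ala_index))
```

Applying the first helper when reading and the second when writing keeps the off-by-one adjustment in one place.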

PDB files generally have this structure (example structure 1GNX):

...<other structure data>...
ATOM      1  N   ALA A  15      -1.611  17.176  10.792  1.00 36.46           N
ATOM      2  CA  ALA A  15      -1.871  18.610  11.107  1.00 36.85           C
ATOM      3  C   ALA A  15      -2.021  18.795  12.611  1.00 36.41           C
ATOM      4  O   ALA A  15      -2.983  18.321  13.215  1.00 38.36           O
ATOM      5  CB  ALA A  15      -3.131  19.081  10.392  1.00 35.10           C
ATOM      6  N   LEU A  16      -1.064  19.496  13.206  1.00 34.22           N
ATOM      7  CA  LEU A  16      -1.061  19.738  14.642  1.00 29.97           C
ATOM      8  C   LEU A  16      -1.711  21.073  14.992  1.00 30.29           C
ATOM      9  O   LEU A  16      -1.462  22.089  14.341  1.00 30.34           O
ATOM     10  CB  LEU A  16       0.380  19.716  15.152  1.00 26.33           C
ATOM     11  CG  LEU A  16       1.228  18.548  14.639  1.00 24.12           C
ATOM     12  CD1 LEU A  16       2.681  18.761  15.026  1.00 22.75           C
ATOM     13  CD2 LEU A  16       0.698  17.230  15.195  1.00 23.37           C
ATOM     14  N   THR A  17      -2.541  21.066  16.028  1.00 30.68           N
ATOM     15  CA  THR A  17      -3.217  22.278  16.472  1.00 29.15           C
ATOM     16  C   THR A  17      -2.576  22.784  17.756  1.00 28.03           C
ATOM     17  O   THR A  17      -2.337  22.012  18.683  1.00 28.04           O
ATOM     18  CB  THR A  17      -4.716  22.014  16.733  1.00 30.86           C
ATOM     19  OG1 THR A  17      -5.357  21.666  15.501  1.00 32.64           O
ATOM     20  CG2 THR A  17      -5.385  23.246  17.319  1.00 30.50           C
...<other atoms>...
...<other structure data>...
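As a rough illustration of how ATOM lines like these can be read, the sketch below slices each line at the fixed column positions defined by the PDB file format and groups the atoms by 0-based residue index. It is not this module's implementation. The function names and dictionary keys are assumptions, the REMARK line is a made-up stand-in for the other structure data, and chain identifiers are ignored for simplicity.

```python
PDB_TEXT = """\
REMARK   placeholder for non-ATOM data, skipped by the parser
ATOM      1  N   ALA A  15      -1.611  17.176  10.792  1.00 36.46           N
ATOM      2  CA  ALA A  15      -1.871  18.610  11.107  1.00 36.85           C
ATOM      3  C   ALA A  15      -2.021  18.795  12.611  1.00 36.41           C
ATOM      4  O   ALA A  15      -2.983  18.321  13.215  1.00 38.36           O
ATOM      5  CB  ALA A  15      -3.131  19.081  10.392  1.00 35.10           C
ATOM      6  N   LEU A  16      -1.064  19.496  13.206  1.00 34.22           N
ATOM      7  CA  LEU A  16      -1.061  19.738  14.642  1.00 29.97           C
"""


def parse_atom_line(line):
    """Extract selected fields from one fixed-width ATOM line."""
    return {
        "serial_num": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq_num": int(line[22:26]),
        "coords": (
            float(line[30:38]),
            float(line[38:46]),
            float(line[46:54]),
        ),
    }


def group_by_residue(pdb_text):
    """Map 0-based residue index to the list of atoms in that residue."""
    residues = {}
    for line in pdb_text.splitlines():
        # The record name occupies the first six columns.
        if line[:6].strip() != "ATOM":
            continue
        atom = parse_atom_line(line)
        index = atom["res_seq_num"] - 1  # PDB numbering is 1-based
        residues.setdefault(index, []).append(atom)
    return residues


residues = group_by_residue(PDB_TEXT)
for index, atoms in residues.items():
    names = [atom["name"] for atom in atoms]
    print(index, atoms[0]["res_name"], names)
```

Slicing by column rather than splitting on whitespace matters because adjacent fields in a PDB file can run together without a separating space.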

Classes

AminoAcid(atoms)

Amino acid within a PDB structure.

Atom(serial_num, name, alt_loc, res_name, …)

Atom within a PDB structure.