schemarecomb.Library

class schemarecomb.Library(breakpoints, energy, energy_function, mutation_rate, gg_prob, gg_overhangs, gg_enzyme, amino_to_cdn)

A chimeric protein library.

This class is usually instantiated by another package function rather than directly by users. When generating many libraries, the simplest construction method is the Library.calc_from_config() classmethod, which uses a LibraryConfig to consolidate or compute all parameters except breakpoints and energy.

A library consists of a parent alignment and a series of breakpoints, which are the indices of the alignment where the parent sequences are recombined with Golden Gate assembly to form chimeras. A “block” is defined as the sequence between adjacent breakpoints, such that each chimeric sequence is composed of sequential blocks each taken from a parent. For example, a library with 2 breakpoints and 3 parents has a chimera that may be identified as “201”: the first block from parent 2, the second block from parent 0, and the last block from parent 1.
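The block naming scheme above can be illustrated with a minimal sketch. The helper build_chimera and the toy parent sequences are hypothetical and are not part of schemarecomb. The sketch also assumes that breakpoints are interior alignment indices, which may differ from how BreakPoint objects are represented in the package.

```python
def build_chimera(parents, breakpoints, chimera_id):
    """Assemble one chimera from aligned parent sequences (hypothetical helper).

    parents: equal-length aligned sequences.
    breakpoints: sorted interior alignment indices where blocks are split.
    chimera_id: one digit per block, naming the parent that block comes from.
    """
    # Block edges are the alignment start, each breakpoint, and the alignment end.
    edges = [0, *breakpoints, len(parents[0])]
    blocks = list(zip(edges, edges[1:]))
    if len(chimera_id) != len(blocks):
        raise ValueError("need exactly one parent digit per block")
    return "".join(
        parents[int(parent)][start:end]
        for parent, (start, end) in zip(chimera_id, blocks)
    )


# Toy alignment: three parents, two breakpoints, three blocks.
parents = ["AAAAAAAAA", "CCCCCCCCC", "GGGGGGGGG"]
chimera = build_chimera(parents, [3, 6], "201")
print(chimera)  # GGGAAACCC
```

Here "201" takes the first block from parent 2, the second from parent 0, and the last from parent 1.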

Each library is assigned metrics of energy and mutation_rate that estimate the fraction of functional recombinants and sequence diversity, respectively. The exact metrics depend on the method used to optimize over the recombinant space. For example, the SCHEMA-RASPP algorithm averages the SCHEMA energy and minimum number of mutations from a parent sequence over every recombinant protein in the library. See Endelman et al. 2004 for more details about SCHEMA-RASPP.
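As a rough illustration of the mutation-rate half of this averaging, the sketch below enumerates every chimera of a toy alignment and averages the Hamming distance to the closest parent. The function names are hypothetical, breakpoints are assumed to be interior alignment indices, and the SCHEMA energy term is omitted because it requires structural contact data. This is not the package's implementation.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def average_mutation_rate(parents, breakpoints):
    """Mean distance to the closest parent over all chimeras (toy sketch)."""
    edges = [0, *breakpoints, len(parents[0])]
    blocks = list(zip(edges, edges[1:]))
    distances = []
    # Each chimera picks one parent per block.
    for choice in product(range(len(parents)), repeat=len(blocks)):
        chimera = "".join(
            parents[p][start:end] for p, (start, end) in zip(choice, blocks)
        )
        distances.append(min(hamming(chimera, parent) for parent in parents))
    return sum(distances) / len(distances)


# Two parents, one breakpoint: chimeras are AAAA, AACC, CCAA, CCCC.
rate = average_mutation_rate(["AAAA", "CCCC"], [2])
print(rate)  # 1.0
```

Exhaustive enumeration grows as (number of parents) to the power of (number of blocks), so this approach only suits small examples.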

Each library is also assigned a “Golden Gate probability” that estimates the efficiency of library assembly given the optimal set of Golden Gate overhangs chosen for the given library. This value is computed (and overhangs chosen) based on data from Gregory Lohman and collaborators at NEB (Potapov et al. 2018). See Pryor, Potapov, et al. 2020 for details on the biology and mathematics behind this technique. (Note that the calculation was developed independently by the authors of schemarecomb in 2019 after viewing a talk by Dr. Lohman.)

See schemarecomb.libraries for classes and functions that handle Library instances.

The attributes for this class are the same as the parameters, with the addition of max_block_len and min_block_len attributes that give the maximum and minimum block length in the library, respectively.

You might want to add additional DNA bases to the ends of the fragments, as this may improve restriction enzyme efficiency.

Note

This class may produce unintended Golden Gate sites when generating DNA. The current version does not check for this. Simulate restriction enzyme cutting and Golden Gate Assembly in a program like Benchling or Snapgene, then change codons as necessary.
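Before moving to a full simulation, a quick scan for stray recognition sites can flag obvious problems. The sketch below is a hypothetical helper, not part of schemarecomb. It assumes the enzyme is BsaI (recognition site GGTCTC), so substitute the site of whichever gg_enzyme is actually used. It does not replace simulating the digest and assembly.

```python
def reverse_complement(dna):
    """Reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(dna, site="GGTCTC"):
    """Return sorted (index, strand) hits for a recognition site on both strands.

    The default site is BsaI's, which is an assumption about the enzyme in use.
    For a palindromic site both strands share one motif, reported here as "-".
    """
    dna = dna.upper()
    motifs = {site: "+", reverse_complement(site): "-"}
    hits = []
    for motif, strand in motifs.items():
        start = dna.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = dna.find(motif, start + 1)
    return sorted(hits)


# Toy fragment with one forward site and one reverse-strand site.
hits = find_sites("AAGGTCTCAAGAGACCTT")
print(hits)  # [(2, '+'), (10, '-')]
```

Any hit that is not one of the intended flanking sites marks a position where a synonymous codon change may be needed.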

Parameters
  • breakpoints (list[BreakPoint]) – Alignment indices where the parent sequences are recombined to form the recombinants in the Library.

  • energy (Decimal) – Estimation of the fraction of functional recombinants relative to other libraries from the same parent alignment. High energy libraries are likely to have a larger proportion of active enzymes. The interpretation of this value depends on the EnergyFunction used to calculate it, which must be passed in as the energy_function parameter.

  • energy_function (EnergyFunction) – The EnergyFunction used to calculate the energy attribute.

  • mutation_rate (Decimal) – Average mutational distance to closest parent for each recombinant in the library.

  • gg_prob (Decimal) – Golden Gate efficiency using gg_overhangs, calculated with gg_enzyme.

  • gg_overhangs (list[Overhang]) – Collection of overhangs found with (near) maximum gg_prob. Element indices correspond to the same index in breakpoints, e.g. gg_overhangs[i] is the overhang at breakpoints[i].position. Note that there may exist a valid overhang set for this library with better gg_prob depending on the heuristic used to optimize gg_prob.

  • gg_enzyme (RestrictionEnzyme) – The enzyme used to calculate gg_prob and optimize over possible gg_overhangs. This should correspond to the restriction enzyme to be used to assemble the library.

  • amino_to_cdn (dict[str, set[str]]) – Mapping of amino acids to codons. Forms a simple codon optimization.
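To make the shape of amino_to_cdn concrete, here is a small sketch of a dict[str, set[str]] mapping and a naive back-translation. The mapping covers only four amino acids, and the back_translate helper and its codon-picking rule are hypothetical. How schemarecomb actually selects among codons is not described here.

```python
# Partial toy mapping; a real mapping would cover all amino acids.
amino_to_cdn = {
    "M": {"ATG"},
    "K": {"AAA", "AAG"},
    "W": {"TGG"},
    "E": {"GAA", "GAG"},
}


def back_translate(protein, amino_to_cdn):
    """Pick one allowed codon per residue (hypothetical rule: first when sorted)."""
    # Sets are unordered, so sorting keeps the choice deterministic.
    return "".join(sorted(amino_to_cdn[aa])[0] for aa in protein)


dna = back_translate("MKWE", amino_to_cdn)
print(dna)  # ATGAAATGGGAA
```

Restricting each set to preferred codons for the expression host is one way such a mapping acts as a simple codon optimization.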

Attributes
  • All parameters for this class are also attributes.

  • block_indices (list[tuple[int, int]]) – The start and end indices of each block. See block_indices() for more information.

  • max_block_len (int) – Size of the largest block in the library.

  • min_block_len (int) – Size of the smallest block in the library.

  • dna_blocks (list[list[str]]) – DNA sequences for the chimeric blocks, with Golden Gate sites at the ends. Each inner list corresponds to one of the parents in energy_function.parents. Each element in an inner list is a DNA sequence that translates to amino acid blocks from the parent.

  • dna_blocks (list[SeqRecord]) – Chimeric DNA blocks with id ‘<parent_name>_block-<block number>’. Groups of chimeric blocks with compatible block numbers (from 0 to number of blocks, inclusive and all unique) may be Golden Gate Assembled to form DNA that may be transcribed and translated into chimeric proteins.
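The relationship between breakpoint positions and the block attributes can be sketched as follows. The block_summary helper is hypothetical. It assumes the positions include both ends of the alignment and that each block is a half-open range, which may not match the convention used by block_indices(). The id strings follow the ‘<parent_name>_block-<block number>’ pattern described above.

```python
def block_summary(positions, parent_names):
    """Derive block ranges, extreme block lengths, and block ids (toy sketch).

    Assumption: positions are sorted, include the alignment start and end,
    and each block is the half-open range [start, end).
    """
    block_indices = list(zip(positions, positions[1:]))
    lengths = [end - start for start, end in block_indices]
    ids = [
        f"{name}_block-{number}"
        for name in parent_names
        for number in range(len(block_indices))
    ]
    return block_indices, max(lengths), min(lengths), ids


block_indices, max_block_len, min_block_len, ids = block_summary(
    [0, 30, 75, 120], ["p0", "p1"]
)
print(block_indices)  # [(0, 30), (30, 75), (75, 120)]
print(max_block_len, min_block_len)  # 45 30
```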

classmethod calc_from_config(breakpoints, energy, config)

Calculate average mutation rate and gg_prob during construction.

Useful when constructing many libraries from the same ParentSequences. Energy must be precalculated.

Parameters
  • breakpoints (list[BreakPoint]) – The breakpoints that define the new library.

  • energy (Decimal) – The energy of the new library.

  • config (LibraryConfig) – Configuration shared between all other libraries.

Return type

_Library

find_best_overhangs()

Find the overhangs with the true maximum gg_prob.

Resets the gg_prob and gg_overhangs attributes. This method can be used if a gg_prob threshold <1.0 was used during construction to speed up calculation. After this library is selected, call this method to find the truly optimal set of gg_overhangs, so that the (possibly) improved overhangs are used in the generated DNA sequences.

Return type

None

classmethod from_json(in_json)

Construct instance from JSON.

Parameters

in_json (str) – JSON-formatted string representing a Library.

Return type

_Library

Returns

Library instance created from in_json.

to_json()

Convert instance to a JSON-formatted string.

Return type

str

Returns

Instance converted to a JSON string.
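The to_json() and from_json() pair is intended for round trips. The sketch below shows the general pattern on a hypothetical ToyLibrary class, not on schemarecomb.Library itself, and it does not reflect the package's actual JSON layout. One point it illustrates is that Decimal values are not JSON-serializable by default, so the sketch stores them as strings.

```python
import json
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class ToyLibrary:
    """Hypothetical stand-in with one list field and one Decimal field."""

    breakpoints: list
    energy: Decimal

    def to_json(self):
        # Decimal is stored as a string so no precision is lost.
        return json.dumps(
            {"breakpoints": self.breakpoints, "energy": str(self.energy)}
        )

    @classmethod
    def from_json(cls, in_json):
        data = json.loads(in_json)
        return cls(data["breakpoints"], Decimal(data["energy"]))


lib = ToyLibrary([30, 75], Decimal("12.5"))
restored = ToyLibrary.from_json(lib.to_json())
print(restored == lib)  # True
```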