schemarecomb.breakpoints.calculate_breakpoints

schemarecomb.breakpoints.calculate_breakpoints(parents, codon_options, start_overhangs=None, end_overhangs=None)

Calculate the breakpoints for the parent alignment.

Currently supports length-four Golden Gate sites.

A breakpoint is a potential site for recombination of the parental sequences. The terms “breakpoint” and “crossover” are used synonymously. A breakpoint’s position is the index of the breakpoint’s second amino acid in the parent alignment. This definition lets the breakpoint positions slice the parent alignment cleanly into blocks. For example, let the positions of the (k-1)th, kth, and (k+1)th breakpoints be b_(k-1), b_k, and b_(k+1), respectively. Then for parent sequence p, the block before the kth breakpoint is p[b_(k-1):b_k] and the block after it is p[b_k:b_(k+1)]. Valid breakpoints contain candidate overhangs, defined below.
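The slicing convention can be illustrated with a minimal sketch. The helper `split_into_blocks` and the example sequence are hypothetical and not part of schemarecomb; the sketch only assumes that the sequence start (0) and end act as implicit boundaries.

```python
def split_into_blocks(parent: str, breakpoints: list[int]) -> list[str]:
    """Slice one aligned parent sequence into blocks.

    Each breakpoint position is the index of the first residue of the
    block that follows it, so plain half-open slices give the blocks.
    (Hypothetical helper, for illustration only.)
    """
    # Assumption: the sequence start and end are implicit boundaries.
    bounds = [0, *sorted(breakpoints), len(parent)]
    return [parent[a:b] for a, b in zip(bounds, bounds[1:])]


blocks = split_into_blocks("MKTAYIAKQR", [3, 7])
# blocks == ["MKT", "AYIA", "KQR"]
```

Because the slices are half-open, concatenating the blocks always reproduces the original sequence.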

Overhangs are the DNA sticky ends used in a Golden Gate reaction. In this module, each overhang consists of a positional shift and a DNA sequence. The shift is the overhang’s position relative to the first base pair in the left codon of the breakpoint (codon index b_k - 1). See BreakPoint.

Example: Suppose the amino acids at indices b_k-1 and b_k are ‘M’ and ‘T’ for all parents in the parent alignment and codon_options contains {‘M’: (‘ATG’,), ‘T’: (‘ACC’,)}. Then for each parent p, the only valid codon sequence for p[b_k-1:b_k+1] is ATGACC, and the overhangs are (0, ‘ATGA’), (1, ‘TGAC’), (2, ‘GACC’). This breakpoint will be represented in the breakpoints dictionary as {b_k: [(0, ‘ATGA’), (1, ‘TGAC’), (2, ‘GACC’)]}.
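The overhang enumeration in this example can be reproduced with a short sketch. `candidate_overhangs` is a hypothetical helper, not the library's implementation, and it handles only a single fixed codon pair; how the real function reconciles several parents or several codon choices is not shown here.

```python
def candidate_overhangs(
    left_codon: str, right_codon: str, site_length: int = 4
) -> list[tuple[int, str]]:
    """List every (shift, sequence) window across two adjacent codons.

    The shift is counted from the first base of the left codon.
    (Hypothetical helper, for illustration only.)
    """
    dna = left_codon + right_codon
    last_shift = len(dna) - site_length
    return [(shift, dna[shift:shift + site_length])
            for shift in range(last_shift + 1)]


overhangs = candidate_overhangs("ATG", "ACC")
# overhangs == [(0, "ATGA"), (1, "TGAC"), (2, "GACC")]
```

With two three-base codons and a four-base site there are exactly three windows, and each one spans the codon boundary.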

Certain adjacent breakpoints will result in the same <M>, e.g. if alignment positions i and i+1 only consist of {‘A’} and {‘C’} respectively, the libraries of breakpoints (…, i, …) and (…, i+1, …) will have the same <M> if all other breakpoints are the same. Therefore, this function groups the redundant breakpoints and returns a mapping from the original breakpoint index to the nonredundant group index.
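As a rough illustration of this kind of grouping, the sketch below merges adjacent breakpoint positions whenever the alignment column that would move between blocks is identical in every parent. This criterion, the helper name `group_redundant`, and the use of plain strings for parents are all assumptions made for illustration; the library's own rule may be stricter, for example by also considering overhangs.

```python
def group_redundant(parents: list[str]) -> dict[int, int]:
    """Map each breakpoint position to a nonredundant group index.

    Breakpoints at i and i+1 differ only in which block receives
    alignment column i. If that column is the same in all parents,
    both breakpoints yield identical chimeras, so they share a group.
    (Hypothetical helper; assumes equal-length aligned strings.)
    """
    length = len(parents[0])
    groups: dict[int, int] = {}
    group = 0
    for pos in range(1, length):
        if pos > 1:
            # Column that moves when the breakpoint shifts from pos-1 to pos.
            moved_column = {p[pos - 1] for p in parents}
            if len(moved_column) > 1:
                group += 1
        groups[pos] = group
    return groups


groups = group_redundant(["ACDE", "ACFE"])
# groups == {1: 0, 2: 0, 3: 1}
```

Here positions 1 and 2 collapse into one group because column 1 is ‘C’ in both parents, while position 3 starts a new group because column 2 differs.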

Parameters
  • parents (_ParentSequences) – parent alignment for breakpoint calculation.

  • codon_options (dict[str, set[str]]) – Amino acid to available codon mapping. Used in Golden Gate site design and library sequence design. Change this to include or exclude certain codons based on codon optimization schemes, reassigned codons, etc.

  • start_overhangs (Optional[list[Overhang]]) – Positional shift and nucleotide sequence of Golden Gate site for vector insertion at start of sequence. Not factored into calculations if None.

  • end_overhangs (Optional[list[Overhang]]) – Positional shift and nucleotide sequence of Golden Gate site for vector insertion at end of sequence. Not factored into calculations if None.

Return type

list[BreakPoint]

Returns

Mapping from breakpoint position to valid breakpoint overhangs.
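The codon_options parameter is described as a dict[str, set[str]] that can be adjusted to include or exclude codons. A minimal sketch of preparing such a mapping follows; `restrict_codons` and the small codon table are hypothetical and only show the expected shape of the argument.

```python
def restrict_codons(
    codon_options: dict[str, set[str]], excluded: set[str]
) -> dict[str, set[str]]:
    """Return a copy of codon_options without the excluded codons.

    Raises ValueError if an amino acid would be left with no codon.
    (Hypothetical helper, for illustration only.)
    """
    restricted: dict[str, set[str]] = {}
    for amino_acid, codons in codon_options.items():
        kept = set(codons) - excluded
        if not kept:
            raise ValueError(f"no codons left for {amino_acid!r}")
        restricted[amino_acid] = kept
    return restricted


# Illustrative subset of the standard genetic code.
codon_options = {"M": {"ATG"}, "T": {"ACA", "ACC", "ACG", "ACT"}}
without_acg = restrict_codons(codon_options, {"ACG"})
# without_acg["T"] == {"ACA", "ACC", "ACT"}
```

The resulting dictionary has the same type as the original, so it could be passed wherever a codon_options mapping is expected.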