Functional Segregation of Overlapping Genes in HIV.pdf

  1. Using alanine scanning and deep mutational scanning, the authors found a segregated organization in overlapping genes: functionally important residues in one gene tended to overlap with non-functional or highly mutable regions in the other gene.

    1. I want to go over their data to understand which mutations had an effect on one gene but not the other. Part of it comes down to degenerate codons, but it is more than that: many amino acids have similar shapes, properties, or functions, so replacing one amino acid with a similar one may be tolerated.
    2. Would we be able to simulate the toxicity of a given mutant in silico? If so, we could screen our mutants computationally and only try out the most promising ones.
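
A possible starting point for item 1 is to classify each single-base substitution by its effect in each reading frame: synonymous, conservative (same physicochemical class), or non-conservative. The sketch below is only an illustration. It assumes the standard genetic code, and the amino acid grouping in `AA_GROUPS` and the helper `classify_substitution` are hypothetical choices of mine, not taken from the paper. A substitution matrix or the paper's own scanning data would be a better measure of similarity.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Assumption: a coarse physicochemical grouping, for illustration only.
AA_GROUPS = {
    "nonpolar": "AVLIMFW",
    "special": "GP",
    "polar": "STCYNQ",
    "positive": "KRH",
    "negative": "DE",
}
AA_CLASS = {aa: name for name, members in AA_GROUPS.items() for aa in members}


def classify_substitution(seq, pos, new_base, frame):
    """Classify a single-base change at `pos` as seen by reading frame 0, 1, or 2."""
    codon_start = pos - ((pos - frame) % 3)
    if codon_start < 0 or codon_start + 3 > len(seq):
        return "outside frame"
    old_codon = seq[codon_start:codon_start + 3]
    offset = pos - codon_start
    new_codon = old_codon[:offset] + new_base + old_codon[offset + 1:]
    old_aa, new_aa = CODON_TABLE[old_codon], CODON_TABLE[new_codon]
    if old_aa == new_aa:
        return "synonymous"
    if "*" in (old_aa, new_aa):
        return "gains or loses a stop codon"
    if AA_CLASS[old_aa] == AA_CLASS[new_aa]:
        return "conservative"
    return "non-conservative"


# Toy sequence: frame 0 reads ATG GCT GAA, frame 1 reads TGG CTG.
toy = "ATGGCTGAA"
for frame in (0, 1):
    print(frame, classify_substitution(toy, 5, "C", frame))
```

In the toy sequence, changing position 5 from T to C is silent in frame 0 (GCT to GCC, both alanine) but turns leucine into proline in frame 1, which is the kind of asymmetry worth looking for in the paper's data.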

    Overlapping genes in natural and engineered genomes

    1. Can someone send me this article? I want to read it, but I do not have institutional access

    I need further steps on the pipeline!

    Synthetic sequence entanglement-min.pdf

    Pseudocode for algorithm with one overlapping region

    CODON_TABLE = {
        'TCA': 'S', 'TCC': 'S', 'TCG': 'S', 'TCT': 'S', 'TTC': 'F', 'TTT': 'F', 'TTA': 'L', 'TTG': 'L',
        'TAC': 'Y', 'TAT': 'Y', 'TAA': '*', 'TAG': '*', 'TGC': 'C', 'TGT': 'C', 'TGA': '*', 'TGG': 'W',
        'CTA': 'L', 'CTC': 'L', 'CTG': 'L', 'CTT': 'L', 'CCA': 'P', 'CCC': 'P', 'CCG': 'P', 'CCT': 'P',
        'CAC': 'H', 'CAT': 'H', 'CAA': 'Q', 'CAG': 'Q', 'CGA': 'R', 'CGC': 'R', 'CGG': 'R', 'CGT': 'R',
        'ATA': 'I', 'ATC': 'I', 'ATT': 'I', 'ATG': 'M', 'ACA': 'T', 'ACC': 'T', 'ACG': 'T', 'ACT': 'T',
        'AAC': 'N', 'AAT': 'N', 'AAA': 'K', 'AAG': 'K', 'AGC': 'S', 'AGT': 'S', 'AGA': 'R', 'AGG': 'R',
        'GTA': 'V', 'GTC': 'V', 'GTG': 'V', 'GTT': 'V', 'GCA': 'A', 'GCC': 'A', 'GCG': 'A', 'GCT': 'A',
        'GAC': 'D', 'GAT': 'D', 'GAA': 'E', 'GAG': 'E', 'GGA': 'G', 'GGC': 'G', 'GGG': 'G', 'GGT': 'G',
    }
    
    REVERSE_CODON_TABLE = {
        'A': ['GCA', 'GCC', 'GCG', 'GCT'],
        'C': ['TGC', 'TGT'],
        'D': ['GAC', 'GAT'],
        'E': ['GAA', 'GAG'],
        'F': ['TTC', 'TTT'],
        'G': ['GGA', 'GGC', 'GGG', 'GGT'],
        'H': ['CAC', 'CAT'],
        'I': ['ATA', 'ATC', 'ATT'],
        'K': ['AAA', 'AAG'],
        'L': ['TTA', 'TTG', 'CTA', 'CTC', 'CTG', 'CTT'],
        'M': ['ATG'],
        'N': ['AAC', 'AAT'],
        'P': ['CCA', 'CCC', 'CCG', 'CCT'],
        'Q': ['CAA', 'CAG'],
        'R': ['CGA', 'CGC', 'CGG', 'CGT', 'AGA', 'AGG'],
        'S': ['TCA', 'TCC', 'TCG', 'TCT', 'AGC', 'AGT'],
        'T': ['ACA', 'ACC', 'ACG', 'ACT'],
        'V': ['GTA', 'GTC', 'GTG', 'GTT'],
        'W': ['TGG'],
        'Y': ['TAC', 'TAT'],
        '*': ['TAA', 'TAG', 'TGA'],
    }
    
    FUNCTION get_degenerate_codons(codon):
        # Return every codon that encodes the same amino acid as `codon`
        amino_acid = CODON_TABLE[codon]
        degenerate_codons = REVERSE_CODON_TABLE[amino_acid]
        RETURN degenerate_codons
    
    FUNCTION generate_variant_graph(dna_sequence, overlap_start, overlap_end):
        # Step 1: Initialize the graph structure
        variant_graph = []

        # Step 2: Build the graph
        FOR i FROM 0 TO LENGTH(dna_sequence) - 1:
            IF i IS BETWEEN overlap_start AND overlap_end - 1:
                # Inside overlap region: handle codons for Gene 2
                IF (i - overlap_start) MOD 3 == 0:
                    # Start of a new codon in Gene 2
                    codon = dna_sequence[i TO i + 2]  # Extract the codon
                    degenerate_codons = get_degenerate_codons(codon)
                    variant_graph.APPEND(degenerate_codons)  # Add to graph
                # Positions 2 and 3 of a codon are already covered by its node
            ELSE:
                # Outside overlap region: handle individual nucleotides
                variant_graph.APPEND([dna_sequence[i]])  # Add single nucleotide to graph

        RETURN variant_graph
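
The pseudocode above could be written in Python roughly as follows. This is a minimal sketch, assuming the overlap length is a multiple of 3 and that Gene 2's reading frame begins exactly at `overlap_start`. The helper `enumerate_variants` is a hypothetical addition for listing the sequences the graph encodes. Note that this only preserves Gene 2's protein; each variant would still need to be checked against Gene 1's reading frame.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Amino acid -> list of codons that encode it.
REVERSE_CODON_TABLE = {}
for codon, aa in CODON_TABLE.items():
    REVERSE_CODON_TABLE.setdefault(aa, []).append(codon)


def get_degenerate_codons(codon):
    """Return every codon encoding the same amino acid as `codon`."""
    return REVERSE_CODON_TABLE[CODON_TABLE[codon]]


def generate_variant_graph(dna_sequence, overlap_start, overlap_end):
    """Build a list of nodes; each node is a list of alternative strings.

    `overlap_end` is exclusive. Inside the overlap each node holds the
    synonymous codons for Gene 2; outside it each node is one nucleotide.
    """
    if not 0 <= overlap_start <= overlap_end <= len(dna_sequence):
        raise ValueError("overlap must lie within the sequence")
    if (overlap_end - overlap_start) % 3 != 0:
        raise ValueError("overlap length must be a multiple of 3")

    variant_graph = []
    for i in range(len(dna_sequence)):
        if overlap_start <= i < overlap_end:
            if (i - overlap_start) % 3 == 0:
                codon = dna_sequence[i:i + 3]
                variant_graph.append(list(get_degenerate_codons(codon)))
        else:
            variant_graph.append([dna_sequence[i]])
    return variant_graph


def enumerate_variants(variant_graph):
    """Hypothetical helper: expand the graph into every full sequence."""
    return ["".join(path) for path in product(*variant_graph)]


graph = generate_variant_graph("AATGTTTC", 1, 7)
print(graph)
print(enumerate_variants(graph))
```

The number of variants grows multiplicatively with the overlap length, so for a real overlap it is probably better to filter nodes against Gene 1 before enumerating.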
    

Lysis Protein: METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT

First 49 overlap

Last 144 overlap

Total Length: 227

(0,48) (82, 226)
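
A quick sanity check of these numbers, assuming the pairs are inclusive, 0-based nucleotide coordinates within the lysis gene (the notes do not state the convention, so that is my assumption, as is the helper name `span_length`):

```python
LYSIS_PROTEIN = (
    "METRFPQQSQQTPASTNRRRPFKHEDYPCRRQQRSSTLYVLIFLAIFLSKFTNQLLLSLLEAVIRTVTTLQQLLT"
)


def span_length(start, end):
    """Number of positions in an inclusive (start, end) interval."""
    return end - start + 1


overlaps = [(0, 48), (82, 226)]
for start, end in overlaps:
    print((start, end), "covers", span_length(start, end), "positions")

coding_nt = 3 * len(LYSIS_PROTEIN)
print("residues:", len(LYSIS_PROTEIN))
print("coding nucleotides:", coding_nt, "with stop codon:", coding_nt + 3)
```

Under that convention (0, 48) covers 49 positions, matching the first overlap, but (82, 226) covers 145 rather than 144, and the 75-residue protein corresponds to 225 coding nucleotides (228 with a stop codon) rather than 227. One of the coordinates may be off by one, or the convention may differ, so this is worth rechecking against the annotated sequence.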

If I truncate the lys gene, I may affect the expression of downstream genes.

She has a link on her presentation.

Coat protein lys protein interaction.

Come up with 3 to 5 mutants that I think are promising.

Framing for presenting them: "I have worked through this overlap; of the candidate mutants, I picked these few because of [reasons]."

Another constraint: codon distribution
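
One way to keep an eye on this constraint is to count in-frame codons in the wild-type and mutant sequences and look at what changed. The sketch below is a minimal illustration with hypothetical function names; a real check would presumably compare against the host's codon usage table, which these notes do not specify.

```python
from collections import Counter


def codon_counts(seq, frame=0):
    """Count complete codons in `seq`, reading from offset `frame`."""
    usable = len(seq) - ((len(seq) - frame) % 3)
    return Counter(seq[i:i + 3] for i in range(frame, usable, 3))


def codon_shift(wild_type, mutant, frame=0):
    """Per-codon change in count (mutant minus wild type), omitting zeros."""
    wt = codon_counts(wild_type, frame)
    mt = codon_counts(mutant, frame)
    return {c: mt[c] - wt[c] for c in sorted(set(wt) | set(mt)) if mt[c] != wt[c]}


# Toy example: one TTT codon swapped for the synonymous TTC.
print(codon_shift("ATGTTTTTT", "ATGTTTTTC"))
```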

Which parts of the RNA secondary structure are important? Which are not?

Which attributes of lys protein must be conserved? Which not so much?

How does the DNA pack into the coat protein (cp)? Would a change in sequence affect DNA packing?

I need to go through the entire virus life cycle to determine which steps may be affected by a change in DNA sequence.