# MUTANALYST

An online tool for assessing the mutational spectrum of epPCR libraries with poor sampling.

## Background

Error-prone PCR (epPCR) is a method to create a pool of amplicons carrying random errors. For the best results, both the number of mutations and the spectrum of the mutations need to be controlled, hence the need for a test library. The calculations for a test library are somewhat laborious and are affected by the very small sample size. This calculator tries to overcome these two issues: it computes the mutational biases from a starting sequence and a list of mutant genotypes, estimates the mutations per sequence by fitting a Poisson distribution, and estimates the errors in these values.
In particular, the errors are calculated on the assumption that a mutation and its complement are equally likely, given the double-helix nature of DNA (*e.g.* A to G on one strand will result in T to C on the other). For the specific formulae used see this note about propagating errors.
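
The strand-equivalence assumption can be illustrated with a minimal Python sketch. It pools each substitution with its complement, giving at most six classes. The function names and the dictionary layout are assumptions made for illustration and are not taken from the tool itself.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_mutation(src, dst):
    """Return the same substitution as read on the opposite strand."""
    return COMPLEMENT[src], COMPLEMENT[dst]


def fold_by_strand(tally):
    """Pool each substitution with its complement into one class.

    `tally` maps (from_base, to_base) to a count. Each key of the result
    is a sorted pair of strand-equivalent substitutions.
    """
    folded = Counter()
    for (src, dst), count in tally.items():
        pair = tuple(sorted([(src, dst), complement_mutation(src, dst)]))
        folded[pair] += count
    return dict(folded)


# Hypothetical counts: A->G and T->C end up in the same class.
example = {("A", "G"): 5, ("T", "C"): 3, ("G", "T"): 1}
folded = fold_by_strand(example)
```

Here A→G (5) and T→C (3) are pooled into one class of 8 observations, while G→T is pooled with its complement C→A.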

The program can calculate mutation frequencies from the template sequence and the list of mutations found, or it can accept the frequencies directly. The 'Demo' values are from an actual experiment.

## Starting from a sequence and a mutant genotype list

### Sequence

Sequence amplified by mutagenic PCR:

### Mutations found

This is the list of the mutations found. Identifying the mutations is best done with visual checks, but if the process needs speeding up, this small helper script may help.

The format is as follows:

- Each line contains one or more mutations of a single sampled variant.
- Mutations must be in the form A123C or 123A>C; the number is irrelevant and can be omitted.
- A wild-type sequence can be indicated with 'wt'. It is not needed for the main calculations and is used solely for the mutational frequency; it is also useful for Pedel.
- Rarer events such as insertions, deletions, duplications, frameshifts and inversions are not taken into account, but their frequency can be easily calculated using the 'values for further analysis' below.
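
The format rules above can be sketched as a small Python parser. This is an illustrative sketch, not the tool's own code: the function names are hypothetical, and it assumes that mutations on one line are separated by whitespace, commas or semicolons.

```python
import re
from collections import Counter

# Accepts A123C or 123A>C; the position digits are optional and ignored.
MUTATION = re.compile(
    r"^(?:([ACGT])\d*([ACGT])|\d*([ACGT])>([ACGT]))$", re.IGNORECASE
)


def parse_variant(line):
    """Return the list of (from, to) substitutions found on one line."""
    tokens = [t for t in re.split(r"[\s,;]+", line.strip()) if t]
    if [t.lower() for t in tokens] == ["wt"]:
        return []  # wild type: a sampled variant with zero mutations
    substitutions = []
    for token in tokens:
        match = MUTATION.match(token)
        if match is None:
            raise ValueError(f"unrecognised mutation: {token!r}")
        if match.group(1):
            src, dst = match.group(1), match.group(2)
        else:
            src, dst = match.group(3), match.group(4)
        substitutions.append((src.upper(), dst.upper()))
    return substitutions


def tally(lines):
    """Parse all non-empty lines; return per-variant lists and total counts."""
    per_variant = [parse_variant(line) for line in lines if line.strip()]
    counts = Counter(sub for variant in per_variant for sub in variant)
    return per_variant, counts


variants, counts = tally(["A123C 45G>T", "wt", "T7C", "AG"])
```

The per-variant lists give the number of mutations per sequence (including zero for 'wt' lines), while `counts` is the tally of nucleotide-specific substitutions.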

## Mutational frequency

The simplest estimate of the frequency of mutations per sequence is the average number of point mutations per sequence (*m*); however, due to the small sample size this may be off. The number of mutations per sequence follows a PCR distribution, which can be approximated with a Poisson distribution (Sun, 1995). In the latter, the mean and the variance are the same (λ, unrelated to PCR efficiency). The *sample* average and variance may differ, especially at low sampling. The number to trust the most is λ_{Poisson}.

The average is **N/A** mutations per sequence (N/A kb).

The sample variance is **N/A** mutations per sequence.

The λ_{Poisson} is **N/A** mutations per sequence.

- Frequency bins, sequences sampled with 0, 1, 2, 3 *etc.* mutations: **N/A**
- Total mutations sampled: **N/A**

If λ_{Poisson} and the average are very different and the fit in the plot is very poor, sequencing more variants from the test library is advisable.
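
The three quantities discussed in this section can be sketched in Python as follows. The fitting procedure shown here (a least-squares fit of the Poisson probabilities to the observed frequency bins, done by a simple grid search) is an assumption for illustration; the procedure actually used by the tool is not stated on this page. All function names are hypothetical.

```python
import math
from collections import Counter


def sample_stats(counts):
    """Sample mean and unbiased sample variance of mutations per sequence."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return mean, variance


def poisson_pmf(k, lam):
    """Probability of observing k events for a Poisson with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def fit_poisson(counts, step=0.001):
    """Least-squares fit of lambda to the binned frequencies (grid search)."""
    n = len(counts)
    bins = Counter(counts)
    kmax = max(counts)
    observed = [bins.get(k, 0) / n for k in range(kmax + 1)]
    best_lam, best_sse = None, float("inf")
    for i in range(1, int((kmax + 1) / step) + 1):
        lam = i * step
        sse = sum(
            (observed[k] - poisson_pmf(k, lam)) ** 2 for k in range(kmax + 1)
        )
        if sse < best_sse:
            best_lam, best_sse = lam, sse
    return best_lam


# Hypothetical test library: mutations found in each of eight sequences.
example_counts = [0, 1, 1, 2, 3, 1, 0, 2]
mean, variance = sample_stats(example_counts)
lam = fit_poisson(example_counts)
```

With so few sequences the sample mean, sample variance and fitted λ generally disagree somewhat, which is the situation described above.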

## Starting from a table of tallied nucleotide specific mutations

From\To | A | T | G | C |
---|---|---|---|---|
A | | | | |
T | | | | |
G | | | | |
C | | | | |

Colour codes | Identity | Purine transition | Pyrimidine transition | Transversion |
---|---|---|---|---|

Proportion of Adenine | % |
---|---|
Proportion of Thymine | % |
Proportion of Guanine | % |
Proportion of Cytosine | % |

## Corrected mutation incidence

Data display options | Raw data | Frequency normalised | Strand complementary normalised |
---|---|---|---|

Sequence-composition–corrected incidence of mutations (%):

From/To | A | T | G | C |
---|---|---|---|---|
A | | | | |
T | | | | |
G | | | | |
C | | | | |
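
One plausible way to correct the raw tally for sequence composition is sketched below in Python: each count is divided by the proportion of its starting base in the template, and the results are rescaled to sum to 100%. This is an assumption about what the correction involves, not a transcription of the tool's code, and the function names are hypothetical.

```python
BASES = "ATGC"


def base_proportions(sequence):
    """Fraction of each base in the template (other characters ignored)."""
    seq = sequence.upper()
    total = sum(seq.count(base) for base in BASES)
    return {base: seq.count(base) / total for base in BASES}


def corrected_incidence(tally, proportions):
    """Composition-corrected incidence of each substitution, in percent.

    `tally` maps (from_base, to_base) to a raw count. A base absent from
    the template (proportion 0) would raise ZeroDivisionError.
    """
    weighted = {
        (src, dst): count / proportions[src]
        for (src, dst), count in tally.items()
        if src != dst
    }
    total = sum(weighted.values())
    return {key: 100 * value / total for key, value in weighted.items()}


# Hypothetical template with 40% A, so raw A->N counts are over-represented.
proportions = base_proportions("AAAATTGGCC")
incidence = corrected_incidence({("A", "G"): 4, ("G", "A"): 2}, proportions)
```

In this toy example the raw counts suggest A→G is twice as common as G→A, but after correcting for the A-rich template both come out at about 50%.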

### Graphical Representation

## Bias indicators

Indicator | Calculated | Estimated error |
---|---|---|
Ts/Tv | | |
AT→GC/GC→AT | | |
A→N, T→N (%) | | |
G→N, C→N (%) | | |
AT→GC (%) | | |
GC→AT (%) | | |
Transitions (%) total | | |
A→G, T→C (%) | | |
G→A, C→T (%) | | |
Transversions (%) total | | |
A→T, T→A (%) | | |
A→C, T→G (%) | | |
G→C, C→G (%) | | |
G→T, C→A (%) | | |
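
The main indicators in the table can be computed from a substitution tally as in the Python sketch below. The dictionary keys and function names are hypothetical, the input may be raw or corrected values, and the estimated errors are left out because their formulae are not given on this page.

```python
PURINES = {"A", "G"}


def is_transition(src, dst):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    return (src in PURINES) == (dst in PURINES)


def ratio(numerator, denominator):
    """Plain division that returns NaN instead of failing on zero."""
    return numerator / denominator if denominator else float("nan")


def bias_indicators(tally):
    """Compute bias indicators from a {(from, to): value} tally."""
    items = [(s, d, n) for (s, d), n in tally.items() if s != d]
    total = sum(n for _, _, n in items)
    ts = sum(n for s, d, n in items if is_transition(s, d))
    tv = total - ts
    at_to_gc = sum(n for s, d, n in items if s in "AT" and d in "GC")
    gc_to_at = sum(n for s, d, n in items if s in "GC" and d in "AT")
    from_at = sum(n for s, _, n in items if s in "AT")
    return {
        "ts_tv": ratio(ts, tv),
        "at_gc_over_gc_at": ratio(at_to_gc, gc_to_at),
        "from_a_or_t_pct": 100 * ratio(from_at, total),
        "from_g_or_c_pct": 100 * ratio(total - from_at, total),
        "transitions_pct": 100 * ratio(ts, total),
        "transversions_pct": 100 * ratio(tv, total),
    }


# Hypothetical tally of twelve point mutations.
example = {
    ("A", "G"): 4, ("T", "C"): 2, ("G", "A"): 3,
    ("A", "T"): 2, ("G", "T"): 1,
}
indicators = bias_indicators(example)
```

In this example nine of the twelve mutations are transitions, so Ts/Tv is 3, and AT→GC outnumbers GC→AT by 6 to 4.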

## Where to next?

### Pedel

Pedel-AA is a tool to assess library completeness at the amino acid level: given a library of a given size, what are the chances that one has picked all single mutations, and so forth? In essence, this is what is mathematically known as the coupon collector's problem.
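
To give a feel for the coupon collector's problem, here is a minimal Python sketch for the idealised case in which all `n` possible variants are equally likely. Real epPCR libraries are biased, so this is only an illustration of the underlying mathematics and not a description of how Pedel-AA works; the function names are hypothetical.

```python
def expected_draws_to_collect_all(n):
    """Expected number of random picks needed to see all n variants.

    Classic result: n * H_n, where H_n is the n-th harmonic number.
    """
    return n * sum(1 / k for k in range(1, n + 1))


def expected_fraction_seen(n, library_size):
    """Expected fraction of the n variants present in a library.

    Each variant is missed by one pick with probability (1 - 1/n), so it
    is missed by the whole library with probability (1 - 1/n)**library_size.
    """
    return 1 - (1 - 1 / n) ** library_size


draws = expected_draws_to_collect_all(1000)
coverage = expected_fraction_seen(1000, 3000)
```

Even under this optimistic assumption, a library three times larger than the number of possible variants is still expected to miss a few percent of them.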

If you want your data sent directly, provide the library size and click here (More options).