MUTANALYST

An online tool for assessing the mutational spectrum of epPCR libraries with poor sampling.

Background

Error-prone PCR (epPCR) is a method for creating a pool of amplicons carrying random errors. For the best results, the number of mutations and the spectrum of mutations need to be controlled, hence the need for a test library. The calculations for a test library are somewhat laborious and are affected by the very small sample size. This calculator addresses both issues in three ways: it computes the mutational biases from a starting sequence and a list of mutant genotypes, it estimates the mutations per sequence by fitting a Poisson distribution, and it estimates the errors in these values. In particular, the errors are calculated on the assumption that a mutation and its complement are equally likely, given the double-helix nature of DNA (e.g. A to G on one strand results in T to C on the other). For the specific formulae used, see this note about propagating errors.
The program can calculate mutation frequencies from the list of mutations found and the template sequence, or it can accept the frequencies directly. The 'Demo' values are from an actual experiment.

Starting from a sequence and a mutant genotype list

Sequence

Sequence amplified by mutagenic PCR:

Mutations found

This is the list of the mutations found. Identifying the mutations is best done by visual inspection, but if the process needs speeding up, this small helper script may help.
The format is as follows:

Each line contains one or more mutations of a variant sampled.
The mutations can only be in the forms A123C or 123A>C, where the number is irrelevant (and can be omitted).
A wild-type sequence can be indicated with 'wt'. It is not needed for the main calculations and is used solely for the mutational frequency; it is also useful for Pedel.
Rarer events such as insertions, deletions, duplications, frameshifts and inversions are not taken into account, but their frequency can easily be calculated using the 'values for further analysis' below.
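The input format described above can be illustrated with a small parser. This is only a sketch and not the tool's own code: the function name `parse_variant_line` is hypothetical, and splitting mutations on whitespace or commas is an assumption, since the separator is not specified here.

```python
import re

# Accepts the two forms named in the text: A123C and 123A>C.
# The position number is optional in both forms.
_MUTATION = re.compile(
    r"^(?:([ACGT])\d*([ACGT])|\d*([ACGT])>([ACGT]))$", re.IGNORECASE
)


def parse_variant_line(line):
    """Return the (from_base, to_base) pairs for one sampled variant.

    A line holding only 'wt' yields an empty list (wild-type sequence).
    Assumption: mutations are separated by whitespace or commas.
    """
    pairs = []
    for token in line.replace(",", " ").split():
        if token.lower() == "wt":
            continue
        match = _MUTATION.match(token)
        if match is None:
            raise ValueError(f"unrecognised mutation: {token!r}")
        a, b, c, d = match.groups()
        # Exactly one of the two alternatives matched.
        pairs.append(((a or c).upper(), (b or d).upper()))
    return pairs
```

Counting the pairs returned for each line gives the number of point mutations per variant, which is the quantity the mutational frequency section works with.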

Mutational frequency

The simplest estimate of the frequency of mutations per sequence is the average number of point mutations per sequence (m); however, due to the small sample size this may be off. The number of mutations per sequence follows a PCR distribution, which can be approximated with a Poisson distribution (Sun, 1995). In the latter, the mean and the variance are the same (λ, which is unrelated to PCR efficiency). The sample average and variance may differ, especially at low sampling. The number to trust the most is λ_Poisson.
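The three quantities discussed above can be sketched as follows. The fitting procedure used by the tool is not described here, so this example assumes a least-squares fit of the Poisson probabilities to the observed frequency bins, found with a simple grid search; the function names and the search range (0.001 to 20 mutations per sequence) are illustrative choices.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k mutations under a Poisson distribution."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def mutation_load(bins):
    """Summarise mutations per sequence.

    bins[k] is the number of sequences sampled with k mutations.
    Returns (sample mean, sample variance, fitted lambda).
    """
    n = sum(bins)
    mean = sum(k * count for k, count in enumerate(bins)) / n
    variance = sum(
        count * (k - mean) ** 2 for k, count in enumerate(bins)
    ) / (n - 1)

    observed = [count / n for count in bins]

    def squared_error(lam):
        return sum(
            (obs - poisson_pmf(k, lam)) ** 2 for k, obs in enumerate(observed)
        )

    # Assumption: grid search in steps of 0.001; a numerical optimiser
    # would serve equally well.
    candidates = [i / 1000 for i in range(1, 20001)]
    lam = min(candidates, key=squared_error)
    return mean, variance, lam
```

For example, `mutation_load([3, 4, 2, 1])` describes ten sequences, of which three carry no mutation, four carry one, two carry two and one carries three.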

The average is N/A mutations per sequence (N/A kb).

The sample variance is N/A mutations per sequence.

The λ_Poisson is N/A mutations per sequence.


Download. If you wish to reproduce the above, here are the values used:

Frequency bins, sequences sampled with 0,1,2,3 etc. mutations: N/A
Total mutations sampled: N/A

If λ_Poisson and the average are very different and the fit in the plot is very poor, sequencing more variants from the test library is advisable.

Starting from a table of tallied nucleotide specific mutations

Rows represent the wild-type base; columns represent the base in the mutant.

From\To	A	T	G	C
A
T
G
C

Colour codes	Identity	Purine transition	Pyrimidine transition	Transversion

Proportion of Adenine	%
Proportion of Thymine	%
Proportion of Guanine	%
Proportion of Cytosine	%

Corrected mutation incidence

Data display options		Raw data		Frequency normalised		Strand complementary normalised

Sequence-composition–corrected incidence of mutations (%):

From/To	A	T	G	C
A
T
G
C
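One plausible reading of the sequence-composition correction is sketched below. It is an assumption rather than the tool's documented formula: each tally is divided by the proportion of its wild-type base in the template, and the result is rescaled so that all entries sum to 100%. The strand-complementary step averages each mutation with its complement (for example A to G with T to C). All function names are hypothetical.

```python
BASES = "ATGC"
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def corrected_incidence(counts, sequence):
    """Composition-corrected incidence of each mutation, in percent.

    counts maps (from_base, to_base) to the number of times it was seen.
    Assumption: the sequence contains only A, T, G and C.
    """
    seq = sequence.upper()
    proportion = {base: seq.count(base) / len(seq) for base in BASES}
    weighted = {
        (f, t): counts.get((f, t), 0) / proportion[f]
        for f in BASES
        for t in BASES
        if f != t and proportion[f] > 0
    }
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("no mutations tallied")
    return {pair: 100 * value / total for pair, value in weighted.items()}


def strand_symmetrise(table):
    """Average each mutation with its complementary mutation."""
    return {
        (f, t): (
            table.get((f, t), 0)
            + table.get((COMPLEMENT[f], COMPLEMENT[t]), 0)
        ) / 2
        for (f, t) in table
    }
```

Dividing by base proportion means that a mutation from a rare base is weighted up, so the corrected table reflects the bias of the polymerase rather than the composition of the template.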

Graphical Representation

Download

Bias indicators

Indicator	Calculated	Estimated error
Ts/Tv
AT→GC/GC→AT
A→N, T→N (%)
G→N,C→N (%)
AT→GC (%)
GC→AT (%)
Transitions, total (%)
A→G, T→C (%)
G→A, C→T (%)
Transversions, total (%)
A→T, T→A (%)
A→C, T→G (%)
G→C, C→G (%)
G→T, C→A (%)
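Several of the indicators in the table above follow directly from a 4×4 mutation table. The sketch below computes a subset of them under the usual definitions (a transition is purine to purine or pyrimidine to pyrimidine; everything else is a transversion). The function name and the dictionary keys are illustrative, and the estimated errors are left out because their formulae are given in the separate note on propagating errors.

```python
BASES = "ATGC"
TRANSITIONS = {("A", "G"), ("G", "A"), ("T", "C"), ("C", "T")}


def bias_indicators(table):
    """Compute a subset of the bias indicators.

    table maps (from_base, to_base) to a count or a percentage.
    """
    pairs = [(f, t) for f in BASES for t in BASES if f != t]
    grand_total = sum(table.get(p, 0) for p in pairs)
    if grand_total == 0:
        raise ValueError("empty mutation table")

    def percent(selected):
        return 100 * sum(table.get(p, 0) for p in selected) / grand_total

    ts = percent(TRANSITIONS)
    tv = percent([p for p in pairs if p not in TRANSITIONS])
    return {
        "Ts/Tv": ts / tv if tv else float("inf"),
        "A>N, T>N (%)": percent([p for p in pairs if p[0] in "AT"]),
        "G>N, C>N (%)": percent([p for p in pairs if p[0] in "GC"]),
        "Transitions, total (%)": ts,
        "A>G, T>C (%)": percent([("A", "G"), ("T", "C")]),
        "G>A, C>T (%)": percent([("G", "A"), ("C", "T")]),
        "Transversions, total (%)": tv,
        "A>T, T>A (%)": percent([("A", "T"), ("T", "A")]),
        "A>C, T>G (%)": percent([("A", "C"), ("T", "G")]),
        "G>C, C>G (%)": percent([("G", "C"), ("C", "G")]),
        "G>T, C>A (%)": percent([("G", "T"), ("C", "A")]),
    }
```

Grouping each mutation with its complement, as the row labels do, reflects the strand-symmetry assumption described in the background section.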


Download

Where to next?

There are several other complementary, easy-to-use tools that can be used for further analysis.

Pedel

Pedel-AA is a tool to assess library completeness at the amino acid level. Namely, given a library of a given size, what are the chances that one has picked all single mutations and so forth? In essence this is what is mathematically called the coupon collector problem.
If you want your data sent directly, provide the library size and click here (More options).

The last three bases are the stop codon.

The offset of your coding frame.

Use PCR distribution.

↳Cycles in PCR run.

↳PCR efficiency.

Frequency of Insertions.

Frequency of Deletions.
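The coupon collector framing mentioned for Pedel-AA can be illustrated with the textbook case in which every variant is equally likely. This is a deliberate simplification for intuition only: real epPCR libraries have unequal variant probabilities, and the function names below are hypothetical rather than part of Pedel.

```python
def expected_completeness(library_size, n_variants):
    """Expected fraction of distinct variants present in the library.

    Assumption: all n_variants are equally likely at each draw.
    """
    return 1 - (1 - 1 / n_variants) ** library_size


def expected_draws_for_all(n_variants):
    """Expected number of draws needed to see every variant at least once.

    This is the classic coupon collector result, n * (1 + 1/2 + ... + 1/n),
    again assuming equally likely variants.
    """
    return n_variants * sum(1 / k for k in range(1, n_variants + 1))
```

Even in this idealised case, a library as large as the number of possible variants is expected to contain only about 63% of them, which is why library size has to exceed the variant count considerably for good coverage.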

Weblogo

If you have a large number of sequences, the distribution of mutations across the sequence may become informative. For this, Weblogo, a tool to create sequence logos, might be of use.