AlignStat: A tool for the statistical comparison of alternative multiple sequence alignments

This webtool compares two alternative multiple sequence alignments (MSAs) to determine how well they align homologous residues in the same columns as one another. It classifies similarities and differences into conserved sequence, conserved gaps, splits, merges and shifts. Summarising these categories for each column yields information on which columns are agreed upon by both MSAs, and which differ. Output graphs visualise the comparison data for analysis.

All of the calculations and plots generated by this webtool can also be generated using the underlying AlignStat R package, which is available on CRAN.

AlignStat is fully described in this paper. Please cite the paper if you use this tool in your research.

Upload your two alignments to make a comparison

Alignments must be in fasta, clustal, msf, or phylip formats. Both alignments should contain the same sequences in any order.
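As a rough pre-upload check, the requirement that both alignments contain the same sequences (in any order) can be sketched in Python for FASTA input. The helpers parse_fasta and same_sequences are hypothetical and are not part of AlignStat. The sketch assumes that '-' and '.' are the only gap characters, that each full header line is the sequence name, and that letter case can be ignored.

```python
def parse_fasta(text):
    """Return a dict mapping sequence name (full header line) -> aligned sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)  # sequences may be wrapped over several lines
    return {n: "".join(parts) for n, parts in records.items()}


def same_sequences(ref_text, comp_text, gap_chars="-."):
    """True if both FASTA alignments hold the same ungapped sequences, in any order."""
    ref = parse_fasta(ref_text)
    comp = parse_fasta(comp_text)
    if set(ref) != set(comp):
        return False  # different sequence names
    strip_gaps = str.maketrans("", "", gap_chars)  # table that deletes gap characters
    return all(
        ref[n].translate(strip_gaps).upper() == comp[n].translate(strip_gaps).upper()
        for n in ref
    )
```

For example, same_sequences(">s1\nAC-GT\n>s2\nA--GT\n", ">s2\nAGT--\n>s1\nACGT-\n") returns True. The gapping and sequence order differ, but the underlying sequences are identical.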

If you are unsure how to format your inputs, or simply want some data to try the app, please take a look at the example data.

Example Data

Your AlignStat results are shown below. To run a new comparison, simply refresh this page in your browser.

When comparing the positions of two MSAs, each position falls into one of five categories. A ‘match’ is when both alignments contain an identical character that is not a gap. A ‘merge’ is when alignment A contains a gap but alignment B contains any other character. A ‘split’ is when alignment B contains a gap but alignment A contains any other character. A ‘shift’ is when the two alignments contain non-identical characters, neither of which is a gap. A ‘conserved gap’ is when both alignments contain a gap.
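The five definitions above can be read as a simple decision rule for one pair of characters. The following Python function is a minimal sketch of that literal reading. The name classify is hypothetical, and the sketch assumes '-' is the only gap symbol. It illustrates the definitions only and does not reproduce AlignStat's internal implementation. It uses the same one-letter codes as the Dissimilarity Matrix.

```python
GAP = "-"  # assumed gap symbol


def classify(a, b, gap=GAP):
    """Categorise one position of alignment A versus alignment B.

    Returns 'M' (match), 'g' (conserved gap), 'm' (merge), 's' (split) or 'x' (shift).
    """
    a_gap, b_gap = a == gap, b == gap
    if a_gap and b_gap:
        return "g"  # conserved gap: both alignments have a gap
    if a_gap:
        return "m"  # merge: A has a gap, B has another character
    if b_gap:
        return "s"  # split: B has a gap, A has another character
    return "M" if a == b else "x"  # identical residues match; different ones are a shift
```

For example, comparing "AC-GT" with "A-CGA" position by position gives M, s, m, M, x.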

For further information, see github.com/TS404/AlignStat.

Download results in csv format


The Similarity Matrix records which columns of the reference and comparison MSAs best match. Its [i,j]th entry is the similarity score between the ith column of the reference alignment and the jth column of the comparison alignment. It is used to determine which columns are most similar for further analysis, and to generate the similarity heatmap plot.
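To illustrate the shape of this matrix, here is a minimal Python sketch. It assumes a hypothetical score: each residue is identified by (sequence name, residue number), and entry [i][j] is the fraction of the residues in reference column i that sit in comparison column j. AlignStat's actual similarity score may be defined differently. The functions residue_columns and similarity_matrix are illustrative names only, and alignments are assumed to be dicts mapping name to aligned string with '-' as the gap.

```python
def residue_columns(aln, gap="-"):
    """Turn {name: aligned_seq} into a list of columns, each a set of
    (sequence name, residue number) identifiers; gaps contribute nothing."""
    length = len(next(iter(aln.values())))
    columns = [set() for _ in range(length)]
    for name, seq in aln.items():
        k = 0  # residue counter for this sequence (ignores gaps)
        for col, ch in enumerate(seq):
            if ch != gap:
                k += 1
                columns[col].add((name, k))
    return columns


def similarity_matrix(ref, comp, gap="-"):
    """Entry [i][j]: fraction of reference column i's residues found in
    comparison column j (hypothetical score; all-gap columns score 0.0)."""
    ref_cols = residue_columns(ref, gap)
    comp_cols = residue_columns(comp, gap)
    return [
        [len(ref_col & comp_col) / len(ref_col) if ref_col else 0.0 for comp_col in comp_cols]
        for ref_col in ref_cols
    ]
```

With ref = {"s1": "ACG", "s2": "A-G"} and comp = {"s1": "ACG", "s2": "AG-"}, the last row is [0.0, 0.5, 0.5], because that reference column is spread over two comparison columns. Under this scoring, taking the largest entry in each row picks the best-matching comparison column.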

The Dissimilarity Matrix categorises dissimilarity between the two MSAs. Its [i,j]th entry is the dissimilarity category of the jth residue of the ith sequence in the reference alignment versus the comparison alignment (M = match, g = conserved gap, m = merge, s = split, x = shift). It is used to generate the dissimilarity matrix plot.
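A minimal Python sketch of how such a matrix could be assembled is shown below. Rows follow the reference's sequence order, and comparison sequences are looked up by name, so their order does not matter. The sketch makes several simplifying assumptions. It compares the two alignments position by position, so both must have the same length. It indexes entries by alignment position rather than by residue number. It treats '-' as the only gap. The function dissimilarity_matrix is a hypothetical name and does not reproduce AlignStat's own method.

```python
def dissimilarity_matrix(ref, comp, gap="-"):
    """Rows: reference sequences (reference order). Columns: alignment positions.
    Each entry is M (match), g (conserved gap), m (merge), s (split) or x (shift)."""
    if any(len(comp[name]) != len(seq) for name, seq in ref.items()):
        raise ValueError("this sketch assumes equal-length alignments")
    matrix = []
    for name, ref_seq in ref.items():
        row = []
        for a, b in zip(ref_seq, comp[name]):  # comparison row matched by name
            if a == gap and b == gap:
                row.append("g")  # conserved gap
            elif a == gap:
                row.append("m")  # merge
            elif b == gap:
                row.append("s")  # split
            else:
                row.append("M" if a == b else "x")  # match or shift
        matrix.append(row)
    return matrix
```

For example, ref = {"s1": "AC-GT", "s2": "A--GT"} and comp = {"s2": "AG--T", "s1": "ACGT-"} give the rows M M m x s and M m g s M.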

The Results Matrix summarises columnwise similarity and dissimilarity. Its [i,j]th entry is the average of the ith category over the jth column of the reference alignment versus the comparison alignment (i = 1: match, i = 2: conserved gap, i = 3: merge, i = 4: split, i = 5: shift). It is used to generate the similarity summary and dissimilarity summary plots.
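Assuming the average is taken over the sequences in each column, the Results Matrix can be derived from a Dissimilarity Matrix of one-letter codes. The short Python sketch below shows this. Row i counts one category, column j is an alignment column, and each entry is the fraction of sequences with that category in that column. The function name results_matrix and the equal weighting of sequences are assumptions made for illustration.

```python
CATEGORIES = ["M", "g", "m", "s", "x"]  # row order: match, conserved gap, merge, split, shift


def results_matrix(dissim):
    """Entry [i][j]: fraction of sequences whose code in column j is CATEGORIES[i]."""
    n_seqs = len(dissim)
    n_cols = len(dissim[0])
    return [
        [sum(row[j] == cat for row in dissim) / n_seqs for j in range(n_cols)]
        for cat in CATEGORIES
    ]
```

Each column of the result sums to 1. For the two rows M M m x s and M m g s M, the match row is [1.0, 0.5, 0.0, 0.0, 0.5].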

Download SPS results in txt and formats


The Sum of Pairs Reference lists all pairs of residues aligned in the same column of the reference MSA. The Sum of Pairs Comparison lists all such residue pairs in the comparison MSA. The Sum of Pairs Scores lists the proportion of reference pairs retained in each column of the comparison MSA.
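The pair lists and scores can be sketched in Python as follows. Residues are identified by (sequence name, residue number), and sequence names are sorted so that a pair is written the same way in both alignments. Two parts are assumptions. The per-column score here is the fraction of a comparison column's pairs that also occur in the reference, which is one plausible reading of the description above. The overall score follows the conventional sum-of-pairs definition, the fraction of reference pairs reproduced anywhere in the comparison. The function names residue_pairs and sum_of_pairs are hypothetical.

```python
from itertools import combinations


def residue_pairs(aln, gap="-"):
    """Per column, the set of residue pairs aligned together in that column.
    Residues are (sequence name, residue number); '-' is assumed to be the gap."""
    length = len(next(iter(aln.values())))
    counters = {name: 0 for name in aln}
    columns = []
    for col in range(length):
        residues = []
        for name in sorted(aln):  # sorted names give a canonical pair order
            if aln[name][col] != gap:
                counters[name] += 1
                residues.append((name, counters[name]))
        columns.append(set(combinations(residues, 2)))
    return columns


def sum_of_pairs(ref, comp, gap="-"):
    """Return (per-column scores for the comparison MSA, overall SP score).
    Columns with no pairs score None."""
    ref_pairs = set().union(*residue_pairs(ref, gap))
    comp_cols = residue_pairs(comp, gap)
    per_column = [len(p & ref_pairs) / len(p) if p else None for p in comp_cols]
    comp_pairs = set().union(*comp_cols)
    overall = len(ref_pairs & comp_pairs) / len(ref_pairs) if ref_pairs else None
    return per_column, overall
```

With ref = {"s1": "ACG", "s2": "A-G"} and comp = {"s1": "ACG", "s2": "AG-"}, the per-column scores are [1.0, 0.0, None] and the overall score is 0.5. Only one of the two reference pairs survives in the comparison.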

AlignStat was developed by Thomas Shafee and Ira Cooke at the La Trobe Institute for Molecular Science (LIMS) and Hexima.