This webtool compares two alternative multiple sequence alignments (MSAs) to determine how well they align homologous residues in the same columns as one another. It classifies similarities and differences into conserved sequence, conserved gaps, splits, merges and shifts. Summarising these categories for each column yields information on which columns are agreed upon by both MSAs, and which differ. Output graphs visualise the comparison data for analysis.
AlignStat is fully described in this paper. Please cite the paper if you use this tool in your research.
Alignments must be in FASTA, Clustal, MSF, or PHYLIP format. Both alignments should contain the same sequences, in any order.
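As a quick pre-upload sanity check, the requirement that both alignments contain the same sequences can be sketched as below. This is not part of AlignStat: the function names are hypothetical, the aligned rows are assumed to be already parsed into plain strings, and "same sequences" is assumed to mean identical residues once gap characters (`-` or `.`) are removed.

```python
from collections import Counter

def ungapped(row, gap_chars="-."):
    """Return the row with gap characters removed, upper-cased."""
    return "".join(ch for ch in row if ch not in gap_chars).upper()

def same_sequences(rows_a, rows_b):
    """True if both alignments hold the same ungapped sequences, in any order."""
    return Counter(ungapped(r) for r in rows_a) == Counter(ungapped(r) for r in rows_b)

# The same two sequences, aligned differently and listed in a different order
print(same_sequences(["AC-G", "ACTG"], ["ACTG", "A-CG"]))
```

Comparing multisets rather than lists is what makes the check independent of sequence order.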
If you are unsure how to format your inputs, or simply want some data to try the app, please take a look at the example data.
When comparing the positions of two MSAs: A ‘match’ is when both alignments contain an identical character that is not a gap. A ‘merge’ is when alignment A contains a gap, but alignment B contains any other character. A ‘split’ is when alignment B contains a gap, but alignment A contains any other character. A ‘shift’ is when the two alignments contain non-identical characters, neither of which is a gap. A ‘conserved gap’ is when both alignments contain a gap.
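The five definitions above can be read as a simple decision rule. The sketch below is a literal reading of those definitions, not AlignStat's own implementation. The function names are hypothetical, `-` is assumed to be the gap character, and the two rows are assumed to have equal length. The single-letter codes follow the ones this page uses for the Dissimilarity Matrix (M=match, g=conserved gap, m=merge, s=split, x=shift).

```python
GAP = "-"  # assumed gap character

def classify_position(a, b):
    """Classify one position given the character in alignment A and in alignment B."""
    if a == GAP and b == GAP:
        return "g"  # conserved gap: both alignments contain a gap
    if a == GAP:
        return "m"  # merge: gap in A, any other character in B
    if b == GAP:
        return "s"  # split: gap in B, any other character in A
    if a == b:
        return "M"  # match: identical non-gap characters
    return "x"      # shift: different characters, neither a gap

def classify_rows(row_a, row_b):
    """Classify every position of one sequence as aligned in A and in B."""
    if len(row_a) != len(row_b):
        raise ValueError("rows must have the same length")
    return "".join(classify_position(a, b) for a, b in zip(row_a, row_b))

print(classify_rows("AC-G-", "A-TC-"))
```

Testing for gaps before testing for identity keeps the conserved-gap case from being counted as a match.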
For further information see github.com/TS404/AlignStat
The Similarity Matrix records which columns of the reference and comparison MSAs best match. Its [i,j]th entry is the similarity score between the ith column of the reference alignment and the jth column of the comparison alignment. It is used to determine which columns are most similar for further analysis and to generate the similarity heatmap plot.
The Dissimilarity Matrix categorises dissimilarity between the MSAs. Its [i,j]th entry is the dissimilarity category of the jth residue of the ith sequence for the reference alignment versus the comparison alignment (M=match, g=conserved gap, m=merge, s=split, x=shift). It is used to generate the dissimilarity matrix plot.
The Results Matrix summarises columnwise similarity and dissimilarity. Its [i,j]th entry is the ith match category average of the jth column of the reference alignment versus the comparison alignment (i1=match, i2=conserved gap, i3=merge, i4=split, i5=shift). It is used to generate the similarity summary and dissimilarity summary plots.
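To illustrate what a columnwise category average might look like, the sketch below takes a dissimilarity matrix written as one string of category codes per sequence and returns five rows of per-column proportions, in the order match, conserved gap, merge, split, shift. The function name and the input representation are assumptions made for illustration, and the tool itself may compute its averages differently.

```python
CATEGORIES = "Mgmsx"  # match, conserved gap, merge, split, shift

def results_matrix(dissimilarity_rows):
    """Return a 5 x n_columns matrix of category proportions per column.

    dissimilarity_rows: one string of category codes per sequence,
    all of the same length.
    """
    n_seqs = len(dissimilarity_rows)
    n_cols = len(dissimilarity_rows[0])
    matrix = []
    for category in CATEGORIES:
        row = []
        for j in range(n_cols):
            count = sum(1 for codes in dissimilarity_rows if codes[j] == category)
            row.append(count / n_seqs)
        matrix.append(row)
    return matrix

for label, row in zip(CATEGORIES, results_matrix(["MMx", "Mgx"])):
    print(label, row)
```

Every position falls into exactly one category, so each column's five proportions sum to 1.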
The Sum of Pairs Reference lists all residue pairs present in the reference MSA. The Sum of Pairs Comparison lists all residue pairs present in the comparison MSA. The Sum of Pairs Scores lists the proportion of reference pairs retained in each column of the comparison MSA.
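A minimal sketch of the sum-of-pairs idea follows, under stated assumptions rather than as AlignStat's actual code. Each alignment is assumed to be a mapping from sequence name to aligned row, with `-` as the gap character, and a residue is identified by its sequence name and its index in the ungapped sequence. The per-column score is computed here over reference columns, as the fraction of each column's residue pairs that also share a column somewhere in the comparison alignment. This is one reading of the description above, and the function names are hypothetical.

```python
from itertools import combinations

GAP = "-"  # assumed gap character

def residue_pairs_by_column(msa):
    """Return one set of residue pairs per column of the alignment.

    msa: dict mapping sequence name -> aligned row (all rows the same length).
    A residue is identified as (sequence name, index in the ungapped sequence).
    """
    names = sorted(msa)  # fixed order so pairs are comparable between alignments
    n_cols = len(msa[names[0]])
    next_index = {name: 0 for name in names}
    columns = []
    for j in range(n_cols):
        present = []
        for name in names:
            if msa[name][j] != GAP:
                present.append((name, next_index[name]))
                next_index[name] += 1
        columns.append(set(combinations(present, 2)))
    return columns

def retained_per_reference_column(reference, comparison):
    """Fraction of each reference column's pairs that the comparison also aligns."""
    comparison_pairs = set().union(*residue_pairs_by_column(comparison))
    scores = []
    for pairs in residue_pairs_by_column(reference):
        # Columns with fewer than two residues have no pairs to score
        scores.append(len(pairs & comparison_pairs) / len(pairs) if pairs else None)
    return scores

reference = {"s1": "AC-G", "s2": "ACTG"}
comparison = {"s1": "A-CG", "s2": "ACTG"}
print(retained_per_reference_column(reference, comparison))
```

Identifying residues by their ungapped index, rather than by their column position, is what allows the same pair to be recognised in both alignments.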