GRAPe is a tool for computing genome re-alignment using marginalized posterior decoding.
Most algorithms treat alignment as an optimization problem.
It is more appropriate to view it as an inference problem:
given the sequences, can we infer which nucleotides share a common ancestor?
To answer this question, GRAPe uses the Marginalized Posterior
Decoding (MPD) algorithm, which uses the posterior distribution over alignments
to optimize the homology assignment of individual nucleotides,
rather than finding a single most probable alignment. Simulations
show that the MPD algorithm has higher sensitivity and specificity than the
Viterbi and Needleman-Wunsch algorithms.
The software can be downloaded
here.
Please note that GRAPe is still under development, so expect regular updates.
If you run into any problems, please email me.
The following genomes are available for download and browsing:
Alignments broadly follow the .axt file format,
but the two alignment lines are followed by a third line annotating the posterior probability of every column:
361 chrX 8437003 8437070 chr8 70513702 70513775 + 1992
AGAC--------TGCCCGTGCATATATACCAGTTACTATATGGACAGTTAAAAAAAATAGGGAGAGAGAAATCAAT
AGAAGAGCAATATGTTTGTACACATATTCCTGTTACTTTATGAACAATA--AAGAAATGGGAACAGGGAAATGAAG
hhihKMNNOONKtPTUVXYZZZZZZZZZZZZZZZZZZZZZZZZZZZZXREBBEHKORVYYYYYYYYYYYXWVQHEx
The posterior ranges from 0 to 1 and is encoded by the letters a-zA-Z, where a=0 and Z=1. For ungapped columns,
it is the posterior probability that the two nucleotides in the column align. The same base at different positions is considered a different
nucleotide. For gapped columns, it is the posterior probability that the nucleotide is NOT aligned to any
nucleotide (i.e. no distinction is made between gaps of different sizes or locations).
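
As an illustration of the encoding described above, the following Python sketch turns a posterior line back into numbers and pairs each value with its alignment column. It is not part of GRAPe. The function names are hypothetical, and it assumes that the 52 letters are evenly spaced between 0 and 1 (so each step is 1/51), which the description implies but does not state explicitly.

```python
import string

# Assumption: a-z followed by A-Z gives 52 evenly spaced levels,
# with 'a' mapping to 0.0 and 'Z' mapping to 1.0.
_SYMBOLS = string.ascii_lowercase + string.ascii_uppercase
_LEVEL = {ch: i / (len(_SYMBOLS) - 1) for i, ch in enumerate(_SYMBOLS)}


def decode_posteriors(line):
    """Convert a posterior annotation line into a list of floats in [0, 1]."""
    return [_LEVEL[ch] for ch in line.strip()]


def annotate_columns(seq1, seq2, post_line):
    """Pair every alignment column with its decoded posterior.

    Returns a list of (base1, base2, kind, posterior) tuples, where kind is
    "aligned" for ungapped columns (posterior that the two bases align) and
    "gap" for gapped columns (posterior that the base is unaligned).
    """
    seq1, seq2, post_line = seq1.strip(), seq2.strip(), post_line.strip()
    if not (len(seq1) == len(seq2) == len(post_line)):
        raise ValueError("alignment lines and posterior line differ in length")
    columns = []
    for a, b, p in zip(seq1, seq2, decode_posteriors(post_line)):
        kind = "gap" if "-" in (a, b) else "aligned"
        columns.append((a, b, kind, p))
    return columns
```

Calling `annotate_columns` on the two sequence lines and the posterior line of a record gives one tuple per column, which can then be filtered, for example to keep only ungapped columns whose posterior exceeds a chosen threshold.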