动态规划法-双序列比对.ppt
回顾1/55”DynamicProgrammingEditDistance(编辑距离)Alignment(比对)DirectedAcyclicGraphEditGraphBacktrackingTGCAT-A-CAT-C-TGATC
4,求两条序列的最长共同子序列。【作业】习题2/55v=TACGGGTATw=GGACGTACG
0123456789000000000001020304050607080905GGACGTACGTACGGGTAT
SequenceAlignment4
Outline5/551243GlobalAlignmentScoringMatricesLocalAlignmentAlignmentwithAffineGapPenalties1234
FromLCStoAlignment:ChangeuptheScoringTheLongestCommonSubsequence(LCS)problem—thesimplestformofsequencealignment–allowsonlyinsertionsanddeletions(nomismatches).IntheLCSProblem,wescored1formatchesand0forindelsConsiderpenalizingindelsandmismatcheswithnegativescoresSimplestscoringschema:+1:matchpremium-μ:mismatchpenalty-σ:indelpenalty-TGCAT-A-CAT-C-TGATCAKRANRKAAANK-1+(-1)+(-2)+5+7+3=11
SimpleScoring7/55Whenmismatchesarepenalizedby–μ,indelsarepenalizedby–σ,andmatchesarerewardedwith+1,theresultingscoreis:#matches–μ(#mismatches)–σ(#indels)
TheGlobalAlignmentProblemFindthebestalignmentbetweentwostringsunderagivenscoringschemaInput:StringsvandwandascoringschemaOutput:Alignmentofmaximumscorem:mismatchpenaltyσ:indelpenalty
ScoringMatrices9/55Togeneralizescoring,considera(4+1)x(4+1)scoringmatrixδ.Inthecaseofanaminoacidsequencealignment,thescoringmatrixwouldbea(20+1)x(20+1)size.Theadditionof1istoincludethescoreforcomparisonofagapcharacter“-”.Thiswillsimplifythealgorithmasfollows:
TheBlosum62ScoringMatrix
MeasuringSimilarityMeasuringtheextentofsimilaritybetweentwosequencesBasedonpercentsequenceidentityBasedonconservation
PercentSequenceIdentityTheextenttowhichtwonucleotideoraminoacidsequencesareinvariantACCTGAG–AGACGTG–GCAG70%identicalmismatchindel
MakingaScoringMatrixScoringmatricesarec