Suboptimal Alignment Algorithm

Despite the existence of the Lalign program which could find internal duplications between two strings, I've developed my own algorithm for carrying out a similar function as part of this Rosalind problem.


import distance
b="GACTCCTTTGTTTGCCTTAAATAGATACATATTTACTCTTGACTCTTTTGTTGGCCTTAAATAGATACATATTTGTGCGACTCCACGAGTGATTCGTA"
a="ATGGACTCCTTTGTTTGCCTTAAATAGATACATATTCAACAAGTGTGCACTTAGCCTTGCCGACTCCTTTGTTTGCCTTAAATAGATACATATTTG"
c=0
da=""
db=""
for j in range(32,41):
    for i in range(len(a)-j):
        c1=0
        for k in range(len(b)-j):
            if distance.hamming(a[i:i+j], b[k:k+j])<=3:
                c1+=1
                if c1>c:
                    c=c1
print c
c=0
da=a
db=b
a=db
b=da
for j in range(32,41):
    for i in range(len(a)-j):
        c1=0
        for k in range(len(b)-j):
            if distance.hamming(a[i:i+j], b[k:k+j])<=3:
                c1+=1
                if c1>c:
                    c=c1

print c

This code takes two DNA strings that share some short inexact repeat of 32-40 bp that may appear with slight modifications (each repeat differ by ≤3 changes/indels).

As you can see, this code is far from perfect. For one, it involves hardcoding in the sequence (a bad habit of mine). For another thing, anyone human being with at least half a brain will notice that this is essentially the same few lines of code repeated. What could be uglier than seeing the same thing twice? We could easily clean this up:

def func(a, b):
    for j in range(32,41):
        for i in range(len(a)-j):
            c1=0
            for k in range(len(b)-j):
                if distance.hamming(a[i:i+j], b[k:k+j])<=3:
                    c1+=1
                    if c1>c:
                        c=c1
    return c
print func(a, b)
print func(b, a)

Ahh, much better. I wanted to post it as an example of the importance of using functions, even if a script will only be run once or, perhaps, never at all. Now, if only we could do something about that hardcoding habit of mine...

Many of my scripts can be found on my Github page.