String Edit Distance Memoization

In [ ]:
min(edit(s1[0:-1], s2)+1, 
    edit(s1, s2[0:-1]) + 1, 
    edit(s1[0:-1], s2[0:-1])) (+ 1 if s1[-1] is different from s2[-1])

Remember the matching of all possible substrings from s1 to s2

If we include the empty string, there are

(1+len(s1)) * (1+len(s2))

possible matchings from a substring of s1 to a substring of s2

If we assume len(s1) ~= len(s2) = $N$, we need to store $O(N^2)$ costs

In [1]:
import numpy as np
def edit(s1, s2):
    M = len(s1)
    N = len(s2)
    table = np.zeros((M+1, N+1))
    # Fill in the base cases
    table[0, :] = np.arange(N+1)
    table[:, 0] = np.arange(M+1)
    for i in range(1, M+1):
        for j in range(1, N+1):
            # Check table[i-1, j] + 1
            # Check table[i, j-1] + 1
            # Check table[i-1, j-1] (+ 1 if s1[i-1] == s2[j-1])
            # Store table[i, j] as the min of the above possibilities
            ## TODO: Fill this in in your exercise
    return table[-1, -1]
edit("school", "fools")
[[0. 1. 2. 3. 4. 5.]
 [1. 0. 0. 0. 0. 0.]
 [2. 0. 0. 0. 0. 0.]
 [3. 0. 0. 0. 0. 0.]
 [4. 0. 0. 0. 0. 0.]
 [5. 0. 0. 0. 0. 0.]
 [6. 0. 0. 0. 0. 0.]]

Example: Chase To Chris

Example: School To Fools

Example: Topology To Topography

In [ ]: