Week 8: Longest Common Subsequence
Chris Tralie
Overview / Logistics
We've now gone through a variety of problems that can be solved efficiently with dynamic programming: change making with coins, edit distance, dynamic time warping, and seam carving. Now we're at the stage where it would be good for you to see if you can devise one yourself. The problem we will consider here is that of the longest common subsequence (LCS) between two strings. A subsequence is a set of characters that occur in order, but which are possibly separated. For example,
rules are not all that great
has the subsequence
rules are great
in common with
Ursinus college students are great students
The LCS problem is to find the maximum such sequence which is the same between two strings. Since each character in a string of length N can be either present or absent in choosing a subsequence, there is a one to one map between binary digits of length N and subsequences of a string of length N. Therefore, there are 2N possible subsequences, so a brute force algorithm would take exponential time. However, there is an O(MN) dynamic programming algorithm to solve the LCS problem between a string of length M and a string of length N, and your job will be to discover it.
Step 1: Recurrence Relation
As with any dynamic programming algorithm, your first step should be to devise a recurrence relation. Focus on simply computing the length of the maximum subsequence first, and worry later about actually computing it. To this end, let len(s1) = M and len(s2) = N, and denote
LCS(i, j)
s1[0:i]
and s2[0:j]
. Assume that you know all LCS(i, j)
except for the final one, LCS(M, N)
. How can you piece together the smaller optimal solutions to create LCS(M, N)
? What should the base case be?
As a hint, consider what to do when the last character is in common between the two, and what to do when it is not.
Step 2: Dynamic Programming
Now that you've devised a recurrence relation, see if you can compute it in a bottom up fashion using dynamic programming. Just as some code to get you started, recall that you can quickly create an MxN 2D dynamic programming table using numpy as
You should test your code out on a variety of examples
Step 3: Backtracing
Now you finally have everything in place that's needed to compute the subsequence itself. Add code to remember optimal moves and do backtracing to extract an optimal subsequence that realizes the maximum length you computed.