Week 8: Longest Common Subsequence

Chris Tralie

Overview / Logistics

We've now gone through a variety of problems that can be solved efficiently with dynamic programming: change making with coins, edit distance, dynamic time warping, and seam carving. Now we're at the stage where it would be good for you to see if you can devise one yourself. The problem we will consider here is that of the longest common subsequence (LCS) between two strings. A subsequence is a set of characters that occur in order, but which are possibly separated. For example,

rules are not all that great

has the subsequence

rules are great

in common with

Ursinus college students are great students

The LCS problem is to find the maximum such sequence which is the same between two strings. Since each character in a string of length N can be either present or absent in choosing a subsequence, there is a one to one map between binary digits of length N and subsequences of a string of length N. Therefore, there are 2N possible subsequences, so a brute force algorithm would take exponential time. However, there is an O(MN) dynamic programming algorithm to solve the LCS problem between a string of length M and a string of length N, and your job will be to discover it.

Step 1: Recurrence Relation

As with any dynamic programming algorithm, your first step should be to devise a recurrence relation. Focus on simply computing the length of the maximum subsequence first, and worry later about actually computing it. To this end, let len(s1) = M and len(s2) = N, and denote

LCS(i, j)

as the maximum length of a subsequence between substrings s1[0:i] and s2[0:j]. Assume that you know all LCS(i, j) except for the final one, LCS(M, N). How can you piece together the smaller optimal solutions to create LCS(M, N)? What should the base case be?

As a hint, consider what to do when the last character is in common between the two, and what to do when it is not.

Step 2: Dynamic Programming

Now that you've devised a recurrence relation, see if you can compute it in a bottom up fashion using dynamic programming. Just as some code to get you started, recall that you can quickly create an MxN 2D dynamic programming table using numpy as

You should test your code out on a variety of examples

Step 3: Backtracing

Now you finally have everything in place that's needed to compute the subsequence itself. Add code to remember optimal moves and do backtracing to extract an optimal subsequence that realizes the maximum length you computed.