Jovian
⭐️
Sign In
Learn data science and machine learning by building real-world projects on Jovian

Longest Common Subsequence

QUESTION 1: Write a function to find the length of the longest common subsequence between two sequences. E.g. Given the strings "serendipitous" and "precipitation", the longest common subsequence is "reipito" and its length is 7.

A "sequence" is a group of items with a deterministic ordering. Lists, tuples and ranges are some common sequence types in Python.

A "subsequence" is a sequence obtained by deleting zero or more elements from another sequence. For example, "edpt" is a subsequence of "serendipitous".

General case
Test cases
  1. General case (string)
  2. General case (list)
  3. No common subsequence
  4. One is a subsequence of the other
  5. One sequence is empty
  6. Both sequences are empty
  7. Multiple subsequences with same length
    1. “abcdef” and “badcfe”

Longest common subsequence test cases:

In [1]:
T0 = {
    'input': {
        'seq1': 'serendipitous',
        'seq2': 'precipitation'
    },
    'output': 7
}

T1 = {
    'input': {
        'seq1': [1, 3, 5, 6, 7, 2, 5, 2, 3],
        'seq2': [6, 2, 4, 7, 1, 5, 6, 2, 3]
    },
    'output': 5
}

T2 = {
    'input': {
        'seq1': 'longest',
        'seq2': 'stone'
    },
    'output': 3
}

T3 = {
    'input': {
        'seq1': 'asdfwevad',
        'seq2': 'opkpoiklklj'
    },
    'output': 0
}

T4 = {
    'input': {
        'seq1': 'dense',
        'seq2': 'condensed'
    },
    'output': 5
}

T5 = {
    'input': {
        'seq1': '',
        'seq2': 'opkpoiklklj'
    },
    'output': 0
}

T6 = {
    'input': {
        'seq1': '',
        'seq2': ''
    },
    'output': 0
}

T7 = {
    'input': {
        'seq1': 'abcdef',
        'seq2': 'badcfe'
    },
    'output': 3
}
In [2]:
lcq_tests = [T0, T1, T2, T3, T4, T5, T6, T7]
In [ ]:
 
In [ ]:
 
In [ ]:
 
Recursive Solution
  1. Create two counters idx1 and idx2 starting at 0. Our recursive function will compute the LCS of seq1[idx1:] and seq2[idx2:]

  2. If seq1[idx1] and seq2[idx2] are equal, then this character belongs to the LCS of seq1[idx1:] and seq2[idx2:] (why?). Further the length this is LCS is one more than LCS of seq1[idx1+1:] and seq2[idx2+1:]

  1. If not, then the LCS of seq1[idx1:] and seq2[idx2:] is the longer one among the LCS of seq1[idx1+1:], seq2[idx2:] and the LCS of seq1[idx1:], seq2[idx2+1:]
  1. If either seq1[idx1:] or seq2[idx2:] is empty, then their LCS is empty.

Here's what the tree of recursive calls looks like:

Complexity Analysis

Worst case occurs when each time we have to try 2 subproblems i.e. when the sequences have no common elements.

Here's what the tree looks like in such a case (source - Techie Delight):

All the leaf nodes are (0, 0). Can you count the number of leaf nodes?

HINT: Count the number of unique paths from root to leaf. The length of each path is m+n and at each level there are 2 choices.

Based on the above can you infer that the time complexity is \(O(2^{m+n})\).

Dynamic programming

  1. Create a table of size (n1+1) * (n2+1) initialized with 0s, where n1 and n2 are the lengths of the sequences. table[i][j] represents the longest common subsequence of seq1[:i] and seq2[:j]. Here's what the table looks like (source: Kevin Mavani, Medium).
  1. If seq1[i] and seq2[j] are equal, then table[i+1][j+1] = 1 + table[i][j]

  2. If seq1[i] and seq2[j] are equal, then table[i+1][j+1] = max(table[i][j+1], table[i+1][j])

Verify that the complexity of the dynamic programming approach is \(O(N1 * N2)\).

In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 

0-1 Knapsack Problem

Problem statement

You’re in charge of selecting a football (soccer) team from a large pool of players. Each player has a cost, and a rating. You have a limited budget. What is the highest total rating of a team that fits within your budget. Assume that there’s no minimum or maximum team size.

General problem statemnt:

Given n elements, each of which has a weight and a profit, determine the maximum profit that can be obtained by selecting a subset of the elements weighing no more than w.

Test cases:

  1. Some generic test cases
  2. All the elements can be included
  3. None of the elements can be included
  4. Only one of the elements can be included
  5. ???

Knapsack test cases:

In [3]:
test0 = {
    'input': {
        'capacity': 165,
        'weights': [23, 31, 29, 44, 53, 38, 63, 85, 89, 82],
        'profits': [92, 57, 49, 68, 60, 43, 67, 84, 87, 72]
    },
    'output': 309
}

test1 = {
    'input': {
        'capacity': 3,
        'weights': [4, 5, 6],
        'profits': [1, 2, 3]
    },
    'output': 0
}

test2 = {
    'input': {
        'capacity': 4,
        'weights': [4, 5, 1],
        'profits': [1, 2, 3]
    },
    'output': 3
}

test3 = {
    'input': {
        'capacity': 170,
        'weights': [41, 50, 49, 59, 55, 57, 60],
        'profits': [442, 525, 511, 593, 546, 564, 617]
    },
    'output': 1735
}

test4 = {
    'input': {
        'capacity': 15,
        'weights': [4, 5, 6],
        'profits': [1, 2, 3]
    },
    'output': 6
}

test5 = {
    'input': {
        'capacity': 15,
        'weights': [4, 5, 1, 3, 2, 5],
        'profits': [2, 3, 1, 5, 4, 7]
    },
    'output': 19
}
In [4]:
tests = [test0, test1, test2, test3, test4, test5]
In [ ]:
 
Recursion
  1. We'll write a recursive function that computes max_profit(weights[idx:], profits[idx:], capacity), with idx starting from 0.

  2. If weights[idx] > capacity, the current element is cannot be selected, so the maximum profit is the same as max_profit(weights[idx+1:], profits[idx+1:], capacity).

  3. Otherwise, there are two possibilities: we either pick weights[idx] or don't. We can recursively compute the maximum

    A. If we don't pick weights[idx], once again the maximum profit for this case is max_profit(weights[idx+1:], profits[idx+1:], capacity)

    B. If we pick weights[idx], the maximum profit for this case is profits[idx] + max_profit(weights[idx+1:], profits[idx+1:], capacity - weights[idx]

  4. If weights[idx:] is empty, the maximum profit for this case is 0.

Here's a visualization of the recursion tree:

Verify that the time complexity of the recursive algorithm is \(O(2^N)\)

Dynamic Programming
  1. Create a table of size (n+1) * (capacity+1) consisting of all 0s, where is n is the number of elements. table[i][c] represents the maximum profit that can be obtained using the first i elements if the maximum capacity is c. Here's a visual representation of a filled table (source - geeksforgeeks):

(The 0th row will contain all zeros and is not shown above.)

  1. We'll fill the table row by row and column by column. table[i][c] can be filled using some values in the row above it.

  2. If weights[i] > c i.e. if the current element can is larger than capacity, then table[i+1][c] is simply equal to table[i][c] (since there's no way we can pick this element).

  3. If weights[i] <= c then we have two choices: to either pick the current element or not to get the value of table[i+1][c]. We can compare the maximum profit for both these options and pick the better one as the value of table[i][c].

    A. If we don't pick the element with weight weights[i], then once again the maximum profit is table[i][c]

    B. If we pick the element with weight weights[i], then the maximum profit is profits[i] + table[i][c-weights[i]], since we have used up some capacity.

Verify that the complexity of the dynamic programming solution is \(O(N * W)\).

In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 

Longest common subsequence:

In [5]:
def lcq_recursive(seq1, seq2, idx1=0, idx2=0):
    # Check if either of the sequences is exhausted
    if idx1 == len(seq1) or idx2 == len(seq2):
        return 0
    
    # Check if the current characters are equal
    if seq1[idx1] == seq2[idx2]:
        return 1 + lcq_recursive(seq1, seq2, idx1+1, idx2+1)
    # Skip one element from each sequence
    else:
        return max(lcq_recursive(seq1, seq2, idx1+1, idx2), 
                   lcq_recursive(seq1, seq2, idx1, idx2+1))    
In [8]:
from jovian.pythondsa import evaluate_test_cases
In [9]:
evaluate_test_cases(lcq_recursive, lcq_tests)
TEST CASE #0 Input: {'seq1': 'serendipitous', 'seq2': 'precipitation'} Expected Output: 7 Actual Output: 7 Execution Time: 303.001 ms Test Result: PASSED TEST CASE #1 Input: {'seq1': [1, 3, 5, 6, 7, 2, 5, 2, 3], 'seq2': [6, 2, 4, 7, 1, 5, 6, 2, 3]} Expected Output: 5 Actual Output: 5 Execution Time: 4.127 ms Test Result: PASSED TEST CASE #2 Input: {'seq1': 'longest', 'seq2': 'stone'} Expected Output: 3 Actual Output: 3 Execution Time: 0.177 ms Test Result: PASSED TEST CASE #3 Input: {'seq1': 'asdfwevad', 'seq2': 'opkpoiklklj'} Expected Output: 0 Actual Output: 0 Execution Time: 90.804 ms Test Result: PASSED TEST CASE #4 Input: {'seq1': 'dense', 'seq2': 'condensed'} Expected Output: 5 Actual Output: 5 Execution Time: 0.099 ms Test Result: PASSED TEST CASE #5 Input: {'seq1': '', 'seq2': 'opkpoiklklj'} Expected Output: 0 Actual Output: 0 Execution Time: 0.001 ms Test Result: PASSED TEST CASE #6 Input: {'seq1': '', 'seq2': ''} Expected Output: 0 Actual Output: 0 Execution Time: 0.001 ms Test Result: PASSED TEST CASE #7 Input: {'seq1': 'abcdef', 'seq2': 'badcfe'} Expected Output: 3 Actual Output: 3 Execution Time: 0.033 ms Test Result: PASSED SUMMARY TOTAL: 8, PASSED: 8, FAILED: 0
Out[9]:
[(7, True, 303.001),
 (5, True, 4.127),
 (3, True, 0.177),
 (0, True, 90.804),
 (5, True, 0.099),
 (0, True, 0.001),
 (0, True, 0.001),
 (3, True, 0.033)]
In [24]:
%%time
lcq_recursive('seredipitous', 'precipitation')
CPU times: user 163 ms, sys: 1.84 ms, total: 165 ms Wall time: 164 ms
Out[24]:
7
In [25]:
%%time
lcq_recursive('Asdfsfafssess', 'oypiououuiuo')
CPU times: user 2.58 s, sys: 2.17 ms, total: 2.58 s Wall time: 2.58 s
Out[25]:
0
In [26]:
def lcq_memoized(seq1, seq2):
    memo = {}
    
    def recurse(idx1, idx2):
        key = idx1, idx2
        
        if key in memo:
            return memo[key]
        
        if idx1 == len(seq1) or idx2 == len(seq2):
            memo[key] = 0
        elif seq1[idx1] == seq2[idx2]:
            memo[key] = 1 + recurse(idx1+1, idx2+1)
        else:
            memo[key] = max(recurse(idx1+1, idx2), 
                            recurse(idx1, idx2+1))
        return memo[key]
        
    return recurse(0, 0)
In [27]:
evaluate_test_cases(lcq_memoized, lcq_tests)
TEST CASE #0 Input: {'seq1': 'serendipitous', 'seq2': 'precipitation'} Expected Output: 7 Actual Output: 7 Execution Time: 0.182 ms Test Result: PASSED TEST CASE #1 Input: {'seq1': [1, 3, 5, 6, 7, 2, 5, 2, 3], 'seq2': [6, 2, 4, 7, 1, 5, 6, 2, 3]} Expected Output: 5 Actual Output: 5 Execution Time: 0.084 ms Test Result: PASSED TEST CASE #2 Input: {'seq1': 'longest', 'seq2': 'stone'} Expected Output: 3 Actual Output: 3 Execution Time: 0.045 ms Test Result: PASSED TEST CASE #3 Input: {'seq1': 'asdfwevad', 'seq2': 'opkpoiklklj'} Expected Output: 0 Actual Output: 0 Execution Time: 0.119 ms Test Result: PASSED TEST CASE #4 Input: {'seq1': 'dense', 'seq2': 'condensed'} Expected Output: 5 Actual Output: 5 Execution Time: 0.041 ms Test Result: PASSED TEST CASE #5 Input: {'seq1': '', 'seq2': 'opkpoiklklj'} Expected Output: 0 Actual Output: 0 Execution Time: 0.004 ms Test Result: PASSED TEST CASE #6 Input: {'seq1': '', 'seq2': ''} Expected Output: 0 Actual Output: 0 Execution Time: 0.002 ms Test Result: PASSED TEST CASE #7 Input: {'seq1': 'abcdef', 'seq2': 'badcfe'} Expected Output: 3 Actual Output: 3 Execution Time: 0.032 ms Test Result: PASSED SUMMARY TOTAL: 8, PASSED: 8, FAILED: 0
Out[27]:
[(7, True, 0.182),
 (5, True, 0.084),
 (3, True, 0.045),
 (0, True, 0.119),
 (5, True, 0.041),
 (0, True, 0.004),
 (0, True, 0.002),
 (3, True, 0.032)]
In [28]:
%%time
lcq_memoized('Asdfsfafssess', 'oypiououuiuo')
CPU times: user 172 µs, sys: 1 µs, total: 173 µs Wall time: 177 µs
Out[28]:
0
In [29]:
%%time
lcq_memoized('seredipitous', 'precipitation')
CPU times: user 157 µs, sys: 0 ns, total: 157 µs Wall time: 161 µs
Out[29]:
7
In [30]:
%%time
lcq_memoized('longest', 'stone')
CPU times: user 48 µs, sys: 0 ns, total: 48 µs Wall time: 50.3 µs
Out[30]:
3

Dynamic programming:

In [31]:
def lcq_dp(seq1, seq2):
    n1, n2 = len(seq1), len(seq2)
    results = [[0 for _ in range(n2+1)] for _ in range(n1+1)]
    for idx1 in range(n1):
        for idx2 in range(n2):
            if seq1[idx1] == seq2[idx2]:
                results[idx1+1][idx2+1] = 1 + results[idx1][idx2]
            else:
                results[idx1+1][idx2+1] = max(results[idx1][idx2+1], results[idx1+1][idx2])
    return results[-1][-1]
In [32]:
evaluate_test_cases(lcq_dp, lcq_tests)
TEST CASE #0 Input: {'seq1': 'serendipitous', 'seq2': 'precipitation'} Expected Output: 7 Actual Output: 7 Execution Time: 0.096 ms Test Result: PASSED TEST CASE #1 Input: {'seq1': [1, 3, 5, 6, 7, 2, 5, 2, 3], 'seq2': [6, 2, 4, 7, 1, 5, 6, 2, 3]} Expected Output: 5 Actual Output: 5 Execution Time: 0.048 ms Test Result: PASSED TEST CASE #2 Input: {'seq1': 'longest', 'seq2': 'stone'} Expected Output: 3 Actual Output: 3 Execution Time: 0.031 ms Test Result: PASSED TEST CASE #3 Input: {'seq1': 'asdfwevad', 'seq2': 'opkpoiklklj'} Expected Output: 0 Actual Output: 0 Execution Time: 0.065 ms Test Result: PASSED TEST CASE #4 Input: {'seq1': 'dense', 'seq2': 'condensed'} Expected Output: 5 Actual Output: 5 Execution Time: 0.033 ms Test Result: PASSED TEST CASE #5 Input: {'seq1': '', 'seq2': 'opkpoiklklj'} Expected Output: 0 Actual Output: 0 Execution Time: 0.006 ms Test Result: PASSED TEST CASE #6 Input: {'seq1': '', 'seq2': ''} Expected Output: 0 Actual Output: 0 Execution Time: 0.004 ms Test Result: PASSED TEST CASE #7 Input: {'seq1': 'abcdef', 'seq2': 'badcfe'} Expected Output: 3 Actual Output: 3 Execution Time: 0.031 ms Test Result: PASSED SUMMARY TOTAL: 8, PASSED: 8, FAILED: 0
Out[32]:
[(7, True, 0.096),
 (5, True, 0.048),
 (3, True, 0.031),
 (0, True, 0.065),
 (5, True, 0.033),
 (0, True, 0.006),
 (0, True, 0.004),
 (3, True, 0.031)]
In [33]:
%%time
lcq_dp('Asdfsfafssess', 'oypiououuiuo')
CPU times: user 92 µs, sys: 0 ns, total: 92 µs Wall time: 93.9 µs
Out[33]:
0
In [34]:
%%time
lcq_dp('seredipitous', 'precipitation')
CPU times: user 94 µs, sys: 0 ns, total: 94 µs Wall time: 98.9 µs
Out[34]:
7
In [35]:
%%time
lcq_dp('longest', 'stone')
CPU times: user 32 µs, sys: 1e+03 ns, total: 33 µs Wall time: 35 µs
Out[35]:
3
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [ ]:
 

Knapsack Solutions

In [36]:
from jovian.pythondsa import evaluate_test_cases
In [37]:
def max_profit_recursive(capacity, weights, profits, idx=0):
    if idx == len(weights):
        return 0
    if weights[idx] > capacity:
        return max_profit_recursive(capacity, weights, profits, idx+1)
    else:
        return max(max_profit_recursive(capacity, weights, profits, idx+1),
                   profits[idx] + max_profit_recursive(capacity-weights[idx], weights, profits, idx+1))
In [38]:
evaluate_test_cases(max_profit_recursive, tests)
TEST CASE #0 Input: {'capacity': 165, 'weights': [23, 31, 29, 44, 53, 38, 63, 85, 89, 82], 'profits': [92, 57, 49, 68, 6... Expected Output: 309 Actual Output: 309 Execution Time: 0.174 ms Test Result: PASSED TEST CASE #1 Input: {'capacity': 3, 'weights': [4, 5, 6], 'profits': [1, 2, 3]} Expected Output: 0 Actual Output: 0 Execution Time: 0.004 ms Test Result: PASSED TEST CASE #2 Input: {'capacity': 4, 'weights': [4, 5, 1], 'profits': [1, 2, 3]} Expected Output: 3 Actual Output: 3 Execution Time: 0.009 ms Test Result: PASSED TEST CASE #3 Input: {'capacity': 170, 'weights': [41, 50, 49, 59, 55, 57, 60], 'profits': [442, 525, 511, 593, 546, 564,... Expected Output: 1735 Actual Output: 1735 Execution Time: 0.054 ms Test Result: PASSED TEST CASE #4 Input: {'capacity': 15, 'weights': [4, 5, 6], 'profits': [1, 2, 3]} Expected Output: 6 Actual Output: 6 Execution Time: 0.007 ms Test Result: PASSED TEST CASE #5 Input: {'capacity': 15, 'weights': [4, 5, 1, 3, 2, 5], 'profits': [2, 3, 1, 5, 4, 7]} Expected Output: 19 Actual Output: 19 Execution Time: 0.041 ms Test Result: PASSED SUMMARY TOTAL: 6, PASSED: 6, FAILED: 0
Out[38]:
[(309, True, 0.174),
 (0, True, 0.004),
 (3, True, 0.009),
 (1735, True, 0.054),
 (6, True, 0.007),
 (19, True, 0.041)]
In [ ]:
 

Memoized:

In [39]:
def knapsack_memo(capacity, weights, profits):
    memo = {}
    
    def recurse(idx, remaining):
        key = (idx, remaining)
        if key in memo:
            return memo[key]
        elif idx == len(weights):
            memo[key] = 0
        elif weights[idx] > remaining:
            memo[key] = recurse(idx+1, remaining)
        else:
            memo[key] = max(recurse(idx+1, remaining), 
                            profits[idx] + recurse(idx+1, remaining-weights[idx]))
        return memo[key] 
        
    return recurse(0, capacity)
In [40]:
evaluate_test_cases(knapsack_memo, tests)
TEST CASE #0 Input: {'capacity': 165, 'weights': [23, 31, 29, 44, 53, 38, 63, 85, 89, 82], 'profits': [92, 57, 49, 68, 6... Expected Output: 309 Actual Output: 309 Execution Time: 0.279 ms Test Result: PASSED TEST CASE #1 Input: {'capacity': 3, 'weights': [4, 5, 6], 'profits': [1, 2, 3]} Expected Output: 0 Actual Output: 0 Execution Time: 0.007 ms Test Result: PASSED TEST CASE #2 Input: {'capacity': 4, 'weights': [4, 5, 1], 'profits': [1, 2, 3]} Expected Output: 3 Actual Output: 3 Execution Time: 0.011 ms Test Result: PASSED TEST CASE #3 Input: {'capacity': 170, 'weights': [41, 50, 49, 59, 55, 57, 60], 'profits': [442, 525, 511, 593, 546, 564,... Expected Output: 1735 Actual Output: 1735 Execution Time: 0.098 ms Test Result: PASSED TEST CASE #4 Input: {'capacity': 15, 'weights': [4, 5, 6], 'profits': [1, 2, 3]} Expected Output: 6 Actual Output: 6 Execution Time: 0.018 ms Test Result: PASSED TEST CASE #5 Input: {'capacity': 15, 'weights': [4, 5, 1, 3, 2, 5], 'profits': [2, 3, 1, 5, 4, 7]} Expected Output: 19 Actual Output: 19 Execution Time: 0.053 ms Test Result: PASSED SUMMARY TOTAL: 6, PASSED: 6, FAILED: 0
Out[40]:
[(309, True, 0.279),
 (0, True, 0.007),
 (3, True, 0.011),
 (1735, True, 0.098),
 (6, True, 0.018),
 (19, True, 0.053)]
In [ ]:
 
In [ ]:
 

Dynamic programming:

In [41]:
def knapsack_dp(capacity, weights, profits):
    n = len(weights)
    results = [[0 for _ in range(capacity+1)] for _ in range(n+1)]
    
    for idx in range(n):
        for c in range(capacity+1):
            if weights[idx] > c:
                results[idx+1][c] = results[idx][c]
            else:
                results[idx+1][c] = max(results[idx][c], profits[idx] + results[idx][c-weights[idx]])
            
    return results[-1][-1]
In [42]:
evaluate_test_cases(knapsack_dp, tests)
TEST CASE #0 Input: {'capacity': 165, 'weights': [23, 31, 29, 44, 53, 38, 63, 85, 89, 82], 'profits': [92, 57, 49, 68, 6... Expected Output: 309 Actual Output: 309 Execution Time: 0.656 ms Test Result: PASSED TEST CASE #1 Input: {'capacity': 3, 'weights': [4, 5, 6], 'profits': [1, 2, 3]} Expected Output: 0 Actual Output: 0 Execution Time: 0.014 ms Test Result: PASSED TEST CASE #2 Input: {'capacity': 4, 'weights': [4, 5, 1], 'profits': [1, 2, 3]} Expected Output: 3 Actual Output: 3 Execution Time: 0.016 ms Test Result: PASSED TEST CASE #3 Input: {'capacity': 170, 'weights': [41, 50, 49, 59, 55, 57, 60], 'profits': [442, 525, 511, 593, 546, 564,... Expected Output: 1735 Actual Output: 1735 Execution Time: 0.497 ms Test Result: PASSED TEST CASE #4 Input: {'capacity': 15, 'weights': [4, 5, 6], 'profits': [1, 2, 3]} Expected Output: 6 Actual Output: 6 Execution Time: 0.028 ms Test Result: PASSED TEST CASE #5 Input: {'capacity': 15, 'weights': [4, 5, 1, 3, 2, 5], 'profits': [2, 3, 1, 5, 4, 7]} Expected Output: 19 Actual Output: 19 Execution Time: 0.05 ms Test Result: PASSED SUMMARY TOTAL: 6, PASSED: 6, FAILED: 0
Out[42]:
[(309, True, 0.656),
 (0, True, 0.014),
 (3, True, 0.016),
 (1735, True, 0.497),
 (6, True, 0.028),
 (19, True, 0.05)]
In [ ]:
 
In [ ]:
 
In [ ]:
 
In [1]:
import jovian
In [ ]:
jovian.commit(filename="dynamic-programming-problems")
[jovian] Attempting to save notebook..
In [ ]: