More than 5 years have passed since last update.

アルゴリズム1000本ノック #2. Longest common substrings

Posted at 2015-12-08

Problem

与えられた２つの文字列に対し, 共通で持つ部分文字列の最大長を求めよ.
例) "abcdefg" と "cdeg" が与えられたとして, 最大の部分文字列は "cde" なので答えは３.

Solution

DP（ダイナミックプログラミング）の中では古典的な問題です。

２つの文字列を S1 と S2, それぞれの長さをM, Nとします.
総当りでS1からすべての部分文字列を抽出して, それらがS2に含まれているかを１つずつチェックすることでも解けますが, 計算量は O(M^2*N) となってしまいます.

効率的に解くために, 配列 m[M][N] を用意して m[i][j] に S1[i] と S2[j] がそれぞれ共通の部分文字列の何文字目にあたるかを格納するようにします. （共通の部分文字列に含まれていない場合は0を入れる）

上の例を見ると, S1[3] も S2[1] も 'd', かつ and m[2][0] = 1 なので m[3][1] = 2 となります. つまり, S1[3] and S2[1] は部分文字列の２文字目（１つ前の文字が１文字目なので）ということになります.
これを一般化して, m[i][j] は S1[i] =S2[j] ならm[i-1][j-1]+1, そうでなければ0と値を求めていくことができます.

これがいわゆるボトムアップのダイナミックプログラミングです。

algorithms_dp_longest_common_substring.py

def LCS(s1, s2):
    if not s1 or not s2:
        return 0
    
    len1, len2 = len(s1), len(s2)
    
    # memorization
    m = []
    for i in range(len1):
        m.append([0]*len2) #m[len1][len2]
    
    lcs = 0
    
    # set 1st index value
    for i, ch in enumerate(s1):
        if ch == s2[0]:
            m[i][0] = 1
            lcs = 1
    for i, ch in enumerate(s2):
        if ch == s1[0]:
            m[0][i] = 1
            lcs = 1
            
    # m[i][j] stands for s1[i] and s2[j]
    # is the m[i][j] th bit of a common substring
    for i in range(1, len1):
        for j in range(1, len2):
            if s1[i] == s2[j]:
                m[i][j] = m[i-1][j-1] + 1
                lcs = max(m[i][j], lcs)
            else:
                m[i][j] = 0
    return lcs

この場合, 計算量と空間料は O(MN)となります.

Jave版のコードはこちら:

algorithms_dp_longest_common_substring.java

public class Solution
{
  public static int LCS(String s1, String s2) {
    int l1 = s1.length();
    int l2 = s2.length();
    if (l1 == 0 || l2 == 0) return 0; // no common strings
    
    // memorization
    int[][] m = new int[l1][l2];
    int maxLen = 0;
    
    // set 1st index value
    for (int i = 0; i < l1; i++) {
      if (s1.charAt(i) == s2.charAt(0)) {
    	m[i][0] = 1;
        maxLen = 1;
      } else m[i][0] = 0;
    }
    for (int i = 0; i < l2; i++) {
      if (s2.charAt(i) == s1.charAt(0)) {
    	m[0][i] = 1;
        maxLen = 1;
      } else m[0][i] = 0;
    }
    
	// m[i][j] stands for s1.charAt(i) and s2.charAt(j)
    // is the m[i][j] th bit of a common substring
    for (int i = 1; i < l1; i++) {
      for (int j = 1; j < l2; j++) {
      	if (s1.charAt(i) != s2.charAt(j)) m[i][j] = 0;
        else {
          m[i][j] = m[i-1][j-1] + 1;
          maxLen = (m[i][j] > maxLen)? m[i][j] : maxLen;
        }
      }
    }
    return maxLen;
  }
}

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up