How to Compute a Diff in Ruby

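I needed to compute diffs in an app I'm building, so I looked into how to use the Ruby gem diff-lcs (loaded with require 'diff/lcs') and wrote up these notes for my own reference.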


Installing diff-lcs

Run gem install diff-lcs.


Sample text

# Source text (compared from)

text_a = <<-"EOS"
abc
def
ghi
jkl
123
456
789
/*-
EOS

# Target text (compared to)
text_b = <<-"EOS"
abc
def
ghm
jkl
987
654
321
EOS


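Getting the diff between two texts

There are two methods for getting the diff between texts: Diff::LCS.sdiff and Diff::LCS.diff. When multi-line strings are compared, the two methods differ as follows.


  • Diff::LCS.sdiff

    Reports the comparison result one character at a time, including characters that are unchanged.


  • Diff::LCS.diff

    Reports only the characters that differ. Unchanged characters are omitted, and adjacent changes are grouped together.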



Results

require 'diff/lcs'

# [Comparison 1] Compare text_a and text_b with Diff::LCS.sdiff
# The result of the comparison, relative to text_a, is stored in sdiffs
sdiffs = Diff::LCS.sdiff(text_a, text_b)
sdiffs.each do |sdiff|
  p sdiff
end
# ["=", [0, "a"], [0, "a"]]
# ["=", [1, "b"], [1, "b"]]
# ["=", [2, "c"], [2, "c"]]
# ["=", [3, "\n"], [3, "\n"]]
# ["=", [4, "d"], [4, "d"]]
# ["=", [5, "e"], [5, "e"]]
# ["=", [6, "f"], [6, "f"]]
# ["=", [7, "\n"], [7, "\n"]]
# ["=", [8, "g"], [8, "g"]]
# ["=", [9, "h"], [9, "h"]]
# ["!", [10, "i"], [10, "m"]]
# ["=", [11, "\n"], [11, "\n"]]
# ["=", [12, "j"], [12, "j"]]
# ["=", [13, "k"], [13, "k"]]
# ["=", [14, "l"], [14, "l"]]
# ["=", [15, "\n"], [15, "\n"]]
# ["!", [16, "1"], [16, "9"]]
# ["!", [17, "2"], [17, "8"]]
# ["!", [18, "3"], [18, "7"]]
# ["=", [19, "\n"], [19, "\n"]]
# ["-", [20, "4"], [20, nil]]
# ["-", [21, "5"], [20, nil]]
# ["=", [22, "6"], [20, "6"]]
# ["+", [23, nil], [21, "5"]]
# ["+", [23, nil], [22, "4"]]
# ["=", [23, "\n"], [23, "\n"]]
# ["!", [24, "7"], [24, "3"]]
# ["!", [25, "8"], [25, "2"]]
# ["!", [26, "9"], [26, "1"]]
# ["-", [27, "\n"], [27, nil]]
# ["-", [28, "/"], [27, nil]]
# ["-", [29, "*"], [27, nil]]
# ["-", [30, "-"], [27, nil]]
# ["=", [31, "\n"], [27, "\n"]]

# [Comparison 2] Compare text_a and text_b with Diff::LCS.diff
# The result of the comparison, relative to text_a, is stored in diffs
diffs = Diff::LCS.diff(text_a, text_b)
diffs.each do |diff|
  p diff
end
# [["-", 10, "i"], ["+", 10, "m"]]
# [["-", 16, "1"], ["-", 17, "2"], ["-", 18, "3"], ["+", 16, "9"], ["+", 17, "8"], ["+", 18, "7"]]
# [["-", 20, "4"], ["-", 21, "5"]]
# [["+", 21, "5"], ["+", 22, "4"]]
# [["-", 24, "7"], ["-", 25, "8"], ["-", 26, "9"], ["-", 27, "\n"], ["-", 28, "/"], ["-", 29, "*"], ["-", 30, "-"], ["+", 24, "3"], ["+", 25, "2"], ["+", 26, "1"]]

※ To make the difference between the two methods easier to see, I deliberately use each to print the comparison results one entry per line.


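Applying the diff to the original text

Pass the text to be patched as the first argument of Diff::LCS.patch and the diff as the second argument. The diff passed as the second argument can come from either Diff::LCS.sdiff or Diff::LCS.diff; the result is the same.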


Code that applies the diff

# Store the result of applying the diff in patched_text

patched_text = Diff::LCS.patch(text_a, sdiffs)
# patched_text = Diff::LCS.patch(text_a, diffs) gives the same result


Result of applying the diff

The patched result, patched_text, has the same content as text_b.

puts patched_text

# abc
# def
# ghm
# jkl
# 987
# 654
# 321


Websites I referred to

Ruby で文字列を比較して、差分を強調表示させる - Qiita

rubyでdiffを求める | dev.wan.co



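Aside

I'm not sure whether this ever comes up in practice, but here is what Diff::LCS.sdiff and Diff::LCS.diff return when comparing two nested arrays, and when comparing a nested array with a flat one.

Code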

array_1 = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"], ["j", "k", "l"], ["1", "2", "3"], ["4", "5", "6"], ["7", "8", "9"], ["/", "*", "-"]]

array_2 = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "m"], ["j", "k", "l"], ["9", "8", "7"], ["6", "5", "4"], ["3", "8", "1"]]
array_3 = array_2.flatten

# Compare array_1 and array_2 with Diff::LCS.sdiff
sdiffs = Diff::LCS.sdiff(array_1, array_2)
sdiffs.each do |sdiff|
  p sdiff
end

# Compare array_1 and array_2 with Diff::LCS.diff
diffs = Diff::LCS.diff(array_1, array_2)
diffs.each do |diff|
  p diff
end

# Compare array_1 and array_3 with Diff::LCS.sdiff
sdiffs = Diff::LCS.sdiff(array_1, array_3)
sdiffs.each do |sdiff|
  p sdiff
end

# Compare array_1 and array_3 with Diff::LCS.diff
diffs = Diff::LCS.diff(array_1, array_3)
diffs.each do |diff|
  p diff
end

Results

-----code: sdiffs = Diff::LCS.sdiff(array_1, array_2)-----

["=", [0, ["a", "b", "c"]], [0, ["a", "b", "c"]]]
["=", [1, ["d", "e", "f"]], [1, ["d", "e", "f"]]]
["!", [2, ["g", "h", "i"]], [2, ["g", "h", "m"]]]
["=", [3, ["j", "k", "l"]], [3, ["j", "k", "l"]]]
["!", [4, ["1", "2", "3"]], [4, ["9", "8", "7"]]]
["!", [5, ["4", "5", "6"]], [5, ["6", "5", "4"]]]
["!", [6, ["7", "8", "9"]], [6, ["3", "8", "1"]]]
["-", [7, ["/", "*", "-"]], [7, nil]]

-----code: diffs = Diff::LCS.diff(array_1, array_2)-----
[["-", 2, ["g", "h", "i"]], ["+", 2, ["g", "h", "m"]]]
[["-", 4, ["1", "2", "3"]], ["-", 5, ["4", "5", "6"]], ["+", 4, ["9", "8", "7"]], ["-", 6, ["7", "8", "9"]], ["+", 5, ["6", "5", "4"]], ["-", 7, ["/", "*", "-"]], ["+", 6, ["3", "8", "1"]]]

-----code: sdiffs = Diff::LCS.sdiff(array_1, array_3)-----
["!", [0, ["a", "b", "c"]], [0, "a"]]
["!", [1, ["d", "e", "f"]], [1, "b"]]
["!", [2, ["g", "h", "i"]], [2, "c"]]
["!", [3, ["j", "k", "l"]], [3, "d"]]
["!", [4, ["1", "2", "3"]], [4, "e"]]
["!", [5, ["4", "5", "6"]], [5, "f"]]
["!", [6, ["7", "8", "9"]], [6, "g"]]
["!", [7, ["/", "*", "-"]], [7, "h"]]
["+", [8, nil], [8, "m"]]
["+", [8, nil], [9, "j"]]
["+", [8, nil], [10, "k"]]
["+", [8, nil], [11, "l"]]
["+", [8, nil], [12, "9"]]
["+", [8, nil], [13, "8"]]
["+", [8, nil], [14, "7"]]
["+", [8, nil], [15, "6"]]
["+", [8, nil], [16, "5"]]
["+", [8, nil], [17, "4"]]
["+", [8, nil], [18, "3"]]
["+", [8, nil], [19, "8"]]
["+", [8, nil], [20, "1"]]

-----code: diffs = Diff::LCS.diff(array_1, array_3)-----
[["-", 0, ["a", "b", "c"]], ["-", 1, ["d", "e", "f"]], ["+", 0, "a"], ["-", 2, ["g", "h", "i"]], ["+", 1, "b"], ["-", 3, ["j", "k", "l"]], ["+", 2, "c"], ["-", 4, ["1", "2", "3"]], ["+", 3, "d"], ["-", 5, ["4", "5", "6"]], ["+", 4, "e"], ["-", 6, ["7", "8", "9"]], ["+", 5, "f"], ["-", 7, ["/", "*", "-"]], ["+", 6, "g"], ["+", 7, "h"], ["+", 8, "m"], ["+", 9, "j"], ["+", 10, "k"], ["+", 11, "l"], ["+", 12, "9"], ["+", 13, "8"], ["+", 14, "7"], ["+", 15, "6"], ["+", 16, "5"], ["+", 17, "4"], ["+", 18, "3"], ["+", 19, "8"], ["+", 20, "1"]]