More than 3 years have passed since last update.

配列に含まれる重複した要素を取得する

Ruby

Last updated at 2021-12-09Posted at 2021-12-07

やりたいこと

配列に含まれる重複した要素を取得したい。

animals = %w(🐶 🐱 🐰 🐱 🐵 🐰 🐰)
# animals.???
# => ["🐱", "🐰"]

方法

1. Enumerable#inject を使う

animals.each_with_index.inject([]) do |duplicated_animals, (animal, i)|
  # すでに重複していることが分かっている要素であればスキップする。
  next duplicated_animals if duplicated_animals.include?(animal)
  
  # animals の現在のインデックスより右側を調べる。
  duplicated_animals << animal if animals[i + 1..-1].include?(animal)
  duplicated_animals
end
# => ["🐱", "🐰"]

2. Enumerable#group_by と Object#itself を使う

animals.group_by(&:itself).select { _2.size > 1 }.keys
# => ["🐱", "🐰"]

# 解説
animals.group_by(&:itself)
# => {"🐶"=>["🐶"], "🐱"=>["🐱", "🐱"], "🐰"=>["🐰", "🐰", "🐰"], "🐵"=>["🐵"]}
animals.group_by(&:itself).select { _2.size > 1 }
# => {"🐱"=>["🐱", "🐱"], "🐰"=>["🐰", "🐰", "🐰"]}
animals.group_by(&:itself).select { _2.size > 1 }.keys
# => ["🐱", "🐰"]

3. Enumerable#count を使う

animals.select { animals.count(_1) > 1 }.uniq
# => ["🐱", "🐰"]

4. Array#index と Array#rindex を使う

# 右に向かって探索した場合と左に向かって探索した場合とで最初に見つかった位置が異なる場合、その要素が重複すると言える。
animals.select { animals.index(_1) != animals.rindex(_1) }.uniq
# => ["🐱", "🐰"]

パフォーマンス比較

# gem install benchmark-ips
require 'benchmark/ips'

numbers = Array.new(10_000) { [*1..100].sample }

Benchmark.ips do |x|
  x.report('1') do
    numbers.each_with_index.inject([]) do |duplicated_numbers, (number, i)|
      next duplicated_numbers if duplicated_numbers.include?(number)

      duplicated_numbers << number if numbers[i + 1..-1].include?(number)
      duplicated_numbers
    end
  end
  x.report('2') do
    numbers.group_by(&:itself).select { _2.size > 1 }.keys
  end
  x.report('3') do
    numbers.select { numbers.count(_1) > 1 }.uniq
  end
  x.report('4') do
    numbers.select { numbers.index(_1) != numbers.rindex(_1) }.uniq
  end
  x.compare!
end

# 念のためすべての方法の結果が等しいことを確認する。
results1 = numbers.each_with_index.inject([]) do |duplicated_numbers, (number, i)|
  next duplicated_numbers if duplicated_numbers.include?(number)

  duplicated_numbers << number if numbers[i + 1..-1].include?(number)
  duplicated_numbers
end
results2 = numbers.group_by(&:itself).select { _2.size > 1 }.keys
results3 = numbers.select { numbers.count(_1) > 1 }.uniq
results4 = numbers.select { numbers.index(_1) != numbers.rindex(_1) }.uniq

[results1, results2, results3, results4].uniq.size
# => 1

Warming up --------------------------------------
                   1    38.000  i/100ms
                   2   202.000  i/100ms
                   3     1.000  i/100ms
                   4    12.000  i/100ms
Calculating -------------------------------------
                   1    385.300  (± 0.3%) i/s -      1.938k in   5.029877s
                   2      2.020k (± 1.2%) i/s -     10.100k in   5.001583s
                   3      2.627  (± 0.0%) i/s -     14.000  in   5.328698s
                   4    128.926  (± 0.0%) i/s -    648.000  in   5.026160s

Comparison:
                   2:     2019.6 i/s
                   1:      385.3 i/s - 5.24x  (± 0.00) slower
                   4:      128.9 i/s - 15.67x  (± 0.00) slower
                   3:        2.6 i/s - 768.72x  (± 0.00) slower

2. Enumerable#group_by と Object#itself を使う

この方法が一番早い

3. Enumerable#count を使う

この方法が最も直感的だと僕は思うが、一番遅い

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up