Introduction
I'm working on porting some Python code to Ruby.
re.finditer (Python)
import re
rule = r'([KR](?=[^P]))|((?<=W)K(?=P))|((?<=M)R(?=P))'
sequence = "MRGETGVSIKNPRPSRPFSCFWRKGDVENIRKSDIGNEKKIDAKFNRLQYNLYYKPLSHHKAGLLYKELFFRSCFSYTTCSLDFQGKRHQVERKAVDIVL"
print([x.end() for x in re.finditer(rule, sequence)])
# [2, 10, 23, 24, 31, 32, 39, 40, 44, 47, 61, 67, 72, 87, 88, 93, 94]
How to do it? (Ruby)
NG.rb
rule = /([KR](?=[^P]))|((?<=W)K(?=P))|((?<=M)R(?=P))/
sequence = "MRGETGVSIKNPRPSRPFSCFWRKGDVENIRKSDIGNEKKIDAKFNRLQYNLYYKPLSHHKAGLLYKELFFRSCFSYTTCSLDFQGKRHQVERKAVDIVL"
p sequence.scan(rule).flatten.compact.map{ sequence.index(_1) + _1.size }
# [2, 10, 2, 10, 2, 10, 10, 10, 10, 2, 10, 10, 2, 10, 2, 2, 10]
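scan matches with the regular expression, but index is just a plain substring search. It returns the first occurrence every time, so every "R" maps to 2 and every "K" maps to 10.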
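To make the difference concrete, here is a minimal sketch on a toy string (the string `"AKBKC"` and the variable names are made up for illustration, not taken from the code above). It compares a bare `index` call with one that is given a start offset.

```ruby
# Toy input (hypothetical): "K" occurs at indices 1 and 3.
text = "AKBKC"

# String#index with a plain string always reports the first occurrence.
first_only = text.scan(/K/).map { |hit| text.index(hit) + hit.size }

# Passing a start offset makes index continue after the previous hit.
pos = 0
walked = text.scan(/K/).map do |hit|
  pos = text.index(hit, pos) + hit.size
end

p first_only # [2, 2]
p walked     # [2, 4]
```

The offset version is still only a plain search, so it knows nothing about lookaheads or lookbehinds in the original pattern.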
NG.rb
rule = /([KR](?=[^P]))|((?<=W)K(?=P))|((?<=M)R(?=P))/
sequence = "MRGETGVSIKNPRPSRPFSCFWRKGDVENIRKSDIGNEKKIDAKFNRLQYNLYYKPLSHHKAGLLYKELFFRSCFSYTTCSLDFQGKRHQVERKAVDIVL"
a = []
sequence.scan(rule).flatten.compact.uniq.each do |k|
  sequence.size.times do |i|
    a << i + k.size if k == sequence[i, k.size]
  end
end
p a.sort
# [2, 10, 13, 16, 23, 24, 31, 32, 39, 40, 44, 47, 55, 61, 67, 72, 87, 88, 93, 94]
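This looks closer, but it still does not match what the regular expression should give. The plain string comparison ignores the lookahead, so every K or R followed by P is counted too (13, 16, and 55).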
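A small sketch of why the extra positions show up, using a toy string and only the first alternative of the rule (both are assumptions for illustration). Each candidate position has to be tested against the pattern itself, not against the captured text.

```ruby
# Toy input (hypothetical): the first "K" is followed by "P", the second is not.
text = "AKPAKA"

# Plain comparison: every "K" counts, whatever follows it.
plain = (0...text.size).select { |i| text[i, 1] == "K" }.map { |i| i + 1 }

# Pattern test at each position: a "K" followed by "P" is rejected by the lookahead.
regex = (0...text.size).select { |i| text[i..-1].match?(/\A[KR](?=[^P])/) }.map { |i| i + 1 }

p plain # [2, 5]
p regex # [5]
```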
OK.rb
rule = /([KR](?=[^P]))|((?<=W)K(?=P))|((?<=M)R(?=P))/
# Anchor all three alternatives to the start of the slice, not just the first one
rule2 = /^(?:([KR](?=[^P]))|((?<=W)K(?=P))|((?<=M)R(?=P)))/
sequence = "MRGETGVSIKNPRPSRPFSCFWRKGDVENIRKSDIGNEKKIDAKFNRLQYNLYYKPLSHHKAGLLYKELFFRSCFSYTTCSLDFQGKRHQVERKAVDIVL"
a = []
sequence.scan(rule).flatten.compact.uniq.each do |k|
  sequence.size.times do |i|
    a << i + k.size if sequence[i..-1].match(rule2)
  end
end
p a.uniq
# [2, 10, 23, 24, 31, 32, 39, 40, 44, 47, 61, 67, 72, 87, 88, 93, 94]
There is a bit of duplication, but it gets the job done.
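One caveat with matching against `sequence[i..-1]`: the slice no longer contains the preceding character, so a lookbehind such as `(?<=W)` has nothing to look at. The sequence here contains neither WKP nor MRP, so the output is unaffected, but on other input it could matter. A possible workaround, sketched on a toy string (hypothetical input), is to match the whole string from an offset and pin the match there with `\G`.

```ruby
# Toy input (hypothetical): "K" at index 1, preceded by "W" and followed by "P".
text = "WKP"

# Slicing drops the "W", so the lookbehind cannot succeed.
sliced = text[1..-1].match(/^(?<=W)K(?=P)/)

# Matching the full string from offset 1 keeps the context;
# \G anchors the match to the position where the search starts.
anchored = text.match(/\G(?<=W)K(?=P)/, 1)

p sliced          # nil
p anchored.end(0) # 2
```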
OK.rb
rule = [/^([KR](?=[^P]))/, /^((?<=W)K(?=P))/, /^((?<=M)R(?=P))/]
sequence = "MRGETGVSIKNPRPSRPFSCFWRKGDVENIRKSDIGNEKKIDAKFNRLQYNLYYKPLSHHKAGLLYKELFFRSCFSYTTCSLDFQGKRHQVERKAVDIVL"
a = []
sequence.size.times do |i|
  rule.each do |k|
    m = sequence[i..-1].match(k)
    a << i + m.end(0) if m # m.end(0): end of the match within the slice
  end
end
p a
# [2, 10, 23, 24, 31, 32, 39, 40, 44, 47, 61, 67, 72, 87, 88, 93, 94]
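For comparison, a shorter route that may be closer to `re.finditer`: `scan` accepts a block, and inside it `Regexp.last_match` gives the MatchData of the current hit, including its end offset. This is a sketch of one alternative, not necessarily the only or best way; the variable name `ends` is made up here.

```ruby
rule = /([KR](?=[^P]))|((?<=W)K(?=P))|((?<=M)R(?=P))/
sequence = "MRGETGVSIKNPRPSRPFSCFWRKGDVENIRKSDIGNEKKIDAKFNRLQYNLYYKPLSHHKAGLLYKELFFRSCFSYTTCSLDFQGKRHQVERKAVDIVL"

# Collect the end offset of every match, like x.end() in the Python version.
ends = []
sequence.scan(rule) { ends << Regexp.last_match.end(0) }

p ends
# [2, 10, 23, 24, 31, 32, 39, 40, 44, 47, 61, 67, 72, 87, 88, 93, 94]
```

Because the pattern runs against the whole string, the lookbehinds see the preceding characters and no per-position loop is needed.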
Tidied up a little.
Notes
- Learned how Python's re.finditer works
- There still seems to be a long way to go