Chained replace calls vs. regex: which one is better?

Posted at 2023-11-04

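Introduction

When I want to strip several unwanted characters from a string, what should I use?
Chained replace calls, or a regular expression?
I looked into it, so here are my notes for future reference.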

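Conclusion

replace was overwhelmingly faster.
Maybe that is because each search is for a single character?
I'm still not fully convinced, though...
If anyone knowledgeable is reading, please let me know.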

Script

String generation

import string
import random

def generate_text(length = 200):
  ch_text = ['"',' ','@']
  # Generate a weighted random string (each target character gets weight 10, all others weight 1)
  weighted_chars = {
    **{value:1 for value in string.ascii_letters + string.digits},
    **{value:10 for value in ch_text}
  }
  random_string = ''.join(random.choices(list(weighted_chars.keys()), weights=weighted_chars.values(), k=length))

  print(",".join([f"{t}: {random_string.count(t)}" for t in ch_text]))
  return random_string
generate_text()

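The idea is to generate a random string in which the characters under test are weighted more heavily. All other characters are just arbitrary letters and digits.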
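As a rough sanity check on the weighting, the expected count of each target character can be worked out from the weights alone. This is only a sketch and not part of the original script; `expected_count` is a hypothetical helper name, and it assumes the same 3 target characters and 62 letters and digits used above.

```python
import string

def expected_count(length, target_weight=10, other_weight=1):
    """Expected occurrences of ONE target character in a string of `length`."""
    n_targets = 3  # the characters '"', ' ' and '@'
    n_others = len(string.ascii_letters + string.digits)  # 62 letters and digits
    total_weight = n_targets * target_weight + n_others * other_weight  # 30 + 62 = 92
    return length * target_weight / total_weight

# Each target character has probability 10 / 92, so about 1087 per 10,000 characters.
print(round(expected_count(10000)))
```

With a length of 10,000 this gives roughly 1,087 occurrences per target character, which is in line with the counts printed in the results further down.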

Timing code

import re

text = generate_text(10000)
# %time is an IPython/Jupyter magic that times a single run of the statement
%time re_text=text.replace(" ", "").replace('"', "").replace('@', "")
%time sub_text = re.sub('[ "@]',"",text)

print(re_text==sub_text)

Just in case, the last line prints whether the two results are identical.
Of course they should be.

Results

": 1080, : 1106,@: 1101
CPU times: user 235 µs, sys: 1 µs, total: 236 µs
Wall time: 262 µs
CPU times: user 1.79 ms, sys: 0 ns, total: 1.79 ms
Wall time: 5.44 ms
True

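These are the results.
The string is 10,000 characters long and contains roughly 1,100 of each target character.
Removing them took 262 µs with replace and 5.44 ms with the regular expression (wall time).
That is a difference of about 20x.
I had expected the opposite, so this was a real surprise.
I wonder whether the ranking flips somewhere as the string gets longer?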
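One way to explore the question about longer strings is to repeat the measurement with `timeit` over several input sizes. The sketch below is a variant of the experiment, not the original measurement: the helper names (`make_text`, `compare`, and so on), the sizes, and the repeat counts are all made up for illustration, and the pattern is precompiled on the assumption that compilation cost should not be counted. No timings are shown here because they depend on the machine and the Python version.

```python
import random
import re
import string
import timeit

TARGETS = ['"', ' ', '@']
PATTERN = re.compile('[ "@]')  # precompiled so compile time is excluded (assumption)

def make_text(length, seed=0):
    """Weighted random string: target characters weight 10, others weight 1."""
    rng = random.Random(seed)  # fixed seed so runs are comparable
    chars = list(string.ascii_letters + string.digits) + TARGETS
    weights = [1] * (len(chars) - len(TARGETS)) + [10] * len(TARGETS)
    return ''.join(rng.choices(chars, weights=weights, k=length))

def remove_with_replace(text):
    return text.replace(' ', '').replace('"', '').replace('@', '')

def remove_with_regex(text):
    return PATTERN.sub('', text)

def compare(length, number=10):
    """Return (replace_seconds, regex_seconds) per call for one input length."""
    text = make_text(length)
    assert remove_with_replace(text) == remove_with_regex(text)
    # Take the best of 3 repeats to reduce noise from other processes.
    t_replace = min(timeit.repeat(lambda: remove_with_replace(text),
                                  number=number, repeat=3)) / number
    t_regex = min(timeit.repeat(lambda: remove_with_regex(text),
                                number=number, repeat=3)) / number
    return t_replace, t_regex

if __name__ == "__main__":
    for length in (1_000, 10_000, 100_000):
        t_replace, t_regex = compare(length)
        print(f"{length:>7}: replace {t_replace * 1e6:.1f} µs, regex {t_regex * 1e6:.1f} µs")
```

Averaging over repeated calls also avoids relying on a single `%time` run, where one-off costs can inflate the wall time.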

Aside

String replacement gets used a lot in AI work, right?
Probably.
