[Regular Expressions] Easily extracting words from English text

Last updated at 2019-07-16, posted at 2019-07-16

##Introduction

Suppose we have the following English text.

sample.rb

txt =
"The widow she cried over me, and called me a poor lamb, and she called
me a lot of other names, too, but she never meant no harm by it. She 
put me in them new clothes again, and I couldn’t do nothing but sweat
and sweat, and feel all cramped up. Well, then, the old thing commenced
again. The widow rung a bell for supper, and you had to come to time. 
When you got to the table you couldn’t go right to eating, but you 
had to wait for the widow to tuck down her head and grumble a little
over the victuals, though there warn’t really anything the matter with
them, – that is, nothing only everything was cooked by itself. In a
barrel of odds and ends it is different; things get mixed up, and the
juice kind of swaps around, and the things go better."

To count how many times the word "me" appears in this text, the usual first attempt is:

sample.rb

txt.scan("me").size

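However, this plain substring search also counts words other than "me": for example, it matches inside "names" and "meant". Piling extra conditions onto the regular expression to avoid those cases can backfire, because the pattern may then stop matching "me" at the start of a line or "me" followed by a comma ("me,").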
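
The following is a minimal sketch of both problems. The short `sample` string and the space-delimited pattern `/ me /` are illustrative assumptions, not taken from the article; they only show how a substring search overcounts and how a naive workaround undercounts.

```ruby
# Short illustrative sample (an assumption, not the article's full text).
sample = "me first, she called me, gave me names, and meant it for me"

# Plain substring search: also hits the "me" inside "names" and "meant".
substring_hits = sample.scan("me").size

# Naive workaround: require a space on both sides of "me".
# This drops "names" and "meant", but it also misses "me" at the very
# start, "me," before a comma, and "me" at the very end.
spaced_hits = sample.scan(/ me /).size

puts substring_hits  # more than the 4 real occurrences of the word
puts spaced_hits     # fewer than the 4 real occurrences of the word
```

The word "me" really occurs four times in this sample, so neither count is right.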

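##The usefulness of \b
This is where \b comes in handy when you want to ignore the unwanted words. \b is a regular expression anchor that matches a word boundary, so you can require that a match start and end exactly at the edge of a word.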

sample2.rb

txt.scan(/\bme\b/).size
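
As a rough sketch of how this generalizes, the same idea can be wrapped in a small helper. `count_word` and the `sample` string are hypothetical names introduced here for illustration; `Regexp.escape` is used on the assumption that the word might contain regex metacharacters.

```ruby
# Hypothetical helper (not from the article): count whole-word occurrences.
def count_word(text, word)
  # \b on both sides requires the match to start and end at a word boundary.
  # Regexp.escape keeps any regex metacharacters in `word` literal.
  text.scan(/\b#{Regexp.escape(word)}\b/).size
end

# Short illustrative sample (an assumption, not the article's full text).
sample = "me first, she called me, gave me names, and meant it for me"

puts count_word(sample, "me")    # counts "me" at the start, before a comma, and at the end
puts count_word(sample, "name")  # "names" does not count as the word "name"
```

Because \b is zero-width, it matches the position next to a comma, a space, or the start or end of the string without consuming any character, which is why no special cases are needed.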

Whenever you want to pull out whole words, reach for \b first!
