[Regular Expressions] Easily extracting words from English text

Posted at 2019-07-16

##Introduction

Suppose we have the following English text.

sample.rb
txt =
"The widow she cried over me, and called me a poor lamb, and she called
me a lot of other names, too, but she never meant no harm by it. She 
put me in them new clothes again, and I couldn’t do nothing but sweat
and sweat, and feel all cramped up. Well, then, the old thing commenced
again. The widow rung a bell for supper, and you had to come to time. 
When you got to the table you couldn’t go right to eating, but you 
had to wait for the widow to tuck down her head and grumble a little over the victuals, though there warn’t really anything the matter with them, – that is, nothing only everything was cooked by itself. In a
barrel of odds and ends it is different; things get mixed up, and the
juice kind of swaps around, and the things go better."

To count how many times the word `me` appears in this text, a common first attempt is:

sample.rb
txt.scan("me").size

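This is the usual approach, but it is a plain substring search, so it also counts occurrences that are not the word `me`. For example, it matches inside `names` and `meant`. If you then bolt extra conditions onto a regular expression to avoid these, you can run into the opposite problem: the pattern stops matching `me` at the start of a line, or `me,` followed by a comma.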
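Both failure modes can be reproduced on a small input. The sketch below uses a hypothetical shortened string `sample` (not the full text above), and the space-delimited pattern `/ me /` is only one illustrative example of an extra condition that over-corrects.

```ruby
# Hypothetical shortened sample (not the article's full text).
# The word "me" appears 4 times: at the start of the string, after "called",
# at the start of the second line, and before a comma.
sample = "me first, she called me a poor lamb, and she called\n" \
         "me a lot of other names, but she never meant no harm to me, ever."

# Plain substring search: also hits the "me" inside "names" and "meant".
substring_hits = sample.scan("me").size

# Over-corrected pattern (illustrative assumption): a space is required on
# both sides, so "me" at the start of a line and "me," are no longer counted.
spaced_hits = sample.scan(/ me /).size

puts "substring: #{substring_hits}, spaced: #{spaced_hits}"
```

Here the substring search reports 6 and the space-delimited pattern reports 1, while the word actually occurs 4 times.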

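##What \b is good for
This is where `\b` helps when you want to ignore those extra matches. `\b` is a regular-expression anchor that matches a word boundary: the position between a word character and a non-word character, or the start or end of the string. It consumes no characters itself, so it lets you require that a match begin and end at the edge of a word.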

sample2.rb
txt.scan(/\bme\b/).size
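To see why this pattern works, it can help to make the boundaries visible. The following is a minimal sketch on hypothetical short strings (`marked` and `sample` are illustration names, not from the original code); it marks each `\b` position with `|` and then counts whole-word matches.

```ruby
# \b is zero-width, so replacing it inserts a marker at each word boundary.
marked = "call me, names".gsub(/\b/, "|")
puts marked

# Hypothetical shortened sample (not the article's full text);
# the word "me" appears 4 times in it.
sample = "me first, she called me a poor lamb, and she called\n" \
         "me a lot of other names, but she never meant no harm to me, ever."

# "names" and "meant" are skipped because the "me" inside them is not
# surrounded by word boundaries on both sides.
whole_word_hits = sample.scan(/\bme\b/).size
puts whole_word_hits
```

The marked string comes out as `|call| |me|, |names|`: there is a boundary on each side of `me`, but none around the `me` inside `names`, which is why only the standalone word is counted.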

Whenever you want to pull out whole words, reach for `\b` first!
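As one way to reuse the idea for arbitrary words, the sketch below wraps it in a small helper. `count_word` and its `ignore_case` option are hypothetical names introduced here for illustration, and the helper assumes the word begins and ends with a word character (letters, digits, or underscore), since `\b` would behave differently otherwise.

```ruby
# Hypothetical helper (not from the original article): counts whole-word
# occurrences of `word` in `text` by wrapping it in \b anchors.
def count_word(text, word, ignore_case: false)
  options = ignore_case ? Regexp::IGNORECASE : 0
  # Regexp.escape keeps characters such as "." in `word` from being
  # interpreted as regular-expression syntax.
  pattern = Regexp.new("\\b#{Regexp.escape(word)}\\b", options)
  text.scan(pattern).size
end

# Hypothetical sample sentence used only for this example.
sample = "The widow she cried over me, and called me a poor lamb. " \
         "Me too, the names meant no harm."

me_count    = count_word(sample, "me")                     # lowercase "me" only
me_count_ci = count_word(sample, "me", ignore_case: true)  # also counts "Me"
the_count   = count_word(sample, "the", ignore_case: true) # "The" and "the"

puts [me_count, me_count_ci, the_count].inspect
```

Case sensitivity is left as an option because a sentence-initial `Me` is a separate string from `me` unless the pattern is told otherwise.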
