grep Regular Expressions: A Reverse-Lookup Reference

Posted at 2022-06-26

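Prerequisites

  • The -E option enables extended regular expressions (ERE).
  • The characters that gain a special meaning in ERE are ? + { | ( ).
  • In this document, patterns that contain ERE characters are run with the egrep command (equivalent to grep -E). Recent GNU grep releases warn that egrep is obsolescent, so grep -E is the safer spelling.
  • In this document, the --color=always option is added so that the matched portion is visible.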
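
As a small illustration of the first point, the sketch below runs the same pattern with and without -E. The two input lines are made up for this example and are not part of the sample text.

```shell
# Hypothetical two-line input: one line with a literal "+", one with repeated "a".
input=$(printf 'a+b\naab\n')

# Basic regular expressions (no -E): "+" is an ordinary character.
basic=$(printf '%s\n' "$input" | grep 'a+b')

# Extended regular expressions (-E): "+" means "one or more of the preceding item".
extended=$(printf '%s\n' "$input" | grep -E 'a+b')

echo "basic:    $basic"
echo "extended: $extended"
```

Without -E the pattern selects the line that literally contains a+b; with -E it selects the line where one or more "a" characters are followed by "b".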

Sample text

computerScienceBooks.txt
1.      ACM/IEEE International Conference on Software Engineering
2.      Journal of Systems and Software
3.      IEEE Transactions on Software Engineering
4.      Information and Software Technology
5.      ACM SIGSOFT Iinternational Symposium on Foundations of Software Engineering
6.      empirical software engineering
7.      IEEE Software
8.      ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)
9.      Mining Software Repositories
10.     IEEE/ACM International Conference on Automated Software Engineering (ASE)

Metacharacters

Match any single character except a newline: .
$ grep --color=always 'I.n' computerScienceBooks.txt
5.	ACM SIGSOFT Iinternational Symposium on Foundations of Software Engineering

Matched portions (shown in [brackets]):
5. ACM SIGSOFT [Iin]ternational Symposium on Foundations of Software Engineering

Note: The pattern matches only when exactly one character appears between "I" and "n".
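
A minimal sketch of that behaviour, using three made-up inputs rather than the sample file:

```shell
# Made-up inputs with zero, one, and two characters between "I" and "n".
result=$(printf 'In\nIin\nIxxn\n' | grep 'I.n')

# Only the line with exactly one character in between is selected.
echo "$result"
```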

Match zero or one occurrence of the preceding character: ?
$ egrep --color=always 'I.?n' computerScienceBooks.txt
1.	ACM/IEEE International Conference on Software Engineering
4.	Information and Software Technology
5.	ACM SIGSOFT Iinternational Symposium on Foundations of Software Engineering
10.	IEEE/ACM International Conference on Automated Software Engineering (ASE)

Matched portions (shown in [brackets]):
1. ACM/IEEE [In]ternational Conference on Software Engineering
4. [In]formation and Software Technology
5. ACM SIGSOFT [Iin]ternational Symposium on Foundations of Software Engineering
10. IEEE/ACM [In]ternational Conference on Automated Software Engineering (ASE)

Note: This regex matches any three-character string that starts with "I" and ends with "n", or the two-character string "In".
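
When colored output is not available, the -o option prints only the matched text, one match per line. It is supported by GNU and BSD grep but is not required by POSIX, so treat its availability as an assumption. The input words below are picked for illustration.

```shell
# Print only the matched text for each input line.
result=$(printf 'International\nIinternational\nIEEE\n' | grep -oE 'I.?n')

# "International" yields the two-character match, "Iinternational" the
# three-character match, and "IEEE" yields nothing.
echo "$result"
```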

Match zero or more occurrences of the preceding character: *
$ grep --color=always 'S.*o' computerScienceBooks.txt
1.	ACM/IEEE International Conference on Software Engineering
2.	Journal of Systems and Software
3.	IEEE Transactions on Software Engineering
4.	Information and Software Technology
5.	ACM SIGSOFT Iinternational Symposium on Foundations of Software Engineering
7.	IEEE Software
8.	ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)
9.	Mining Software Repositories
10.	IEEE/ACM International Conference on Automated Software Engineering (ASE)

Matched portions (shown in [brackets]):
1. ACM/IEEE International Conference on [So]ftware Engineering
2. Journal of [Systems and So]ftware
3. IEEE Transactions on [So]ftware Engineering
4. Information and [Software Technolo]gy
5. ACM [SIGSOFT Iinternational Symposium on Foundations of So]ftware Engineering
7. IEEE [So]ftware
8. ACM [SIGPLAN Conference on Programming Language Design and Implementatio]n (PLDI)
9. Mining [Software Reposito]ries
10. IEEE/ACM International Conference on Automated [So]ftware Engineering (ASE)

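Note: This regex matches a string that starts with "S" and ends with "o", with any number of characters (including none) in between. Because * is greedy, the match extends to the last "o" that follows the first "S" on the line.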
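
The extent of the match can be checked with -o, which prints only the matched text (available in GNU and BSD grep; assumed here). A short sketch on one line from the sample text and one made-up input:

```shell
# ".*" is greedy: the match runs from the first "S" to the last "o" after it.
greedy=$(printf 'Journal of Systems and Software\n' | grep -o 'S.*o')

# ".*" may also match nothing, so "S" directly followed by "o" is enough.
shortest=$(printf 'So\n' | grep -o 'S.*o')

echo "$greedy"
echo "$shortest"
```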

Match one or more occurrences of the preceding character: +
$ egrep --color=always 'S.+o' computerScienceBooks.txt
2.	Journal of Systems and Software
4.	Information and Software Technology
5.	ACM SIGSOFT Iinternational Symposium on Foundations of Software Engineering
8.	ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)
9.	Mining Software Repositories

Matched portions (shown in [brackets]):
2. Journal of [Systems and So]ftware
4. Information and [Software Technolo]gy
5. ACM [SIGSOFT Iinternational Symposium on Foundations of So]ftware Engineering
8. ACM [SIGPLAN Conference on Programming Language Design and Implementatio]n (PLDI)
9. Mining [Software Reposito]ries

Note: This regex matches a string of three or more characters that starts with "S" and ends with "o".
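
The difference from * shows up on inputs where nothing sits between "S" and "o". A small sketch with two made-up lines, counting matching lines with -c:

```shell
# ".*" accepts zero characters between "S" and "o", so both lines match.
star=$(printf 'So\nSoo\n' | grep -c 'S.*o')

# ".+" needs at least one character in between, so only "Soo" matches.
plus=$(printf 'So\nSoo\n' | grep -cE 'S.+o')

echo "star: $star, plus: $plus"
```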

Groups

Match any one of several strings within a group: ()
$ egrep --color=always '(IEEE|ACM) S' computerScienceBooks.txt
5.	ACM SIGSOFT Iinternational Symposium on Foundations of Software Engineering
7.	IEEE Software
8.	ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)

Matched portions (shown in [brackets]):
5. [ACM S]IGSOFT Iinternational Symposium on Foundations of Software Engineering
7. [IEEE S]oftware
8. [ACM S]IGPLAN Conference on Programming Language Design and Implementation (PLDI)

Note: This regex matches "IEEE S" or "ACM S".
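
The parentheses matter because | otherwise splits the whole pattern in two. The sketch below compares the grouped and ungrouped forms on three shortened, illustrative lines:

```shell
# Three shortened lines in the style of the sample text.
input=$(printf 'IEEE Transactions\nIEEE Software\nACM SIGPLAN\n')

# Grouped: ("IEEE" or "ACM") followed by " S".
grouped=$(printf '%s\n' "$input" | grep -E '(IEEE|ACM) S')

# Ungrouped: "IEEE" alone, or "ACM S", so "IEEE Transactions" matches too.
ungrouped=$(printf '%s\n' "$input" | grep -E 'IEEE|ACM S')

echo "$grouped"
echo "---"
echo "$ungrouped"
```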

Character classes

Match a digit: -P '\d'
$ grep --color=always -P '\d' computerScienceBooks.txt
1.	ACM/IEEE International Conference on Software Engineering
2.	Journal of Systems and Software
3.	IEEE Transactions on Software Engineering
4.	Information and Software Technology
5.	ACM SIGSOFT Iinternational Symposium on Foundations of Software Engineering
6.	empirical software engineering
7.	IEEE Software
8.	ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)
9.	Mining Software Repositories
10.	IEEE/ACM International Conference on Automated Software Engineering (ASE)

Matched portions (shown in [brackets]):
[1]. ACM/IEEE International Conference on Software Engineering
[2]. Journal of Systems and Software
[3]. IEEE Transactions on Software Engineering
[4]. Information and Software Technology
[5]. ACM SIGSOFT Iinternational Symposium on Foundations of Software Engineering
[6]. empirical software engineering
[7]. IEEE Software
[8]. ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)
[9]. Mining Software Repositories
[10]. IEEE/ACM International Conference on Automated Software Engineering (ASE)
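
The -P option is specific to GNU grep and depends on PCRE support being compiled in. Where it is unavailable, the POSIX class [[:digit:]] should behave the same way for ASCII digits. The sketch below assumes a grep that supports -o and uses one line from the sample text:

```shell
# One line in the format of the sample text (number, dot, tab, title).
line=$(printf '10.\tIEEE/ACM International Conference\n')

# A single [[:digit:]] matches one digit at a time.
digits=$(printf '%s\n' "$line" | grep -o '[[:digit:]]')

# With -E and "+", consecutive digits are matched as one number.
number=$(printf '%s\n' "$line" | grep -oE '[[:digit:]]+')

echo "$digits"
echo "$number"
```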

Match a non-digit: -P '\D'
$ grep --color=always -P '\D' computerScienceBooks.txt
1.	ACM/IEEE International Conference on Software Engineering
2.	Journal of Systems and Software
3.	IEEE Transactions on Software Engineering
4.	Information and Software Technology
5.	ACM SIGSOFT Iinternational Symposium on Foundations of Software Engineering
6.	empirical software engineering
7.	IEEE Software
8.	ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)
9.	Mining Software Repositories
10.	IEEE/ACM International Conference on Automated Software Engineering (ASE)

Matched portions (shown in [brackets]):
1[. ACM/IEEE International Conference on Software Engineering]
2[. Journal of Systems and Software]
3[. IEEE Transactions on Software Engineering]
4[. Information and Software Technology]
5[. ACM SIGSOFT Iinternational Symposium on Foundations of Software Engineering]
6[. empirical software engineering]
7[. IEEE Software]
8[. ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)]
9[. Mining Software Repositories]
10[. IEEE/ACM International Conference on Automated Software Engineering (ASE)]
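
Every line of the sample text contains at least one non-digit, so all ten lines are selected. A line is excluded only when it consists entirely of digits. The sketch below shows this with two made-up lines, using the POSIX form [^[:digit:]] in place of -P '\D' (a substitution made here for portability):

```shell
# "2022" is all digits; "2022-06-26" also contains hyphens.
result=$(printf '2022\n2022-06-26\n' | grep '[^[:digit:]]')

# Only the line with a non-digit character is selected.
echo "$result"
```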

Match a lowercase letter: [:lower:]
$ grep --color=always 'I[[:lower:]]' computerScienceBooks.txt
1.	ACM/IEEE International Conference on Software Engineering
4.	Information and Software Technology
5.	ACM SIGSOFT Iinternational Symposium on Foundations of Software Engineering
8.	ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)
10.	IEEE/ACM International Conference on Automated Software Engineering (ASE)

Matched portions (shown in [brackets]):
1. ACM/IEEE [In]ternational Conference on Software Engineering
4. [In]formation and Software Technology
5. ACM SIGSOFT [Ii]nternational Symposium on Foundations of Software Engineering
8. ACM SIGPLAN Conference on Programming Language Design and [Im]plementation (PLDI)
10. IEEE/ACM [In]ternational Conference on Automated Software Engineering (ASE)

Match an uppercase letter: [:upper:]
$ grep --color=always 'I[[:upper:]]' computerScienceBooks.txt
1.	ACM/IEEE International Conference on Software Engineering
3.	IEEE Transactions on Software Engineering
5.	ACM SIGSOFT Iinternational Symposium on Foundations of Software Engineering
7.	IEEE Software
8.	ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)
10.	IEEE/ACM International Conference on Automated Software Engineering (ASE)

Matched portions (shown in [brackets]):
1. ACM/[IE]EE International Conference on Software Engineering
3. [IE]EE Transactions on Software Engineering
5. ACM S[IG]SOFT Iinternational Symposium on Foundations of Software Engineering
7. [IE]EE Software
8. ACM S[IG]PLAN Conference on Programming Language Design and Implementation (PLDI)
10. [IE]EE/ACM International Conference on Automated Software Engineering (ASE)
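
The two classes can be combined in one pattern. As a sketch, the following extracts capitalized words (one uppercase letter followed by one or more lowercase letters) from a made-up line, assuming a grep that supports -o:

```shell
# Note the double brackets: [:upper:] is only valid inside a bracket expression.
result=$(printf 'IEEE Software (ASE)\n' | grep -oE '[[:upper:]][[:lower:]]+')

# "IEEE" and "ASE" contain no lowercase letters, so only one word is printed.
echo "$result"
```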

Word boundaries

Match a specific word delimited by word boundaries: \b
$ grep --color=always -P '\bo[nf]\b' computerScienceBooks.txt
1.	ACM/IEEE International Conference on Software Engineering
2.	Journal of Systems and Software
3.	IEEE Transactions on Software Engineering
5.	ACM SIGSOFT Iinternational Symposium on Foundations of Software Engineering
8.	ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)
10.	IEEE/ACM International Conference on Automated Software Engineering (ASE)

Matched portions (shown in [brackets]):
1. ACM/IEEE International Conference [on] Software Engineering
2. Journal [of] Systems and Software
3. IEEE Transactions [on] Software Engineering
5. ACM SIGSOFT Iinternational Symposium [on] Foundations [of] Software Engineering
8. ACM SIGPLAN Conference [on] Programming Language Design and Implementation (PLDI)
10. IEEE/ACM International Conference [on] Automated Software Engineering (ASE)
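
\b matches the empty string at the edge of a word, so it also applies at the start or end of a line, not only next to spaces. The sketch below assumes GNU grep, which accepts \b with -E as well as with -P, and uses made-up inputs:

```shell
# "on" as a whole word, inside a longer word, and inside a sentence.
result=$(printf 'on\nonly\nicon\nturn on now\n' | grep -E '\bon\b')

# "only" and "icon" contain "on" but not as a separate word.
echo "$result"
```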

Quantifiers

Match a run of the same character: {}
$ egrep --color=always 'E{3}' computerScienceBooks.txt
1.	ACM/IEEE International Conference on Software Engineering
3.	IEEE Transactions on Software Engineering
7.	IEEE Software
10.	IEEE/ACM International Conference on Automated Software Engineering (ASE)

Matched portions (shown in [brackets]):
1. ACM/I[EEE] International Conference on Software Engineering
3. I[EEE] Transactions on Software Engineering
7. I[EEE] Software
10. I[EEE]/ACM International Conference on Automated Software Engineering (ASE)
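
Besides an exact count, the braces accept a range in the form {min,max}. A short sketch on three made-up inputs:

```shell
# {3}: exactly three consecutive "E" characters must be present.
exact=$(printf 'IE\nIEE\nIEEE\n' | grep -E 'E{3}')

# {2,3}: two or three consecutive "E" characters.
range=$(printf 'IE\nIEE\nIEEE\n' | grep -E 'E{2,3}')

echo "$exact"
echo "---"
echo "$range"
```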

Anchors

Match lines that begin with specific characters: ^
$ grep --color=always '^[4-6]' computerScienceBooks.txt
4.	Information and Software Technology
5.	ACM SIGSOFT Iinternational Symposium on Foundations of Software Engineering
6.	empirical software engineering

Matched portions (shown in [brackets]):
[4]. Information and Software Technology
[5]. ACM SIGSOFT Iinternational Symposium on Foundations of Software Engineering
[6]. empirical software engineering
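
Note that ^[4-6] only tests the first character of the line, so a line numbered 40 would also be selected. If the intent is to select entries 4 through 6 only, one option is to require the dot right after the digit. A sketch with made-up lines:

```shell
# First character is 4, 5, or 6: "40. c" is selected as well.
loose=$(printf '4. a\n14. b\n40. c\n' | grep '^[4-6]')

# A single digit 4-6 followed by a literal dot: only "4. a" is selected.
strict=$(printf '4. a\n14. b\n40. c\n' | grep '^[4-6]\.')

echo "$loose"
echo "---"
echo "$strict"
```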

Match lines that end with specific characters: $
$ grep --color=always '[eg]$' computerScienceBooks.txt
1.	ACM/IEEE International Conference on Software Engineering
2.	Journal of Systems and Software
3.	IEEE Transactions on Software Engineering
5.	ACM SIGSOFT Iinternational Symposium on Foundations of Software Engineering
6.	empirical software engineering
7.	IEEE Software

Matched portions (shown in [brackets]):
1. ACM/IEEE International Conference on Software Engineerin[g]
2. Journal of Systems and Softwar[e]
3. IEEE Transactions on Software Engineerin[g]
5. ACM SIGSOFT Iinternational Symposium on Foundations of Software Engineerin[g]
6. empirical software engineerin[g]
7. IEEE Softwar[e]
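
Inside a bracket expression, | is an ordinary character rather than an alternation operator, so a class written as [e|g] would also accept a line ending in "|". The sketch below shows the difference on made-up inputs:

```shell
# With "|" inside the brackets, a line ending in "|" is selected too.
with_pipe=$(printf 'a|\nmore\nlog\n' | grep '[e|g]$')

# Listing only the wanted characters selects lines ending in "e" or "g".
without_pipe=$(printf 'a|\nmore\nlog\n' | grep '[eg]$')

echo "$with_pipe"
echo "---"
echo "$without_pipe"
```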

Backreferences

Match a pair of HTML tags: \1

Sample text

htmltags.txt
1 This is comment
2 <head>some head</head>
3 <body>some body</body>
4 Hello
5 <p>OK</p>

$ egrep --color=always '<([a-zA-Z]*)>.*</\1>' htmltags.txt
2 <head>some head</head>
3 <body>some body</body>
5 <p>OK</p>

Note: HTML start and end tags are rarely written on the same line, so practical use requires a more refined pattern.
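
\1 stands for whatever the first parenthesized group matched, so the closing tag must carry the same name as the opening tag. A small sketch with one deliberately mismatched, made-up line:

```shell
# The second line opens with <p> but closes with </b>, so \1 cannot match.
result=$(printf '<p>OK</p>\n<p>NG</b>\n<head>some head</head>\n' \
  | grep -E '<([a-zA-Z]*)>.*</\1>')

echo "$result"
```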
