Regular expressions useful in Splunk (IPv4, IPv6, URI)
Happy Holidays and Merry Christmas! 🎁🎄
As a small gift, I'd like to share some regular expressions that are sometimes useful in Splunk.
- IPv4 address
- IPv6 address
- elements of URI
When are they useful? When field extraction settings are insufficient, the macros and regular expressions below can be used to check and analyze the IP addresses and URIs contained in various logs.
To use the macros introduced below, either add the definitions yourself or install the following Add-on:
Numeral system macros for Splunk
IPv4 Address
Splunk macro for regex of IPv4 addresses with an argument for fieldname
A Splunk macro whose argument is the name of the field that receives the extracted IPv4 address.
It can be used with Splunk's rex command to extract IPv4 address fields.
For example, it matches strings such as 192.0.2.0, but it does not match .2.192.0.2.0.
Definition of the Macro IP4Regex(1)
Name:
IP4Regex(1)
Definition:
"(?<![0-9.])(?<$field$>(?:(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9]|)[0-9])\.){3}(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9]|)[0-9]))(?![0-9.])"
Arguments:
field
Usage in Splunk
| makeresults
| eval _raw=split("oid=.1.3.6.1.2.1.4.20.1.2.192.0.2.16 don't match
telnet://192.0.2.16:80/ match","
")
| stats count by _raw
| fields - count
| rex `IP4Regex(included_ipv4)`
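To try the macro outside Splunk, the search above can be approximated in Python. This is only a rough sketch: `expand_macro` is a hypothetical helper that imitates Splunk's `$field$` argument substitution, and the named group is rewritten to `(?P<name>...)` because Python's `re` module does not accept the `(?<name>...)` spelling that rex uses.

```python
import re

# Macro definition body as given above (without the surrounding double quotes).
IP4_MACRO = (
    r"(?<![0-9.])(?<$field$>(?:(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9]|)[0-9])\.){3}"
    r"(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9]|)[0-9]))(?![0-9.])"
)


def expand_macro(definition: str, **args: str) -> str:
    """Hypothetical helper: replace each $name$ placeholder with its argument."""
    for name, value in args.items():
        definition = definition.replace(f"${name}$", value)
    # Python spells named groups (?P<name>...); lookbehinds such as (?<!...) are untouched.
    return re.sub(r"\(\?<([A-Za-z_]\w*)>", r"(?P<\1>", definition)


PATTERN = re.compile(expand_macro(IP4_MACRO, field="included_ipv4"))

# The same two sample events as in the search above.
events = [
    "oid=.1.3.6.1.2.1.4.20.1.2.192.0.2.16 don't match",
    "telnet://192.0.2.16:80/ match",
]


def extract_ipv4(event: str):
    """Return the included_ipv4 value for one event, or None when nothing matches."""
    m = PATTERN.search(event)
    return m.group("included_ipv4") if m else None


for event in events:
    print(event, "->", extract_ipv4(event))
```

Only the telnet event yields a value, because every digit in the OID event is preceded by another digit or a dot.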
Raw regex for matching IPv4 addresses
This is a raw regular expression for IPv4 addresses. It matches strings such as 192.0.2.0, but it also partially matches strings such as 192.0.2.0.1 and .2.192.0.2.0.
(?:(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9]|)[0-9])\.){3}(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9]|)[0-9])
Regex for extracting fields and avoiding matches where there is no delimiter before and after an IPv4 address
For example, this matches strings such as 192.0.2.0, but it does not match .2.192.0.2.0.
(?<![0-9.])(?<included_ipv4>(?:(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9]|)[0-9])\.){3}(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9]|)[0-9]))(?![0-9.])
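The difference between the raw and the guarded pattern can be checked with a small Python sketch. The names `OCTET`, `RAW_IPV4` and `GUARDED_IPV4` are made up for this illustration, and the named group uses Python's `(?P<name>...)` spelling; the patterns themselves are assembled to be identical to the two regular expressions above.

```python
import re

# One octet, 0-255, exactly as written in the article's regex.
OCTET = r"(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9]|)[0-9])"

# Raw pattern: three "octet." groups followed by a final octet.
RAW_IPV4 = rf"(?:{OCTET}\.){{3}}{OCTET}"

# Guarded pattern: no digit or dot may touch the address on either side.
GUARDED_IPV4 = rf"(?<![0-9.])(?P<included_ipv4>{RAW_IPV4})(?![0-9.])"


def first_match(pattern: str, text: str):
    """Return the first matched substring, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


for sample in ["192.0.2.0", "192.0.2.0.1", ".2.192.0.2.0"]:
    print(sample, "| raw:", first_match(RAW_IPV4, sample),
          "| guarded:", first_match(GUARDED_IPV4, sample))
```

The raw pattern returns a fragment for all three samples, while the guarded pattern returns a value only for the first one.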
IPv6 Address
Splunk macro for regex of IPv6 addresses with an argument for fieldname
Regex for extracting fields and avoiding matches where there is no delimiter before and after an IPv6 address.
For example, this matches 2001:db8::b, but it does not match since2001:db8::broken.
Definition of the Macro IP6Regex(1)
Name:
IP6Regex(1)
Definition:
"(?i)(?<![0-9a-z:.])(?<$field$>(?:(?:(?:[0-9a-f]{1,4}:){7}([0-9a-f]{1,4}|:))|(?:(?:[0-9a-f]{1,4}:){6}(?::[0-9a-f]{1,4}|(?:(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9]|)[0-9])\.){3}(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9]|)[0-9])|:))|(?:(?:[0-9a-f]{1,4}:){5}(?:(?:(?::[0-9a-f]{1,4}){1,2})|:(?:(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9]|)[0-9])\.){3}(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9]|)[0-9])|:))|(?:(?:[0-9a-f]{1,4}:){4}(?:(?:(?::[0-9a-f]{1,4}){1,3})|(?:(?::[0-9a-f]{1,4})?:(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])(?:\.(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])){3}))|:))|(?:(?:[0-9a-f]{1,4}:){3}(?:(?:(?::[0-9a-f]{1,4}){1,4})|(?:(?::[0-9a-f]{1,4}){0,2}:(?:(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9]|)[0-9])\.){3}(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9]|)[0-9]))|:))|(?:(?:[0-9a-f]{1,4}:){2}(?:(?:(?::[0-9a-f]{1,4}){1,5})|(?:(?::[0-9a-f]{1,4}){0,3}:(?:(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9]|)[0-9])\.){3}(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9]|)[0-9]))|:))|(?:(?:[0-9a-f]{1,4}:){1}(?:(?:(?::[0-9a-f]{1,4}){1,6})|(?:(?::[0-9a-f]{1,4}){0,4}:(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])(?:\.(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])){3}))|:))|(?::(?:(?:(?::[0-9a-f]{1,4}){1,7})|(?:(?::[0-9a-f]{1,4}){0,5}:(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])(?:\.(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])){3}))|:)))(?:%[_a-z0-9]+)?)(?![0-9a-z:.])"
Arguments:
field
Usage in Splunk
| makeresults
| eval _raw=split("ldap://[2001:db8::b]/c=GB?objectClass?one match
since2001:db8::broken don't match","
")
| stats count by _raw
| fields - count
| rex `IP6Regex(included_ipv6)`
Raw regex for matching IPv6 addresses
This is a raw regular expression for IPv6 addresses. It matches strings such as 2001:db8::b, but it also partially matches strings such as since2001:db8::broken.
(?:(?:(?:[0-9a-f]{1,4}:){7}([0-9a-f]{1,4}|:))|(?:(?:[0-9a-f]{1,4}:){6}(?::[0-9a-f]{1,4}|(?:(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9]|)[0-9])\.){3}(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9]|)[0-9])|:))|(?:(?:[0-9a-f]{1,4}:){5}(?:(?:(?::[0-9a-f]{1,4}){1,2})|:(?:(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9]|)[0-9])\.){3}(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9]|)[0-9])|:))|(?:(?:[0-9a-f]{1,4}:){4}(?:(?:(?::[0-9a-f]{1,4}){1,3})|(?:(?::[0-9a-f]{1,4})?:(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])(?:\.(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])){3}))|:))|(?:(?:[0-9a-f]{1,4}:){3}(?:(?:(?::[0-9a-f]{1,4}){1,4})|(?:(?::[0-9a-f]{1,4}){0,2}:(?:(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9]|)[0-9])\.){3}(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9]|)[0-9]))|:))|(?:(?:[0-9a-f]{1,4}:){2}(?:(?:(?::[0-9a-f]{1,4}){1,5})|(?:(?::[0-9a-f]{1,4}){0,3}:(?:(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9]|)[0-9])\.){3}(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9]|)[0-9]))|:))|(?:(?:[0-9a-f]{1,4}:){1}(?:(?:(?::[0-9a-f]{1,4}){1,6})|(?:(?::[0-9a-f]{1,4}){0,4}:(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])(?:\.(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])){3}))|:))|(?::(?:(?:(?::[0-9a-f]{1,4}){1,7})|(?:(?::[0-9a-f]{1,4}){0,5}:(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])(?:\.(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])){3}))|:)))(?:%[_a-z0-9]+)?
In the IP6Regex(1) macro described above, (?<![0-9a-z:.]) is added at the head of the regex and (?![0-9a-z:.]) at the tail, so that it does not match when the address is immediately preceded or followed by a letter or by a character used in IPv6 addresses.
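The effect of these two guards can be illustrated in Python without copying the full IPv6 pattern. Note the substitution: the sketch below does not use the article's regex. It wraps a deliberately loose candidate pattern (`[0-9a-f:]+`) in the same lookbehind and lookahead, and then lets the standard `ipaddress` module decide whether a candidate is really an IPv6 address. The names `CANDIDATE` and `find_ipv6` are hypothetical, and zone IDs such as `%eth0` are not handled.

```python
import ipaddress
import re

# Loose candidate pattern (NOT the article's IPv6 regex) with the same guards.
CANDIDATE = re.compile(r"(?i)(?<![0-9a-z:.])[0-9a-f:]+(?![0-9a-z:.])")


def find_ipv6(text: str) -> list:
    """Return guarded candidates that the ipaddress module accepts as IPv6."""
    found = []
    for m in CANDIDATE.finditer(text):
        try:
            ipaddress.IPv6Address(m.group(0))
        except ValueError:
            continue  # e.g. a lone hex-looking letter such as "c"
        found.append(m.group(0))
    return found


print(find_ipv6("ldap://[2001:db8::b]/c=GB?objectClass?one match"))
print(find_ipv6("since2001:db8::broken don't match"))
```

In the second sample the address-like run is glued to letters on both sides, so the guards reject it before any validation happens.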
URI
Definition of the Macro URIRegex(7)
Name:
URIRegex(7)
Definition:
"^(?i)(?:(?<$scheme$>[a-z][a-z0-9+\-.]*):)?(?:\/\/(?:(?<$userinfo$>(?:[a-z0-9\-._~]|(?:%[0-9a-f]{2})|[!$&'()*+,;=:])*)@)?(?<$host$>(?:\[(?:(?:(?:[0-9a-f]{1,4}:){7}([0-9a-f]{1,4}|:))|(?:(?:[0-9a-f]{1,4}:){6}(?::[0-9a-f]{1,4}|(?:(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9]|)[0-9])\.){3}(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9]|)[0-9])|:))|(?:(?:[0-9a-f]{1,4}:){5}(?:(?:(?::[0-9a-f]{1,4}){1,2})|:(?:(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9]|)[0-9])\.){3}(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9]|)[0-9])|:))|(?:(?:[0-9a-f]{1,4}:){4}(?:(?:(?::[0-9a-f]{1,4}){1,3})|(?:(?::[0-9a-f]{1,4})?:(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])(?:\.(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])){3}))|:))|(?:(?:[0-9a-f]{1,4}:){3}(?:(?:(?::[0-9a-f]{1,4}){1,4})|(?:(?::[0-9a-f]{1,4}){0,2}:(?:(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9]|)[0-9])\.){3}(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9]|)[0-9]))|:))|(?:(?:[0-9a-f]{1,4}:){2}(?:(?:(?::[0-9a-f]{1,4}){1,5})|(?:(?::[0-9a-f]{1,4}){0,3}:(?:(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9]|)[0-9])\.){3}(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9]|)[0-9]))|:))|(?:(?:[0-9a-f]{1,4}:){1}(?:(?:(?::[0-9a-f]{1,4}){1,6})|(?:(?::[0-9a-f]{1,4}){0,4}:(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])(?:\.(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])){3}))|:))|(?::(?:(?:(?::[0-9a-f]{1,4}){1,7})|(?:(?::[0-9a-f]{1,4}){0,5}:(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])(?:\.(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])){3}))|:)))(?:%[_a-z0-9]+)?\]|(?:[a-z0-9\-._~]|(?:%[0-9a-f]{2})|[!$&'()*+,;=])*))(?::(?<$port$>[0-9]*))?)?(?<$path$>(?:(?:\/(?:(?:(?:[a-z0-9._~-]|%[0-9a-f]{2}|[!$&'()*+,;=:@]))*))*)|(?:(?:\/(?:(?:(?:[a-z0-9._~-]|%[0-9a-f]{2}|[!$&'()*+,;=:@]))+)(?:\/(?:((?:[a-z0-9._~-]|%[0-9a-f]{2}|[!$&'()*+,;=:@]))*))+?))|(?:(?:(?:[a-z0-9._~-]|%[0-9a-f]{2}|[!$&'()*+,;=@])(?:\/(?:(?:(?:[a-z0-9._~-]|%[0-9a-f]{2}|[!$&'()*+,;=:@]))*))*))|(?:(?:(?:(?:(?:[a-z0-9._~-]|%[0-9a-f]{2}|[!$&'()*+,;=:@]))+)(?:\/(?:(?:(?:[a-z0-9._~-]|%[0-9a-f]{2}|[!$&'()*+,;=:@]))*))*))|(?:))(?:\?(?<$query$>(?:(?:[a-z0-9\-._~!$&'()*+,;=:@\/?]|(?:%[0-9a-f]{2})))*))?(?:#(?<$fragment$>(?:[a-z0-9\-._~!$&'()*+,;=:@\/?]|(?:%[0-9a-f]{2}))*))?(?![a-z0-9\-._~!$x&'()*+,;=:@\/?])"
Arguments:
scheme,userinfo,host,port,path,query,fragment
Usage in Splunk
| makeresults
| eval uri=split("ftp://ftp.example.net/rfc/rfc1808.txt
http://www.example.net/rfc/rfc2396.txt
https://jsmith@[2001:db8::1]:8080/rfc/rfc3986.html#3-5--Fragment
ldap://[2001:db8::7]/c=GB?objectClass?one
mailto:John.Doe@example.com
news:comp.infosystems.www.example.net
telnet://192.0.2.16:80/
https://cnn.example.com&story=breaking_news@192.0.2.16/top_story.htm","
")
| stats count by uri
| fields - count
| rex field=uri `URIRegex(scheme,userinfo,host,port,path,query,fragment)`
Regex for extracting fields from URI
This is a regular expression for extracting fields from a URI.
It was designed with reference to RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax, but the host part matches only host names, IPv4 addresses, and IPv6 addresses. It does not match the syntax that RFC 3986 reserves for future IP versions (IPvFuture, for example a hypothetical IPv7 or IPv8); that part was omitted because no IP version after IPv6 is currently expected to come into practical use.
^(?i)(?:(?<scheme>[a-z][a-z0-9+\-.]*):)?(?:\/\/(?:(?<userinfo>(?:[a-z0-9\-._~]|(?:%[0-9a-f]{2})|[!$&'()*+,;=:])*)@)?(?<host>(?:\[(?:(?:(?:[0-9a-f]{1,4}:){7}([0-9a-f]{1,4}|:))|(?:(?:[0-9a-f]{1,4}:){6}(?::[0-9a-f]{1,4}|(?:(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9]|)[0-9])\.){3}(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9]|)[0-9])|:))|(?:(?:[0-9a-f]{1,4}:){5}(?:(?:(?::[0-9a-f]{1,4}){1,2})|:(?:(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9]|)[0-9])\.){3}(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9]|)[0-9])|:))|(?:(?:[0-9a-f]{1,4}:){4}(?:(?:(?::[0-9a-f]{1,4}){1,3})|(?:(?::[0-9a-f]{1,4})?:(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])(?:\.(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])){3}))|:))|(?:(?:[0-9a-f]{1,4}:){3}(?:(?:(?::[0-9a-f]{1,4}){1,4})|(?:(?::[0-9a-f]{1,4}){0,2}:(?:(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9]|)[0-9])\.){3}(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9]|)[0-9]))|:))|(?:(?:[0-9a-f]{1,4}:){2}(?:(?:(?::[0-9a-f]{1,4}){1,5})|(?:(?::[0-9a-f]{1,4}){0,3}:(?:(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9]|)[0-9])\.){3}(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9]|)[0-9]))|:))|(?:(?:[0-9a-f]{1,4}:){1}(?:(?:(?::[0-9a-f]{1,4}){1,6})|(?:(?::[0-9a-f]{1,4}){0,4}:(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])(?:\.(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])){3}))|:))|(?::(?:(?:(?::[0-9a-f]{1,4}){1,7})|(?:(?::[0-9a-f]{1,4}){0,5}:(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])(?:\.(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])){3}))|:)))(?:%[_a-z0-9]+)?\]|(?:[a-z0-9\-._~]|(?:%[0-9a-f]{2})|[!$&'()*+,;=])*))(?::(?<port>[0-9]*))?)?(?<path>(?:(?:\/(?:(?:(?:[a-z0-9._~-]|%[0-9a-f]{2}|[!$&'()*+,;=:@]))*))*)|(?:(?:\/(?:(?:(?:[a-z0-9._~-]|%[0-9a-f]{2}|[!$&'()*+,;=:@]))+)(?:\/(?:((?:[a-z0-9._~-]|%[0-9a-f]{2}|[!$&'()*+,;=:@]))*))+?))|(?:(?:(?:[a-z0-9._~-]|%[0-9a-f]{2}|[!$&'()*+,;=@])(?:\/(?:(?:(?:[a-z0-9._~-]|%[0-9a-f]{2}|[!$&'()*+,;=:@]))*))*))|(?:(?:(?:(?:(?:[a-z0-9._~-]|%[0-9a-f]{2}|[!$&'()*+,;=:@]))+)(?:\/(?:(?:(?:[a-z0-9._~-]|%[0-9a-f]{2}|[!$&'()*+,;=:@]))*))*))|(?:))(?:\?(?<query>(?:(?:[a-z0-9\-._~!$&'()*+,;=:@\/?]|(?:%[0-9a-f]{2})))*))?(?:#(?<fragment>(?:[a-z0-9\-._~!$&'()*+,;=:@\/?]|(?:%[0-9a-f]{2}))*))?(?![a-z0-9\-._~!$x&'()*+,;=:@\/?])
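For a quick cross-check of which value belongs in which field, a much looser splitter can be sketched in Python. It uses the non-validating regular expression from RFC 3986, Appendix B, not the expression above, so it accepts many strings that the strict pattern rejects. The `AUTHORITY` pattern and the `split_uri` function are hypothetical helpers written for this illustration.

```python
import re

# Non-validating splitter given in RFC 3986, Appendix B.
URI_SPLIT = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

# Hypothetical helper (not from the article): authority = [userinfo "@"] host [":" port]
AUTHORITY = re.compile(
    r"^(?:(?P<userinfo>[^@]*)@)?(?P<host>\[[^\]]*\]|[^:]*)(?::(?P<port>[0-9]*))?$"
)


def split_uri(uri: str) -> dict:
    """Split a URI into the same seven field names the URIRegex macro uses."""
    m = URI_SPLIT.match(uri)  # always matches because every part is optional
    fields = {
        "scheme": m.group(2), "userinfo": None, "host": None, "port": None,
        "path": m.group(5), "query": m.group(7), "fragment": m.group(9),
    }
    authority = m.group(4)
    if authority is not None:
        a = AUTHORITY.match(authority)
        if a:
            fields.update(a.groupdict())
    return fields


for uri in [
    "https://jsmith@[2001:db8::1]:8080/rfc/rfc3986.html#3-5--Fragment",
    "https://cnn.example.com&story=breaking_news@192.0.2.16/top_story.htm",
]:
    print(split_uri(uri))
```

The second sample shows why splitting out userinfo matters: the part that looks like a host name is actually userinfo, and the real host is 192.0.2.16.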