
Regular Expressions


A regular expression (regex or regexp for short) is a special text string for describing a search pattern. You can think of regular expressions as wildcards on steroids. You are probably familiar with wildcard notations such as *.txt to find all text files in a file manager. The regex equivalent is .*\.txt$ . Regular expressions are mainly used to define patterns for searching files and text.
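To make the wildcard comparison concrete, here is a small sketch using Python's built-in re module. It uses the pattern .*\.txt$ (any characters, a literal dot, then "txt" at the end). The file names are made up for illustration.

```python
import re

# Hypothetical file names, invented for this example.
names = ["notes.txt", "photo.png", "todo.txt", "txt_backup.zip"]

# ".*" = any characters, "\." = a literal dot, "txt$" = "txt" at the very end.
pattern = re.compile(r".*\.txt$")

text_files = [name for name in names if pattern.search(name)]
print(text_files)  # ['notes.txt', 'todo.txt']
```

Note that "txt_backup.zip" is skipped even though it contains "txt", because the pattern requires ".txt" at the end of the name.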

You can test patterns yourself on this site

Just enter some sample text or words and the regular expression you want to try.


\ (backslash) :arrow_right: escapes the character that follows it, so that character is treated as a literal character instead of a special one. Several characters have a special meaning in regular expression patterns; they are explained below.
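A quick sketch of escaping, assuming Python's built-in re module (the sample strings are made up). The plus sign is one of the special characters, so it needs a backslash when you want the literal "+":

```python
import re

# Escaped: "\+" matches the literal plus sign.
literal_plus = re.findall(r"\+", "1+1=2")
print(literal_plus)  # ['+']

# Unescaped: "1+" means "the character 1, one or more times".
runs_of_ones = re.findall(r"1+", "1+11")
print(runs_of_ones)  # ['1', '11']
```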

+ (plus sign) :arrow_right: indicates that the preceding character occurs 1 or more times. For example, the word "mmmmm" can be found using the regex "m+" instead of "mmmmm".

* (asterisk sign) :arrow_right: almost the same as the plus (+) sign, except that it allows 0 or more occurrences. For the word "hello", the regex "m*" will still return true, because zero occurrences of "m" are allowed.
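The difference between + and * is easier to see side by side. This is a minimal sketch with Python's built-in re module; the variable names are only for illustration.

```python
import re

# "m+" needs at least one "m".
plus_hit = re.search(r"m+", "mmmmm")
plus_miss = re.search(r"m+", "hello")
print(plus_hit.group())  # mmmmm
print(plus_miss)         # None

# "m*" also accepts zero "m"s, so it matches an empty string in "hello".
star_hit = re.search(r"m*", "hello")
print(repr(star_hit.group()))  # ''
```

So "m*" on its own matches almost anything; it is usually combined with other characters in a larger pattern.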

[ ] (range) :arrow_right: defines a range of allowed characters. [0-9] matches any single digit from 0 to 9.

This can also be used for letters and other characters. "[0-9a-zA-Z]" matches any single letter or digit.
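Here is a small sketch of ranges in action, using Python's built-in re module. The sample text is made up.

```python
import re

text = "Room 42, Floor B-7!"

# Each match is a single digit.
digits = re.findall(r"[0-9]", text)
print(digits)  # ['4', '2', '7']

# Adding "+" collects runs of letters and digits.
letters_and_digits = re.findall(r"[0-9a-zA-Z]+", text)
print(letters_and_digits)  # ['Room', '42', 'Floor', 'B', '7']
```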

^ (caret sign) :arrow_right: has 2 uses. First, a caret at the start of a range [] negates the range. [^0-8]+ means 1 or more characters that are not digits from 0 to 8, so "99999" is allowed (and so is "abc", since letters are not in 0 to 8 either). Second, a caret that is not inside a range or bracket means: only look at the beginning of the target string. For example,

using "^hi" on the string "oh hi" will not match, because our string starts with "o".

$ (dollar sign) :arrow_right: indicates the end of the string you are looking at (the opposite of the 2nd use of ^). "com$" matches a string ending in "com".
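Both uses of the caret, plus the dollar sign, can be tried out with a few lines of Python (built-in re module). The sample strings beyond the ones mentioned above are made up.

```python
import re

# Use 1: inside brackets, ^ negates the range.
not_0_to_8 = re.compile(r"[^0-8]+")
nines_ok = not_0_to_8.fullmatch("99999") is not None
letters_ok = not_0_to_8.fullmatch("abc") is not None    # letters are "not 0-8" too
mixed_ok = not_0_to_8.fullmatch("12399") is not None
print(nines_ok, letters_ok, mixed_ok)  # True True False

# Use 2: outside brackets, ^ anchors the match to the start of the string.
start_miss = re.search(r"^hi", "oh hi")
start_hit = re.search(r"^hi", "hi there")
print(start_miss, start_hit.group())  # None hi

# $ anchors the match to the end of the string.
end_hit = re.search(r"com$", "example.com") is not None
end_miss = re.search(r"com$", "computer") is not None
print(end_hit, end_miss)  # True False
```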

{n} :arrow_right: (where n is a number) matches when the preceding character, or character range, occurs exactly n times. For example, to find a local phone number we could use [0-9]{3}-[0-9]{4}, which would find any number of the form 123-4567. The value is enclosed in braces (curly brackets).

{n,m} :arrow_right: (m is also a number) matches when the preceding character occurs at least n times but not more than m times. For example, ba{2,3}b will find baab and baaab but NOT bab or baaaab. The values are enclosed in braces (curly brackets).

{n,} :arrow_right: matches when the preceding character occurs at least n times. For example, ba{2,}b will find 'baab', 'baaab' or 'baaaab' but NOT 'bab'. The values are enclosed in braces (curly brackets).
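The three brace forms can be checked against the same examples with a short Python sketch (built-in re module). fullmatch is used here so the whole string has to fit the pattern.

```python
import re

# {n}: exactly n times.
phone = re.compile(r"[0-9]{3}-[0-9]{4}")
phone_ok = phone.fullmatch("123-4567") is not None
phone_short = phone.fullmatch("12-4567") is not None
print(phone_ok, phone_short)  # True False

words = ["bab", "baab", "baaab", "baaaab"]

# {n,m}: between n and m times.
two_to_three = [w for w in words if re.fullmatch(r"ba{2,3}b", w)]
print(two_to_three)  # ['baab', 'baaab']

# {n,}: at least n times.
at_least_two = [w for w in words if re.fullmatch(r"ba{2,}b", w)]
print(at_least_two)  # ['baab', 'baaab', 'baaaab']
```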

( ) :arrow_right: parentheses are used to group patterns :wink:

| :arrow_right: the vertical pipe is used for alternation. Using parentheses and the pipe together, we can form the regex "gr(a|e)y". This regex will find words containing "grey" or "gray", since it alternates between "a" and "e".
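A short sketch of grouping and alternation with Python's built-in re module; the sample sentence is made up. One Python-specific detail: with plain parentheses, re.findall returns what the group captured, so the first pattern uses a non-capturing group (?:...) to get the whole word back.

```python
import re

text = "The grey cat sat on a gray mat next to a green hat."

# Non-capturing group: findall returns the full matches.
colours = re.findall(r"gr(?:a|e)y", text)
print(colours)  # ['grey', 'gray']

# Capturing group: findall returns only the letter that the group matched.
letters = re.findall(r"gr(a|e)y", text)
print(letters)  # ['e', 'a']
```

"green" is not matched by either pattern, because the letter after "gre" is not "y".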

Some example uses:

Say you want to find something in one or more files. A regular expression lets you search for a pattern, not just an exact word.

Take finding emails: you can't just search for "@", because that is too broad and there are too many email domains to list.

:laughing: "m" - the basic search. This regular expression finds every letter "m", whether or not it is next to other characters.

:yum: "hi" - this will find the words hide, hidden, and hi

"\+" - using the backslash, we mean to find the literal character "+".

"[0-9]" - any digit

"[0-9]+[a-z]" - finds words containing at least 1 digit followed by a lowercase letter

"[0-9]+[a-z]+" - finds words containing at least 1 digit followed by at least 1 lowercase letter. "123hello" is a valid example for this

finding emails:

"^(([a-zA-Z]|[0-9])|([-]|[_]|[.]))+@{2,63}.+$" - this is coming from the website above :grin:

Notice the ^ at the beginning: the string must start with one or more letters a-z (lower or upper case), digits, or the characters -, _ and ., followed by "@".
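To see just that first part at work, here is a sketch in Python (built-in re module) that uses only the start of the pattern, up to and including the "@". It does not check anything after the "@", and the sample addresses are made up.

```python
import re

# Only the beginning of the email pattern: allowed characters, then "@".
local_part = re.compile(r"^(([a-zA-Z]|[0-9])|([-]|[_]|[.]))+@")

good = local_part.search("john.doe_99@example.com") is not None
has_space = local_part.search("john doe@example.com") is not None
empty_start = local_part.search("@example.com") is not None
print(good, has_space, empty_start)  # True False False
```

The space in "john doe" is not in any of the allowed ranges, so the match fails before it reaches the "@".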

finding home and mobile numbers:

"^([0-9]{3,})[-]([0-9]{4})$" - this is for home phone numbers like "655-7663" (with the '-' dash sign)

"^0[0-9]{10}$" - this is for finding cellphone numbers (digits only, like "09055741020")

"^\+63[0-9]{10}$" - this is also for finding cellphone numbers, in the format of country code + number, like "+639055741020". The leading plus is escaped with a backslash so it is treated as a literal "+".