Regular expressions are a powerful tool for matching strings. JavaScript provides convenient methods for searching and replacing strings using regular expression patterns, which is highly useful for validating and manipulating user-entered text.
A regular expression is a specific kind of string used to search and manipulate textual content based on patterns. Often referred to as regex or regexp, a regular expression or pattern is an expression that describes a set of strings. Thus, we refer to pattern matching as the process of finding specific sets of strings described by regular expressions.
Regular expressions are written in a formal language interpreted by a regular expression engine. The language provides a concise way to describe sets of strings through a combination of normal characters and metacharacters. Normal characters are treated as literals that have no special meaning and only match themselves. Metacharacters or metasequences, on the other hand, are characters or sequences of characters that are interpreted in a special way and represent things such as quantity, location, types and ranges of characters. If metadata are data about data, then metacharacters are data about characters.
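The difference between normal characters and metacharacters can be sketched with a few small patterns. This is only an illustration: the sample strings and variable names are made up for the example, and it assumes the usual rule that a backslash in front of a metacharacter turns it back into a literal.

```javascript
// Normal characters are literals: they only match themselves.
const literal = /cat/;
const literalHit = literal.test("concatenate"); // "cat" appears verbatim

// The . metacharacter stands for any single character
// (other than a line terminator).
const anyChar = /c.t/;
const anyCharHits = ["cat", "cot", "c-t"].every((s) => anyChar.test(s));
const anyCharMiss = anyChar.test("ct"); // nothing between c and t

// Escaping the metacharacter makes it a literal dot again.
const escapedDot = /c\.t/;
const escapedHit = escapedDot.test("c.t");
const escapedMiss = escapedDot.test("cat");

console.log(literalHit, anyCharHits, anyCharMiss, escapedHit, escapedMiss);
// prints: true true false true false
```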
Most programming languages support regular expressions one way or another, and in some languages, such as Perl, they are built into the syntax. JavaScript is no exception: it has had built-in support for regular expressions since version 1.2 and uses a Perl-like syntax.
In JavaScript, optional flags allow you to change how the regular expression engine performs the actual matching. The three classic flags are:
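g    Global search: find all matches rather than stopping after the first one.
i    Case-insensitive search.
m    Multiline search: ^ and $ match next to \n instead of only at the start or end of the entire string.

Metacharacters that match characters:

.         Matches any single character except a line terminator. Example: .at matches bat, cat, rat and also .at, 1at. Equivalent to [^\x0A\x0D\u2028\u2029].
[...]     Matches any one of the characters listed between the brackets. Example: [abc] matches a, b or c; the order in which they are listed doesn't matter.
          Ranges are specified with -, for example [a-z] matches any lowercase ASCII letter from a to z. Other examples include [A-F], which matches any uppercase ASCII letter from A to F, and [4-7], which matches any digit from 4 to 7. The - character is treated as a literal character if it's listed first, last or escaped: [-] matches -, [a-] matches a or -, [a\-z] matches a, - or z.
          Ranges and single characters can be combined: [0-9a-fA-F] matches any digit and also letters from a to f irrespective of their case, [02468aeiouy-] matches even digits, vowels and the - character.
          Special characters are escaped with a backslash: [\[\]] matches [ or ]. The [ doesn't need to be escaped if it's listed first: [[] matches [.
[^...]    Matches any character not listed between the brackets; the leading ^ negates the expression. Example: [^0-9] matches any character that's not a digit.
          The ^ character is special only when it's listed first; anywhere else within the brackets it doesn't need to be escaped in order to be treated as a literal. Example: [^] matches anything, [^^] matches anything except the ^ character.
\w        Matches a word character. Equivalent to [A-Za-z0-9_].
\W        Matches a non-word character. Equivalent to [^A-Za-z0-9_].
\d        Matches a digit. Equivalent to [0-9].
\D        Matches a non-digit. Equivalent to [^0-9].
\s        Matches a whitespace character, including [ \f\n\r\t\v\u00A0\u2028\u2029] (\u00A0 means "no-break space", \u2028 means "line separator", \u2029 means "paragraph separator").
\S        Matches a non-whitespace character, that is, any character \s doesn't match.
\b        Matches a backspace (\x08), but only inside a character class, as in [\b].
\f        Matches a form feed (\x0C).
\n        Matches a line feed (\x0A).
\r        Matches a carriage return (\x0D).
\t        Matches a tab (\x09).
\v        Matches a vertical tab (\x0B).
\0        Matches a NUL character (\x00).
\xhh      Matches the character with the two-digit hexadecimal code hh.
\uhhhh    Matches the character with the four-digit hexadecimal code hhhh.

Repetition is specified by quantifiers: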
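?         Matches the preceding item 0 or 1 time. Example: ab? matches a and ab
*         Matches the preceding item 0 or more times. Example: ab* matches a, ab, abb, abbb etc.
+         Matches the preceding item 1 or more times. Example: ab+ matches ab, abb, abbb etc.
{n}       Matches the preceding item exactly n times. Example: ab{2} matches abb
{n,}      Matches the preceding item n or more times. Example: ab{2,} matches abb, abbb, abbbb etc.
{n,m}     Matches the preceding item at least n times, but no more than m times. Example: ab{2,3} matches abb and abbb
??        Matches the preceding item 0 or 1 time, but as few times as possible. Example: ab?? against abbbbb matches a
*?        Matches the preceding item 0 or more times, but as few times as possible. Example: ab*? against abbbbb matches a
+?        Matches the preceding item 1 or more times, but as few times as possible. Example: ab+? against abbbbb matches ab
{n,}?     Matches the preceding item n or more times, but as few times as possible. Example: ab{2,}? against abbbbb matches abb
{n,m}?    Matches the preceding item at least n times, no more than m times, but as few times as possible. Example: ab{2,3}? against abbbbb matches abb

Groups and alternation:

(...)     Groups a subpattern and captures the text it matches. Example: (foo)bar matches foobar and captures foo
(?:...)   Groups a subpattern without capturing. Example: (?:foo)bar matches foobar and doesn't capture anything
...|...   Matches either of the alternatives. Example: foo|bar|baz matches either foo, bar or baz

Anchors match positions in the subject string: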
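^         Matches at the start of the string, or at the start of each line when used with the m flag. Example: with the m flag, ^The matches The at the beginning of every line
$         Matches at the end of the string, or at the end of each line when used with the m flag. Example: with the m flag, !$ matches ! at the end of every line
\b        Matches at a word boundary. Example: \bipsum matches ipsum against lorem ipsum but doesn't match anything against lipsum. Word boundaries need not be spaces. For example: with the g flag, \w+\b matches both yeah and whatever against yeah, whatever!
\B        Matches at any position that is not a word boundary. Example: \Bipsum matches ipsum against lipsum but doesn't match anything against lorem ipsum
(?=...)   Positive lookahead: matches only if the subpattern matches at this position, without consuming it. Example: ab(?=c) matches ab against abc but doesn't match anything against ab or aba
(?!...)   Negative lookahead: matches only if the subpattern does not match at this position. Example: ab(?!c) matches ab against ab or aba but doesn't match anything against abc

Note: lookbehind ((?<=...) and (?<!...)) is supported only in engines that implement ECMAScript 2018 or later; older JavaScript engines do not support it.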
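The quantifier, group, anchor and lookahead examples above can be tried out directly. The snippet below is a minimal sketch that reuses the sample strings from the descriptions; the variable names and the two-line "The cat / The dog" string are made up for illustration.

```javascript
const subject = "abbbbb";

// Greedy quantifiers take as much as they can; lazy ones as little as they can.
const greedy = subject.match(/ab+/)[0];           // "abbbbb"
const lazy = subject.match(/ab+?/)[0];            // "ab"
const bounded = subject.match(/ab{2,3}/)[0];      // "abbb"
const boundedLazy = subject.match(/ab{2,3}?/)[0]; // "abb"

// A capturing group adds an element to the result; (?:...) does not.
const captured = /(foo)bar/.exec("foobar");     // ["foobar", "foo"]
const uncaptured = /(?:foo)bar/.exec("foobar"); // ["foobar"]

// \b requires a word boundary, \B requires the absence of one.
const boundaryHit = /\bipsum/.test("lorem ipsum"); // true
const boundaryMiss = /\bipsum/.test("lipsum");     // false
const nonBoundaryHit = /\Bipsum/.test("lipsum");   // true
const words = "yeah, whatever!".match(/\w+\b/g);   // ["yeah", "whatever"]

// With the m flag, ^ matches at the start of every line.
const lineStarts = "The cat\nThe dog".match(/^The/gm).length; // 2

// Lookaheads check what follows without making it part of the match.
const positive = "abc".match(/ab(?=c)/)[0];  // "ab"
const negativeOnAbc = /ab(?!c)/.test("abc"); // false
const negativeOnAba = /ab(?!c)/.test("aba"); // true

console.log(greedy, lazy, bounded, boundedLazy, words, lineStarts);
```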
JavaScript provides two objects for dealing with regular expressions: RegExp and String.
RegExp is a global object in JavaScript used to create regular expression objects. A RegExp object can be created through the object constructor:
new RegExp(pattern [, flags])
or as a literal:
/pattern/flags
The advantage of the constructor function is that the pattern can be constructed dynamically at any time.
Regular expression objects are handled the same way, no matter how you define them.
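As a rough illustration of the two forms, the sketch below builds one pattern as a literal and another with the constructor from a string known only at run time. The `word` variable is a hypothetical input, not something from the reference above. Note that backslashes have to be doubled inside the string passed to the constructor.

```javascript
// Literal form: the pattern is fixed when the script is written.
const literal = /cat/i;

// Constructor form: the pattern is an ordinary string, so it can be
// assembled at run time. `word` stands in for a hypothetical input value.
const word = "cat";
const dynamic = new RegExp("\\b" + word + "\\b", "i");

// Both are RegExp objects and are used in exactly the same way.
const literalHit = literal.test("Cat");         // true
const dynamicHit = dynamic.test("A CAT sat");   // true
const dynamicMiss = dynamic.test("concatenate"); // false: no word boundary

console.log(dynamic.source, literalHit, dynamicHit, dynamicMiss);
```

If the run-time string may itself contain metacharacters, they would need to be escaped before being passed to the constructor; that step is left out of this sketch.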
pattern    The text of the regular expression.
flags      Optional. Any combination of the flags described above.

The RegExp object provides two methods for working with regular expressions:
test(text)    The test(text) method tests for a match in the input string. It searches the string for the specified pattern and returns true if the pattern matches the string or false otherwise.
exec(text)    The exec(text) method executes the specified pattern on the input string and returns an array of matched strings if it succeeds or null if it fails. The first element of the array contains the text matched by the entire pattern while the other elements correspond to text that matched captured subpatterns.

The String global object may also be used to search and manipulate strings in JavaScript. It offers four methods for matching and manipulating strings:
str.match(pattern)                   Matches pattern against the input string. With the g (global search) flag it returns an array containing all matches. Without the g flag it returns an array holding only the first match and its captured subpatterns. If there are no matches it returns null.
str.search(pattern)                  Searches the input string for pattern and returns the index of the first match, or -1 if there are no matches.
str.replace(pattern, replacement)    Replaces the first match of pattern (or every match, with the g flag) with the replacement string. Returns the new string. The subject string may remain unchanged if there are no matches.
str.split(pattern [, limit])         Splits the input string at every match of pattern and returns an array of the pieces. The optional limit caps the number of pieces returned.
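The RegExp and String methods described above can be compared side by side. The following is a small sketch with made-up sample strings and variable names, showing what each method returns on a match and on a miss.

```javascript
const text = "2010-2011";
const year = /(\d{2})(\d{2})/;

// RegExp methods
const hasYear = year.test(text);   // true
const first = year.exec(text);     // ["2010", "20", "10"]
const noMatch = /x/.exec(text);    // null

// String methods
const firstOnly = text.match(/\d{4}/);  // array whose first element is "2010"
const all = text.match(/\d{4}/g);       // ["2010", "2011"]
const none = text.match(/[a-z]/g);      // null

const position = text.search(/-/);      // 4
const missing = text.search(/x/);       // -1

const replaced = text.replace(/\d{4}/g, "YYYY"); // "YYYY-YYYY"
const unchanged = text.replace(/x/, "y");        // "2010-2011"

const parts = "a, b,c ,d".split(/\s*,\s*/);      // ["a", "b", "c", "d"]
const limited = "a, b,c ,d".split(/\s*,\s*/, 2); // ["a", "b"]

console.log(hasYear, first, all, position, replaced, parts, limited);
```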
Copyright © 2010-2011 Dive Into JavaScript