Detailed Explanation of JavaScript Regular Expressions

Introduction

Simply put, regular expressions are a powerful tool for pattern matching and replacement. Their functions are as follows:

Test a string for a specific pattern. For example, you can test an input string to see if it contains a phone number pattern or a credit card number pattern. This is called data validation.

Replace text. You can use a regular expression in a document to identify specific text, and then delete it entirely or replace it with other text.

Extract a substring from a string based on a pattern match. This can be used to find specific text in a document or an input field.
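The three uses above can be sketched in a few lines of JavaScript. This is a minimal illustration, not a production validator: the sample string and the simplified phone pattern are assumptions made up for this example.

```javascript
// Hypothetical sample data, chosen only for illustration.
const input = "Call 555-1234 before noon";
// Simplified phone pattern (assumption): three digits, a hyphen, four digits.
const phonePattern = /\d{3}-\d{4}/;

// 1. Test: does the string contain the pattern?
const hasPhone = phonePattern.test(input);

// 2. Replace: swap the matched text for something else.
const masked = input.replace(phonePattern, "[hidden]");

// 3. Extract: pull the matched substring out of the string.
const found = input.match(phonePattern);
const phone = found ? found[0] : null;

console.log(hasPhone); // true
console.log(masked);   // "Call [hidden] before noon"
console.log(phone);    // "555-1234"
```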

Basic syntax

After gaining a basic understanding of the functions and uses of regular expressions, let's take a closer look at the syntax of regular expressions.

The general form of a regular expression is as follows:

/love/

The part between the "/" delimiters is the pattern to be matched in the target object. Users simply place the pattern they want to match between the "/" delimiters. To give users more flexibility in customizing the pattern, regular expressions provide special "metacharacters." Metacharacters are characters that have a special meaning in a regular expression. The ones introduced first specify how many times the preceding character (the character before the metacharacter) may appear in the target object.

Commonly used metacharacters include: "+", "*", and "?".

The "+" metacharacter requires that its preceding character appear one or more times consecutively in the target object.

The "*" metacharacter requires that its preceding character appear zero or more times consecutively in the target object.

The "?" metacharacter requires that its preceding character appear zero times or one time in the target object.

Let's take a look at the specific applications of regular expression metacharacters.

/fo+/ Because the regular expression above contains the "+" metacharacter, it can match strings in the target object such as "fool", "fo", or "football" that have one or more consecutive "o"s after the letter "f".

/eg*/ Because the regular expression above contains the metacharacter "*", it can match strings in the target object such as "easy", "ego", or "egg" that have zero or more consecutive "g"s after the letter "e".

/Wil?/ Because the regular expression above contains the "?" metacharacter, it can match strings in the target object such as "Win" or "Wilson" that have zero or one "l" after the letter "i".
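To see exactly which part of each word these three patterns match, the following sketch prints the first matched substring. The helper name `firstMatch` is made up for this illustration.

```javascript
// Helper (hypothetical name): return the first matched substring, or null.
const firstMatch = (pattern, text) => {
  const m = text.match(pattern);
  return m ? m[0] : null;
};

console.log(firstMatch(/fo+/, "fool"));     // "foo"
console.log(firstMatch(/fo+/, "fo"));       // "fo"
console.log(firstMatch(/eg*/, "easy"));     // "e"   (zero "g"s)
console.log(firstMatch(/eg*/, "egg"));      // "egg"
console.log(firstMatch(/Wil?/, "Win"));     // "Wi"  (zero "l"s)
console.log(firstMatch(/Wil?/, "Wilson"));  // "Wil"
```

Note that the pattern only has to match part of the word: `/eg*/` matches "easy" because the single letter "e" already satisfies it.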

Sometimes it's unclear how many characters to match. To accommodate this uncertainty, regular expressions support the concept of quantifiers. These quantifiers specify how many times a given component of a regular expression must appear to satisfy a match.

{n} n is a non-negative integer. Matches exactly n times. For example, 'o{2}' cannot match the 'o' in "Bob", but it can match the two 'o's in "food".

{n,} where n is a non-negative integer. It matches at least n times. For example, 'o{2,}' cannot match the 'o' in "Bob", but it can match all the 'o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.

{n,m} where m and n are both non-negative integers, and n <= m. Matches at least n times and at most m times. For example, 'o{1,3}' will match the first three 'o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no spaces between the comma and the two numbers.

With these quantifiers, users can specify exactly how often a pattern may appear in the matched object. For example, the regular expression `/jim{2,6}/` specifies that the character 'm' must appear 2 to 6 times consecutively in the matched object. Therefore, this regular expression can match strings such as `jimmy` or `jimmmmmy`. Note that there is no space before the opening brace: a space there would make the quantifier apply to the space instead of to 'm'.
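The quantifier examples above can be checked with a short sketch. The helper name `firstHit` is hypothetical. The last line shows how a stray space inside the pattern changes what the quantifier applies to.

```javascript
// Helper (hypothetical name): return the first matched substring, or null.
const firstHit = (pattern, text) => {
  const m = text.match(pattern);
  return m ? m[0] : null;
};

console.log(firstHit(/o{2}/, "Bob"));        // null
console.log(firstHit(/o{2}/, "food"));       // "oo"
console.log(firstHit(/o{1,3}/, "fooooood")); // "ooo"
console.log(firstHit(/jim{2,6}/, "jimmy"));  // "jimm"

// With a space before the brace, the quantifier applies to the space,
// so the pattern means "jim" followed by 2 to 6 spaces.
console.log(/jim {2,6}/.test("jimmy"));      // false
```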

After gaining a basic understanding of how to use regular expressions, let's look at how to use some other important metacharacters.

\s: Matches a single whitespace character, including spaces, tabs, and newlines;

\S: Matches any single character that is not a whitespace character;

\d: Matches a single digit from 0 to 9;

\w: Matches a single letter, digit, or underscore character;

\W: Matches any single character that \w does not match;

The dot (.) matches any single character except newline characters.

(Note: We can consider \s and \S, as well as \w and \W, as inverse operations.)

Below, we will look at an example to see how to use the above metacharacters in regular expressions.

The regular expression `/\s+/` can be used to match one or more whitespace characters in a target object.

The regular expression `/\d000/` matches a digit followed by three zeros. If we have a complex financial statement, we can use it to easily find all items totaling thousands of yuan.
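As a rough sketch of these character classes in use, the snippet below splits a line on whitespace and looks for a thousands amount. The sample line and the pattern `/\d000/` are assumptions chosen for illustration.

```javascript
// Hypothetical line from a statement: fields separated by mixed whitespace.
const row = "rent\t3000  yuan";

// \s+ treats any run of spaces, tabs, or newlines as one separator.
const fields = row.split(/\s+/);

// \d000 looks for a digit followed by three zeros.
const hasThousands = /\d000/.test(row);

// \w+ collects runs of letters, digits, and underscores; \W finds the rest.
const words = "a_1 b-2".match(/\w+/g);
const others = "a_1 b-2".match(/\W/g);

// The dot does not match a newline (without the s flag).
const dotMatchesNewline = /./.test("\n");

console.log(fields);            // ["rent", "3000", "yuan"]
console.log(hasThousands);      // true
console.log(words);             // ["a_1", "b", "2"]
console.log(others);            // [" ", "-"]
console.log(dotMatchesNewline); // false
```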

In addition to the metacharacters we've introduced above, regular expressions also have another unique type of special characters: anchors. Anchors are used to specify the position of the matching pattern within the target object. Commonly used anchors include: "^", "$", "\b", and "\B".

The "^" anchor specifies that the matching pattern must appear at the beginning of the target string.

The "$" anchor specifies that the matching pattern must appear at the end of the target string.

The "\b" anchor specifies that the matching pattern must appear at a word boundary, that is, at the beginning or the end of a word in the target string.

The "\B" anchor specifies that the matching pattern must not appear at a word boundary.

That is, the match must be located inside a word rather than at its beginning or end.

Similarly, we can consider "\b" and "\B" as inverse operations of each other, while "^" and "$" are counterparts that anchor the two ends of the target string. For example:

/^hell/ Because the regular expression above contains the "^" anchor, it can match strings in the target object that begin with "hell", such as "hell", "hello", or "hellhound".

/ar$/ Because the regular expression above contains the "$" anchor, it can match strings in the target object that end with "ar", such as "car", "bar", or "ar".

/\bbom/ Because the regular expression above begins with the "\b" anchor, it can match words in the target object that begin with "bom", such as "bomb" or "bom".

/man\b/ Because the regular expression above ends with the "\b" anchor, it can match words in the target object that end with "man", such as "human", "woman", or "man".
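The behavior of the four anchors can be sketched with `test()`. The sample strings are made up for illustration, and the results are collected in an object so they are easy to inspect.

```javascript
const anchorResults = {
  // ^ : the pattern must start at the beginning of the string.
  startHello: /^hell/.test("hello world"), // true
  startShell: /^hell/.test("shell"),       // false

  // $ : the pattern must end at the end of the string.
  endCar: /ar$/.test("car"),               // true
  endArc: /ar$/.test("arc"),               // false

  // \b : the pattern must sit at a word boundary.
  wordStartBomb: /\bbom/.test("a bomb"),   // true
  wordStartAbomb: /\bbom/.test("abomb"),   // false
  wordEndWoman: /man\b/.test("woman"),     // true
  wordEndManual: /man\b/.test("manual"),   // false

  // \B : the pattern must NOT sit at a word boundary.
  insideWoman: /\Bman/.test("woman"),      // true
  insideMan: /\Bman/.test("man"),          // false
};

console.log(anchorResults);
```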

To allow users greater flexibility in setting matching patterns, regular expressions allow users to specify a range of characters within the matching pattern, rather than being limited to specific characters. For example:

The regular expression /[A-Z]/ will match any uppercase letter in the range A to Z.

The regular expression /[a-z]/ will match any lowercase letter in the range from a to z.

The regular expression /[0-9]/ will match any number in the range from 0 to 9.

The regular expression /([a-z][A-Z][0-9])+/ will match one or more repetitions of a lowercase letter, an uppercase letter, and a digit, in that order, such as "aB0" or "aB0cD1".

One point users should note here is that parentheses "()" group parts of a pattern together, so that a quantifier such as "+" applies to the whole group. Every element enclosed in the parentheses must appear, in order, in the target string. Therefore, the regular expression above will not match a string like "abc", because its second character is not an uppercase letter and its last character is a letter, not a digit.
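A brief sketch of ranges and grouping follows. Adding "^" and "$" around the group is an assumption made here so that the whole string, not just a part of it, has to fit the pattern.

```javascript
// Whole-string version of the grouped pattern (anchors added for this sketch).
const triple = /^([a-z][A-Z][0-9])+$/;

const oneGroup = triple.test("aB0");     // one repetition of the group
const twoGroups = triple.test("aB0cD1"); // two repetitions of the group
const allLower = triple.test("abc");     // no uppercase letter or digit

// Ranges on their own match a single character from the range.
const hasUpper = /[A-Z]/.test("hello World");
const digits = "Room 42".match(/[0-9]/g);

console.log(oneGroup, twoGroups, allLower); // true true false
console.log(hasUpper);                      // true
console.log(digits);                        // ["4", "2"]
```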

If we want an "OR" operation that matches any one of several alternative patterns, we can use the pipe symbol "|". For example:

/to|too|2/ The regular expression above will match "to", "too", or "2" in the target object. Alternatives are tried from left to right, so in the word "too" this pattern matches only the leading "to".
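The effect of alternation, and of the order in which alternatives are listed, can be sketched as follows. The sample sentence is made up for illustration.

```javascript
const sentence = "I want to go too, at 2";

// Alternatives are tried left to right, so "to" wins inside "too".
const shortFirst = sentence.match(/to|too|2/g);

// Listing the longer alternative first lets "too" match in full.
const longFirst = sentence.match(/too|to|2/g);

console.log(shortFirst); // ["to", "to", "2"]
console.log(longFirst);  // ["to", "too", "2"]
```

When one alternative is a prefix of another, putting the longer one first is usually the safer ordering.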

Another commonly used operator in regular expressions is the negation operator "[^]". Unlike the anchor "^" introduced earlier, the negation operator matches any single character that is not listed inside the brackets. For example:

/[^A-C]/ The regular expression above will match any character in the target object except A, B, and C.

Generally speaking, when "^" appears as the first character inside "[]", it is a negation operator; when "^" is outside "[]", or when there is no "[]", it is an anchor.
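The two roles of "^" can be compared side by side in a small sketch. The sample strings are assumptions chosen for illustration.

```javascript
// Inside brackets, ^ negates the set: every character except A, B, and C.
const notAtoC = "ABCDE".match(/[^A-C]/g);

// Outside brackets, ^ anchors the match to the start of the string.
const startsWithAtoC = /^[A-C]/.test("Apple");

// Both roles can appear in one pattern: a first character that is not A to C.
const appleStartsOutside = /^[^A-C]/.test("Apple");
const dogStartsOutside = /^[^A-C]/.test("Dog");

console.log(notAtoC);            // ["D", "E"]
console.log(startsWithAtoC);     // true
console.log(appleStartsOutside); // false
console.log(dogStartsOutside);   // true
```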

Finally, when users need to match a metacharacter literally, they can put the escape character "\" in front of it. For example:

/Th\*/ The regular expression above will match "Th*" in the target object rather than "The" and so on.
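The difference between an escaped and an unescaped "*" can be sketched like this. The last part assumes the pattern is built from a string with the `RegExp` constructor, where the backslash itself has to be doubled inside the string literal.

```javascript
// Escaped: \* means a literal asterisk.
const literalStar = /Th\*/;
const matchesStar = literalStar.test("Th*");
const matchesThe = literalStar.test("The");

// Unescaped: h* means zero or more "h", so "The" matches.
const quantifiedMatchesThe = /Th*/.test("The");

// Built from a string, the backslash must be written as "\\".
const fromString = new RegExp("Th\\*");
const fromStringMatches = fromString.test("Th*");

console.log(matchesStar, matchesThe); // true false
console.log(quantifiedMatchesThe);    // true
console.log(fromString.source);       // Th\*
console.log(fromStringMatches);       // true
```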

After a regular expression is constructed, it is evaluated much like an arithmetic expression: from left to right, following an order of precedence. From highest to lowest, the precedence is as follows:

1. \ Escape character

2. Parentheses and square brackets: (), (?:), (?=), []

3. Quantifiers: *, +, ?, {n}, {n,}, {n,m}

4. Position and order: ^, $, and any \metacharacter

5. | "OR" operation
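The practical effect of this ordering can be sketched with two comparisons. The patterns and sample strings are assumptions chosen for illustration. Because "|" has the lowest precedence, anchors bind to the individual alternatives unless parentheses are added, and because quantifiers bind tighter than sequences, "+" applies only to the character or group directly before it.

```javascript
// Without parentheses this reads as (^ab) OR (cd$).
const loose = /^ab|cd$/;
// With parentheses the anchors apply to the whole alternation.
const grouped = /^(ab|cd)$/;

console.log(loose.test("abxx"));   // true  (starts with "ab")
console.log(loose.test("xxcd"));   // true  (ends with "cd")
console.log(grouped.test("abxx")); // false (whole string must be "ab" or "cd")
console.log(grouped.test("cd"));   // true

// The quantifier applies to "b" alone, or to the whole group.
const singleChar = "abab".match(/ab+/)[0];
const wholeGroup = "abab".match(/(ab)+/)[0];

console.log(singleChar); // "ab"
console.log(wholeGroup); // "abab"
```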

Original articles on this website are licensed under the Attribution-NonCommercial-ShareAlike 4.0 License (CC BY-NC-SA 4.0). Please retain the following attribution when sharing or adapting:

Original author: Jake Tao. Source: "Detailed Explanation of JavaScript Regular Expressions"
