regexpPattern

Pattern that matches specified regular expression

Since R2020b

Syntax

pat = regexpPattern(expression)

pat = regexpPattern(expression,Name,Value)

Description

pat = regexpPattern(expression) creates a pattern that matches the regular expression.

pat = regexpPattern(expression,Name,Value) specifies additional options with one or more name-value arguments. For example, you can specify 'IgnoreCase' as true to ignore case when matching.
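Here is a minimal sketch that reuses one pattern object with several text functions that accept patterns: contains, count, and replace. The sample text is made up for illustration.

```matlab
% Hypothetical sample text
txt = "bat cat coat";
pat = regexpPattern('c[aeiou]+t');

tf = contains(txt,pat)          % true: at least one match exists
n = count(txt,pat)              % 2: "cat" and "coat"
newTxt = replace(txt,pat,"X")   % "bat X X": each match is replaced
```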

Examples

Combine Patterns and Regular Expressions

Use regexpPattern to specify patterns using regular expressions that can be used as inputs for text-searching functions.

Find words that start with c, end with t, and contain one or more vowels in between.

txt = "bat cat can car coat court CUT ct CAT-scan";
expression = 'c[aeiou]+t';

The regular expression 'c[aeiou]+t' specifies this pattern:

c must be the first character.
c must be followed by one of the characters inside the brackets, [aeiou].
The bracketed pattern must occur one or more times, as indicated by the + operator.
t must be the last character, with no characters between the bracketed pattern and the t.
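The sketch below shows what the + operator contributes by comparing it with the * quantifier (zero or more times) on a few words from txt. It uses the matches function, which returns true only when an entire string matches the pattern.

```matlab
words = ["cat" "ct" "coat" "court"];
plusPat = regexpPattern('c[aeiou]+t');   % at least one vowel required
starPat = regexpPattern('c[aeiou]*t');   % vowels are optional

plusHits = matches(words,plusPat)        % 1 0 1 0
starHits = matches(words,starPat)        % 1 1 1 0: "ct" now matches
% "court" fails both patterns because r is not in [aeiou]
```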

Extract the pattern. Note that the words CUT and CAT do not match because they are uppercase, and ct does not match because it has no vowel between c and t.

pat = regexpPattern(expression);
extract(txt,pat)

ans = 2x1 string
    "cat"
    "coat"

Patterns created using regexpPattern can be combined with other pattern functions to create more complicated patterns. Use whitespacePattern and lettersPattern to create a new pattern that also matches the white space and the word that follow each regular expression match, and then extract the new pattern.

pat = regexpPattern(expression) + whitespacePattern + lettersPattern;
extract(txt,pat)

ans = 2x1 string
    "cat can"
    "coat court"
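The same approach works with other pattern functions. In this sketch, both the part-number format (two uppercase letters, a hyphen, and digits) and the sample text are hypothetical. digitsPattern supplies the numeric part.

```matlab
% Hypothetical text containing part numbers such as AB-123
txt = "Order AB-123 shipped, CD-45 pending, ab-9 rejected";
pat = regexpPattern('[A-Z]{2}') + "-" + digitsPattern;
ids = extract(txt,pat)   % ["AB-123"; "CD-45"]; "ab-9" is lowercase
```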

Ignore `newline` Characters

Create a string containing a newline character.

txt = "First Line" + newline + "Second Line"

txt = 
    "First Line
     Second Line"

expression = '.+';

The regular expression '.+' matches one or more of any character including newline characters. Count how many times the pattern matches.

pat = regexpPattern(expression);
count(txt,pat)

ans = 1

Create a new regular expression pattern, but this time specify DotExceptNewline as true so that the pattern does not match newline characters. Count how many times the pattern matches.

pat = regexpPattern(expression,"DotExceptNewline",true);
count(txt,pat)

ans = 2
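To see the matches themselves rather than their count, call extract with both patterns. This sketch reuses the same text, and the variable names are illustrative.

```matlab
txt = "First Line" + newline + "Second Line";

% Default: '.' also matches newline, so a single match spans both lines
allText = extract(txt,regexpPattern('.+'));

% DotExceptNewline: each match stops at the newline character
perLine = extract(txt,regexpPattern('.+',"DotExceptNewline",true))
% perLine is ["First Line"; "Second Line"]
```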

Ignore Whitespaces in Expressions When Matching

Create txt as a string.

txt = "Hello World";

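Because of the space between . and *, the expression '. *' matches any single character followed by zero or more space characters. As a result, it returns individual characters rather than whole words. Create a pattern to match the regular expression '. *', and then extract the pattern.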

expression = '. *';
pat = regexpPattern(expression);
extract(txt,pat)

ans = 10x1 string
    "H"
    "e"
    "l"
    "l"
    "o "
    "W"
    "o"
    "r"
    "l"
    "d"

Create a new regular expression pattern, but this time specify FreeSpacing as true to ignore white space in the regular expression. The expression is then interpreted as '.*', which matches the entire text. Extract the new pattern.

pat = regexpPattern(expression,"FreeSpacing",true);
extract(txt,pat)

ans = "Hello World"

Ignore Case with Regular Expressions

Find words that start with c, end with t, and contain one or more vowels in between, regardless of case.

txt = "bat cat can car coat court CUT ct CAT-scan";
expression = 'c[aeiou]+t';

The regular expression 'c[aeiou]+t' specifies this pattern:

c must be the first character.
c must be followed by one of the characters inside the brackets, [aeiou].
The bracketed pattern must occur one or more times, as indicated by the + operator.
t must be the last character, with no characters between the bracketed pattern and the t.

Extract the pattern. Note that the words CUT and CAT do not match because they are uppercase.

pat = regexpPattern(expression);
extract(txt,pat)

ans = 2x1 string
    "cat"
    "coat"

Create a new regular expression pattern, but this time specify IgnoreCase as true to ignore case with the regular expression. Extract the new pattern.

pat = regexpPattern(expression,"IgnoreCase",true);
extract(txt,pat)

ans = 4x1 string
    "cat"
    "coat"
    "CUT"
    "CAT"

Designate `^` and `$` Anchors as Line or Text Anchors

The metacharacters ^ and $ can be used as line anchors or as text anchors. The Anchors option specifies which behavior regexpPattern uses.

Create txt as a string containing newline characters.

txt = "cat" + newline + "bat" + newline + "rat";

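The regular expression '^.+?$' matches one or more characters between a ^ anchor and a $ anchor. Create a pattern for this regular expression, and specify Anchors as "text" so that the ^ and $ anchors are treated as text anchors. Because ^ and $ then match only at the beginning and end of the entire text, the match spans all three lines. Extract the pattern.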

expression = '^.+?$';
pat = regexpPattern(expression,"Anchors","text");
extract(txt,pat)

ans = 
    "cat
     bat
     rat"

Create a new regular expression pattern, but this time specify Anchors as "line" so that the ^ and $ anchors are treated as line anchors. Extract the new pattern.

pat = regexpPattern(expression,"Anchors","line");
extract(txt,pat)

ans = 3x1 string
    "cat"
    "bat"
    "rat"
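Line anchors matter most when only some lines satisfy the pattern. In this sketch with made-up text, '^\d+' finds a leading number only at the start of the text when Anchors is "text". When Anchors is "line", it finds one at the start of every line that begins with a digit.

```matlab
% Hypothetical inventory text; the second line does not start with a digit
txt = "10 apples" + newline + "x 20 plums" + newline + "30 pears";

textHits = extract(txt,regexpPattern('^\d+',"Anchors","text"))   % "10"
lineHits = extract(txt,regexpPattern('^\d+',"Anchors","line"))   % ["10"; "30"]
```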

Input Arguments

`expression` — Regular expression
character vector | cell array of character vectors | string array

Regular expression, specified as a character vector, a cell array of character vectors, or a string array. Each expression can contain characters, metacharacters, operators, tokens, and flags that specify patterns to match in the input text.

The following tables describe the elements of regular expressions.

Metacharacters

Metacharacters represent letters, letter ranges, digits, and space characters. Use them to construct a generalized pattern of characters.

Metacharacter	Description	Example
`.`	Any single character, including white space	`'..ain'` matches sequences of five consecutive characters that end with `'ain'`.
`[c₁c₂c₃]`	Any character contained within the square brackets. The following characters are treated literally: `$ | . * + ?` and `-` when not used to indicate a range.	`'[rp.]ain'` matches `'rain'` or `'pain'` or `'.ain'`.
`[^c₁c₂c₃]`	Any character not contained within the square brackets. The following characters are treated literally: `$ | . * + ?` and `-` when not used to indicate a range.	`'[^rp]ain'` matches all four-letter sequences that end in `'ain'`, except `'rain'` and `'pain'`. For example, it matches `'gain'`, `'lain'`, or `'vain'`.
`[c₁-c₂]`	Any character in the range of c₁ through c₂	`'[A-G]'` matches a single character in the range of `A` through `G`.
`\w`	Any alphabetic, numeric, or underscore character. For English character sets, `\w` is equivalent to `[a-zA-Z_0-9]`.	`'\w*'` identifies a word composed of any grouping of alphabetic, numeric, or underscore characters.
`\W`	Any character that is not alphabetic, numeric, or underscore. For English character sets, `\W` is equivalent to `[^a-zA-Z_0-9]`.	`'\W*'` matches a run of characters that are not alphabetic, numeric, or underscore.
`\s`	Any white-space character; equivalent to `[ \f\n\r\t\v]`	`'\w*n\s'` matches words that end with the letter `n`, followed by a white-space character.
`\S`	Any non-white-space character; equivalent to `[^ \f\n\r\t\v]`	`'\d\S'` matches a numeric digit followed by any non-white-space character.
`\d`	Any numeric digit; equivalent to `[0-9]`	`'\d*'` matches any number of consecutive digits.
`\D`	Any nondigit character; equivalent to `[^0-9]`	`'\w*\D\>'` matches words that do not end with a numeric digit.
`\oN` or `\o{N}`	Character of octal value `N`	`'\o{40}'` matches the space character, defined by octal `40`.
`\xN` or `\x{N}`	Character of hexadecimal value `N`	`'\x2C'` matches the comma character, defined by hex `2C`.

Character Representation

Operator	Description
`\a`	Alarm (beep)
`\b`	Backspace
`\f`	Form feed
`\n`	New line
`\r`	Carriage return
`\t`	Horizontal tab
`\v`	Vertical tab
`\char`	Any character with special meaning in regular expressions that you want to match literally (for example, use `\\` to match a single backslash)

Quantifiers

Quantifiers specify the number of times a pattern must occur in the matching text. expr represents any regular expression.

Quantifier	Number of Times Expression Occurs	Example
`expr*`	0 or more times consecutively.	`'\w*'` matches a word of any length.
`expr?`	0 times or 1 time.	`'\w*(\.m)?'` matches words that optionally end with the extension `.m`.
`expr+`	1 or more times consecutively.	`'<img src="\w+\.gif">'` matches an `<img>` HTML tag when the file name contains one or more characters.
`expr{m,n}`	At least `m` times, but no more than `n` times consecutively. `{0,1}` is equivalent to `?`.	`'\S{4,8}'` matches between four and eight non-white-space characters.
`expr{m,}`	At least `m` times consecutively. `{0,}` and `{1,}` are equivalent to `*` and `+`, respectively.	`'<a href="\w{1,}\.html">'` matches an `<a>` HTML tag when the file name contains one or more characters.
`expr{n}`	Exactly `n` times consecutively. Equivalent to `{n,n}`.	`'\d{4}'` matches four consecutive digits.
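Here is a short illustration of the counted quantifiers, using invented sample text. '\d{4}' requires exactly four digits. '\d{2,4}' accepts two to four digits and, because it is greedy, takes as many as it can.

```matlab
% Hypothetical sample text
txt = "Released 2020, updated 2021, patch 21";

years = extract(txt,regexpPattern('\d{4}'))      % ["2020"; "2021"]
spans = extract(txt,regexpPattern('\d{2,4}'))    % ["2020"; "2021"; "21"]
```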

Quantifiers can appear in three modes, described in the following table. q represents any of the quantifiers in the previous table.

Mode	Description	Example
`exprq`	Greedy expression: match as many characters as possible.	Given the text `'<tr><td><p>text</p></td>'`, the expression `'</?t.*>'` matches all characters between `<tr` and `/td>`: `'<tr><td><p>text</p></td>'`
`exprq?`	Lazy expression: match as few characters as necessary.	Given the text `'<tr><td><p>text</p></td>'`, the expression `'</?t.*?>'` ends each match at the first occurrence of the closing angle bracket (`>`): `'<tr>' '<td>' '</td>'`
`exprq+`	Possessive expression: match as much as possible, but do not rescan any portions of the text.	Given the text `'<tr><td><p>text</p></td>'`, the expression `'</?t.*+>'` does not return any matches, because the closing angle bracket is captured using `.*` and is not rescanned.
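You can compare the three modes directly with extract. This sketch assumes that regexpPattern accepts lazy and possessive quantifiers as the table describes. It uses a string version of the sample text so that extract returns a string array.

```matlab
txt = "<tr><td><p>text</p></td>";

greedy = extract(txt,regexpPattern('</?t.*>'))        % one match: the whole text
lazy = extract(txt,regexpPattern('</?t.*?>'))         % ["<tr>"; "<td>"; "</td>"]
possessive = extract(txt,regexpPattern('</?t.*+>'))   % empty: .*+ never gives back '>'
```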

Grouping Operators

Grouping operators allow you to capture tokens, apply one operator to multiple elements, or disable backtracking in a specific group. Tokens are portions of the matched text that you define by enclosing part of the regular expression in parentheses.

Grouping Operator	Description	Example
`(expr)`	Group elements of the expression and capture tokens.	`'Joh?n\s(\w*)'` captures a token that contains the last name of any person with the first name `John` or `Jon`.
`(?:expr)`	Group, but do not capture tokens.	`'(?:[aeiou][^aeiou]){2}'` matches two consecutive patterns of a vowel followed by a nonvowel, such as `'anon'`. Without grouping, `'[aeiou][^aeiou]{2}'` matches a vowel followed by two nonvowels.
`(?>expr)`	Group atomically. Do not backtrack within the group to complete the match, and do not capture tokens.	`'A(?>.*)Z'` does not match `'AtoZ'`, although `'A(?:.*)Z'` does. Using the atomic group, `Z` is captured using `.*` and is not rescanned.
`(expr1|expr2)`	Match expression `expr1` or expression `expr2`. If there is a match with `expr1`, then `expr2` is ignored. You can include `?:` or `?>` after the opening parenthesis to suppress tokens or group atomically.	`'(let|tel)\w+'` matches words that contain, but do not end with, `let` or `tel`.
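This sketch checks the atomic-group and alternation examples from the table with contains and extract. The word list passed to extract is a made-up example.

```matlab
% (?:...) can backtrack, so .* gives back the Z; (?>...) cannot
plainGroup = contains("AtoZ",regexpPattern('A(?:.*)Z'))    % true
atomicGroup = contains("AtoZ",regexpPattern('A(?>.*)Z'))   % false

% "bullet" ends with let, so \w+ has nothing left to match there
words = extract("letter telling bullet",regexpPattern('(let|tel)\w+'))
% words is ["letter"; "telling"]
```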

Anchors

Anchors in the expression match the beginning or end of the input text or word.

Anchor	Matches the...	Example
`^expr`	Beginning of the input text.	`'^M\w*'` matches a word starting with `M` at the beginning of the text.
`expr$`	End of the input text.	`'\w*m$'` matches words ending with `m` at the end of the text.
`\<expr`	Beginning of a word.	`'\<n\w*'` matches any words starting with `n`.
`expr\>`	End of a word.	`'\w*e\>'` matches any words ending with `e`.

Lookaround Assertions

Lookaround assertions look for patterns that immediately precede or follow the intended match, but are not part of the match.

The pointer remains at the current location, and characters that correspond to the test expression are not captured or discarded. Therefore, lookahead assertions can match overlapping character groups.

Lookaround Assertion	Description	Example
`expr(?=test)`	Look ahead for characters that match `test`.	`'\w*(?=ing)'` matches terms that are followed by `ing`, such as `'Fly'` and `'fall'` in the input text `'Flying, not falling.'`
`expr(?!test)`	Look ahead for characters that do not match `test`.	`'i(?!ng)'` matches instances of the letter `i` that are not followed by `ng`.
`(?<=test)expr`	Look behind for characters that match `test`.	`'(?<=re)\w*'` matches terms that follow `'re'`, such as `'new'`, `'use'`, and `'cycle'` in the input text `'renew, reuse, recycle'`
`(?<!test)expr`	Look behind for characters that do not match `test`.	`'(?<!\d)(\d)(?!\d)'` matches single-digit numbers (digits that do not precede or follow other digits).
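As an example, the single-digit expression skips every digit that has a neighboring digit. This sketch assumes that lookbehind and lookahead assertions behave as the table describes, and the sample text is invented.

```matlab
% Hypothetical sample text mixing one-, two-, and three-digit numbers
txt = "a1 b22 c3 d456";
singles = extract(txt,regexpPattern('(?<!\d)(\d)(?!\d)'))   % ["1"; "3"]
```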

If you specify a lookahead assertion before an expression, the operation is equivalent to a logical AND.

Operation	Description	Example
`(?=test)expr`	Match both `test` and `expr`.	`'(?=[a-z])[^aeiou]'` matches consonants.
`(?!test)expr`	Match `expr` and do not match `test`.	`'(?![aeiou])[a-z]'` matches consonants.
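Both AND-style expressions should select the same characters. This sketch applies each one to an arbitrary lowercase word and compares the results.

```matlab
txt = "regex";   % arbitrary lowercase sample word
viaLookahead = extract(txt,regexpPattern('(?=[a-z])[^aeiou]'))      % ["r"; "g"; "x"]
viaNegLookahead = extract(txt,regexpPattern('(?![aeiou])[a-z]'))    % same characters
sameResult = isequal(viaLookahead,viaNegLookahead)                  % true
```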

Logical and Conditional Operators

Logical and conditional operators enable you to test the state of a given condition, and then use the outcome to determine which pattern, if any, to match next. These operators support logical OR as well as if and if/else conditions.

Conditions can be tokens, lookaround operators, or dynamic expressions of the form (?@cmd). Dynamic expressions must return a logical or numeric value.

Conditional Operator	Description	Example
`expr1|expr2`	Match expression `expr1` or expression `expr2`. If there is a match with `expr1`, then `expr2` is ignored.	`'(let|tel)\w+'` matches words that start with `let` or `tel`.
`(?(cond)expr)`	If condition `cond` is `true`, then match `expr`.	`'(?(?@ispc)[A-Z]:\\)'` matches a drive name, such as `C:\`, when run on a Windows® system.
`(?(cond)expr1|expr2)`	If condition `cond` is `true`, then match `expr1`. Otherwise, match `expr2`.	`'Mr(s?)\..*?(?(1)her|his) \w*'` matches text that includes `her` when the text begins with `Mrs`, or that includes `his` when the text begins with `Mr`.

Token Operators

Tokens are portions of the matched text that you define by enclosing part of the regular expression in parentheses. You can refer to a token by its sequence in the text (an ordinal token), or assign names to tokens for easier code maintenance and readable output.

Ordinal Token Operator	Description	Example
`(expr)`	Capture in a token the characters that match the enclosed expression.	`'Joh?n\s(\w*)'` captures a token that contains the last name of any person with the first name `John` or `Jon`.

Named Token Operator	Description	Example
`(?<name>expr)`	Capture in a named token the characters that match the enclosed expression.	`'(?<month>\d+)-(?<day>\d+)-(?<yr>\d+)'` creates named tokens for the month, day, and year in an input date of the form `mm-dd-yy`.

Note

If an expression has nested parentheses, MATLAB® captures tokens that correspond to the outermost set of parentheses. For example, given the search pattern '(and(y|rew))', MATLAB creates a token for 'andrew' but not for 'y' or 'rew'.

Comments

Characters	Description	Example
`(?#comment)`	Insert a comment in the regular expression. The comment text is ignored when matching the input.	`'(?# Initial digit)\<\d\w+'` includes a comment, and matches words that begin with a number.

Search Flags

Search flags modify the behavior for matching expressions. Instead of using a search flag within an expression, you can specify the corresponding name-value argument of regexpPattern, such as IgnoreCase, DotExceptNewline, FreeSpacing, or Anchors.

Flag	Description
`(?-i)`	Match letter case (default for `regexp` and `regexprep`).
`(?i)`	Do not match letter case (default for `regexpi`).
`(?s)`	Match dot (`.`) in the pattern with any character (default).
`(?-s)`	Match dot in the pattern with any character that is not a newline character.
`(?-m)`	Match the `^` and `$` metacharacters at the beginning and end of text (default).
`(?m)`	Match the `^` and `$` metacharacters at the beginning and end of a line.
`(?-x)`	Include space characters and comments when matching (default).
`(?x)`	Ignore space characters and comments when matching. Use `'\ '` and `'\#'` to match space and `#` characters.

The expression that the flag modifies can appear either after the parentheses, such as

(?i)\w*

or inside the parentheses and separated from the flag with a colon (:), such as

(?i:\w*)

The latter syntax allows you to change the behavior for part of a larger expression.
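The two placements differ in scope. In this sketch, (?i) applies to the rest of the expression, while (?i:c) makes only the c case-insensitive. The word list is illustrative.

```matlab
words = ["cat" "Cat" "CAT"];
wholeInsensitive = matches(words,regexpPattern('(?i)cat'))   % 1 1 1
firstLetterOnly = matches(words,regexpPattern('(?i:c)at'))   % 1 1 0: "at" stays case-sensitive
```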

Data Types: char | cell | string

Note

regexpPattern does not support back references, conditions based on back references, or dynamic regular expressions.

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: 'DotExceptNewline',true,'FreeSpacing',false
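The following sketch builds the same case-insensitive pattern with both syntaxes. The first form requires R2021a or later, and the sample text is hypothetical.

```matlab
txt = "bat cat CUT coat";   % hypothetical sample text

% Name=Value syntax (R2021a and later)
p1 = regexpPattern('c[aeiou]+t',IgnoreCase=true);

% Comma-separated syntax with the name in quotes
p2 = regexpPattern('c[aeiou]+t','IgnoreCase',true);

hits = extract(txt,p1)                    % ["cat"; "CUT"; "coat"]
sameResult = isequal(hits,extract(txt,p2))   % true
```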

`DotExceptNewline` — Dot matching of new lines
`false` or `0` (default) | `true` or `1`

Dot matching of newline characters, specified as the comma-separated pair consisting of 'DotExceptNewline' and a logical scalar. Set this option to 1 (true) to exclude newline characters from dot matching.

Example: pat = regexpPattern('m.','DotExceptNewline',true)

`FreeSpacing` — Matching white space
`false` or `0` (default) | `true` or `1`

Treatment of white space in the expression, specified as the comma-separated pair consisting of 'FreeSpacing' and a logical scalar. Set this option to 1 (true) to ignore white-space characters and comments in the regular expression when matching.
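Free spacing removes literal spaces from the expression, so a space that must be matched has to be escaped as '\ ', as noted in the search flags table. Here is a minimal sketch with made-up text:

```matlab
txt = "Hello World";   % hypothetical sample text

% With FreeSpacing, the space in the expression is ignored ("HelloWorld")
spaceIgnored = contains(txt,regexpPattern('Hello World',"FreeSpacing",true))   % false

% Escaping the space keeps it as a literal character to match
spaceEscaped = contains(txt,regexpPattern('Hello\ World',"FreeSpacing",true))  % true
```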

Example: pat = regexpPattern('m.','FreeSpacing',false)

`IgnoreCase` — Ignore case when matching
`false` or `0` (default) | `true` or `1`

Ignore case when matching, specified as the comma-separated pair consisting of 'IgnoreCase' and a logical scalar. Set this option to 1 (true) to match regardless of case.

Example: pat = regexpPattern('m.','IgnoreCase',true)

`Anchors` — Metacharacter treatment
`'text'` (default) | `'line'`

Metacharacter treatment, specified as the comma-separated pair consisting of 'Anchors' and one of these values:

Value	Description
`'text'`	Treat the metacharacters `^` and `$` as text anchors. This anchors regular expression matches to the beginning or end of text, which might span multiple lines.
`'line'`	Treat the metacharacters `^` and `$` as line anchors. This anchors regular expression matches to the beginning or end of lines in the text. This option is useful when you have multiline text and do not want matches to span multiple lines.

Example: pat = regexpPattern('\d+','Anchors','line')

Output Arguments

`pat` — Pattern expression
pattern object

Pattern expression, returned as a pattern object.

Extended Capabilities

Thread-Based Environment
Run code in the background using MATLAB® `backgroundPool` or accelerate code with Parallel Computing Toolbox™ `ThreadPool`.

Version History

Introduced in R2020b
