Regular Expression Match Indices
ECMAScript 2022 (ES13) introduced the regular expression match indices feature, which gives developers more precise information about match results. With the d flag and the indices property, developers can obtain the exact start and end positions of matched substrings within the original string.
Basic Concepts of Regular Expression Match Indices
The regular expression match indices feature allows developers to retrieve the start and end positions of match results within the original string. The feature is enabled by the d flag, which adds positional information to the indices property of the match result. Each position is a [start, end] pair in which start is inclusive and end is exclusive.
const regex = /a+/d;
const str = 'aaabbb';
const result = regex.exec(str);
console.log(result.indices);
// Output: [[0, 3]]
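Because the end position is exclusive, a range can be passed straight to slice() to recover the matched text. The following is a minimal sketch; the names pattern, input, found, and recovered are illustrative and do not come from the examples above.

```javascript
// Illustrative names; the match does not start at position 0 here
const pattern = /a+/d;
const input = 'xxaaabbb';
const found = pattern.exec(input);

// indices[0] is the [start, end) range of the whole match
const [start, end] = found.indices[0];
const recovered = input.slice(start, end);

console.log(start, end);             // 2 5
console.log(recovered === found[0]); // true
```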
The Role of the d Flag
The d flag is key to enabling the match indices feature. When a regular expression uses the d flag, the match result includes an indices property: an array containing the start and end indices of the overall match and of each capturing group.
const regex = /(\w+)\s(\w+)/d;
const str = 'John Smith';
const result = regex.exec(str);
console.log(result.indices);
// Output: [[0, 10], [0, 4], [5, 10]]
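To see what the flag changes, the sketch below runs the same pattern with and without d. It relies on the standard hasIndices accessor of RegExp objects; the variable names are illustrative.

```javascript
// Same pattern, once with the d flag and once without (illustrative names)
const withIndices = /(\w+)\s(\w+)/d;
const withoutIndices = /(\w+)\s(\w+)/;

// hasIndices reports whether the d flag is set
console.log(withIndices.hasIndices);    // true
console.log(withoutIndices.hasIndices); // false

// Without d, the result has no indices property, only the usual index
const plainResult = withoutIndices.exec('John Smith');
console.log(plainResult.indices); // undefined
console.log(plainResult.index);   // 0
```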
Structure of the indices Property
The indices property is a two-dimensional array where each subarray represents a match range:
- The first element represents the range of the entire match.
- Subsequent elements represent the ranges of capturing groups.
const regex = /(\d{4})-(\d{2})-(\d{2})/d;
const str = '2023-05-15';
const result = regex.exec(str);
console.log(result.indices);
/*
Output:
[
  [0, 10], // Entire match
  [0, 4],  // First capturing group (year)
  [5, 7],  // Second capturing group (month)
  [8, 10]  // Third capturing group (day)
]
*/
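Since position 0 is the whole match and position n is the n-th capturing group, the array can be walked in parallel with the group numbers. The helper below is a hypothetical sketch (describeRanges is not a built-in) that pairs each range with the text it covers.

```javascript
// Hypothetical helper: pair every range in indices with the text it covers
function describeRanges(match, source) {
  return match.indices.map((range, groupNumber) => {
    if (range === undefined) {
      // The group did not take part in the match
      return { groupNumber, range: null, text: null };
    }
    const [start, end] = range;
    return { groupNumber, range: [start, end], text: source.slice(start, end) };
  });
}

const dateText = 'Released on 2023-05-15.';
const dateMatch = /(\d{4})-(\d{2})-(\d{2})/d.exec(dateText);
const described = describeRanges(dateMatch, dateText);

console.log(described.map(entry => entry.text));
// ['2023-05-15', '2023', '05', '15']
```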
Indices for Named Capturing Groups
For named capturing groups, the indices property includes a groups object containing the index ranges for each named group.
const regex = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/d;
const str = '2023-05-15';
const result = regex.exec(str);
console.log(result.indices.groups);
/*
Output:
{
  year: [0, 4],
  month: [5, 7],
  day: [8, 10]
}
*/
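The groups object can be iterated like any other object, which is convenient when a tool needs a labelled list of spans. The following sketch assumes a hypothetical helper named namedGroupSpans; it skips named groups that did not participate in the match.

```javascript
// Hypothetical helper: turn indices.groups into a list of labelled spans
function namedGroupSpans(match, source) {
  // indices.groups is undefined when the pattern has no named groups
  const groups = match.indices.groups ?? {};
  return Object.entries(groups)
    .filter(([, range]) => range !== undefined)
    .map(([name, [start, end]]) => ({
      name,
      start,
      end,
      text: source.slice(start, end),
    }));
}

const isoDate = '2023-05-15';
const isoMatch = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/d.exec(isoDate);
const spans = namedGroupSpans(isoMatch, isoDate);

console.log(spans.map(span => `${span.name}=${span.text}`).join(' '));
// year=2023 month=05 day=15
```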
Practical Applications of Match Indices
The match indices feature is particularly useful in scenarios requiring precise location of matched content, such as:
- Syntax highlighting
- Code editors
- Text search and replace tools
// regex must carry both the d flag (for indices) and the g flag (required by matchAll)
function highlightMatch(text, regex) {
  let result = '';
  let lastIndex = 0;
  for (const match of text.matchAll(regex)) {
    const [start, end] = match.indices[0];
    // Copy the unmatched text before this match, then wrap the match itself
    result += text.slice(lastIndex, start);
    result += `<span class="highlight">${text.slice(start, end)}</span>`;
    lastIndex = end;
  }
  // Append whatever follows the last match
  result += text.slice(lastIndex);
  return result;
}
const text = 'The quick brown fox jumps over the lazy dog';
const regex = /\b\w{4}\b/dg;
console.log(highlightMatch(text, regex));
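For search and replace tools, the range of a single capturing group makes it possible to rewrite just that group and leave the rest of the match untouched. The sketch below is one possible approach, not a standard API: replaceGroup is a hypothetical helper, and it assumes a non-global regex that carries the d flag.

```javascript
// Hypothetical helper: replace only the text covered by one capturing group.
// Assumes regex has the d flag and no g or y flag (so lastIndex plays no role).
function replaceGroup(text, regex, groupNumber, replacement) {
  const match = regex.exec(text);
  const range = match && match.indices ? match.indices[groupNumber] : undefined;
  if (range === undefined) {
    return text; // No match, no d flag, or the group did not participate
  }
  const [start, end] = range;
  return text.slice(0, start) + replacement + text.slice(end);
}

const config = 'name=demo; version=1.2.3; debug=false';
const updated = replaceGroup(config, /version=(\d+\.\d+\.\d+)/d, 1, '2.0.0');

console.log(updated); // name=demo; version=2.0.0; debug=false
```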
Match Indices and Global Matching
When the g flag is combined with the d flag for global matching, the matchAll() method returns an iterator in which every match result carries its own indices property.
const regex = /\b\w{3}\b/dg;
const str = 'The cat sat on the mat';
const matches = [...str.matchAll(regex)];
matches.forEach(match => {
  console.log(`"${match[0]}" at [${match.indices[0][0]}, ${match.indices[0][1]}]`);
});
/*
Output:
"The" at [0, 3]
"cat" at [4, 7]
"sat" at [8, 11]
"the" at [15, 18]
"mat" at [19, 22]
*/
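The same information is available when stepping through matches with exec() and a global regex. The sketch below, with illustrative names, collects the ranges of two capturing groups for each match; note that the positions are always relative to the start of the whole string, not to lastIndex.

```javascript
// Illustrative example: collect group ranges for every key=value pair
const pairPattern = /(\w+)=(\w+)/dg;
const query = 'a=1&bb=22';
const ranges = [];

let pair;
while ((pair = pairPattern.exec(query)) !== null) {
  // indices[1] is the key group, indices[2] is the value group
  ranges.push({ key: pair.indices[1], value: pair.indices[2] });
}

console.log(JSON.stringify(ranges));
// [{"key":[0,1],"value":[2,3]},{"key":[4,6],"value":[7,9]}]
```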
Performance Considerations for Match Indices
While the match indices feature provides additional information, it incurs some performance overhead, because the engine has to build the extra indices array for every match. In scenarios where positional information is not needed, omit the d flag for better performance.
// When positional information is not needed
const regexFast = /a+/;
// When positional information is needed
const regexWithIndices = /a+/d;
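When only the range of the whole match is required, it can be derived from the index property and the length of the matched text, so the d flag is not needed at all. A minimal sketch, with wholeMatchRange as a hypothetical helper name:

```javascript
// Hypothetical helper: compute the whole-match range without the d flag
function wholeMatchRange(match) {
  return [match.index, match.index + match[0].length];
}

const sample = 'bbaaab';
const derived = wholeMatchRange(/a+/.exec(sample));
const reported = /a+/d.exec(sample).indices[0];

console.log(derived);  // [2, 5]
console.log(reported); // [2, 5]
```

The d flag remains necessary for capturing group positions, which cannot be reliably derived this way.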
Match Indices and Unicode Characters
For strings containing Unicode characters, match indices are still expressed in UTF-16 code unit positions rather than code points or visual character positions, even when the u flag is used.
// The u flag makes + apply to the whole emoji instead of only its trailing surrogate
const regex = /😊+/du;
const str = 'Hello😊😊World';
const result = regex.exec(str);
console.log(result.indices[0]); // Output: [5, 9]
// Note: Each 😊 occupies 2 UTF-16 code units
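If a tool needs positions counted in code points instead, the code unit offsets can be converted. The sketch below uses a hypothetical helper, toCodePointOffset, and assumes the offset does not fall in the middle of a surrogate pair.

```javascript
// Hypothetical helper: convert a UTF-16 code unit offset to a code point offset.
// Spreading a string iterates by code point, so the length of the spread
// prefix is the number of code points that precede the offset.
function toCodePointOffset(source, codeUnitOffset) {
  return [...source.slice(0, codeUnitOffset)].length;
}

const greeting = 'Hello😊😊World';
const worldMatch = /World/d.exec(greeting);
const [unitStart, unitEnd] = worldMatch.indices[0];

console.log(unitStart, unitEnd); // 9 14
console.log(
  toCodePointOffset(greeting, unitStart),
  toCodePointOffset(greeting, unitEnd)
); // 7 12
```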
Edge Cases for Match Indices
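In certain edge cases, it is important to understand the behavior of the indices property:
- A capturing group that did not participate in the match has undefined in place of its range.
- Optional groups always keep their slot in the indices array, so its length is always the number of capturing groups plus one.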
const regex = /(a)?(b)?/d;
const str = 'b';
const result = regex.exec(str);
console.log(result.indices);
/*
Output:
[
  [0, 1],    // Entire match
  undefined, // First capturing group not matched
  [0, 1]     // Second capturing group
]
*/
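Code that destructures a range should therefore check for undefined first. A minimal sketch, with groupText as a hypothetical helper that returns null for groups that did not participate:

```javascript
// Hypothetical helper: read the text of a group, tolerating unmatched groups
function groupText(match, source, groupNumber) {
  const range = match.indices[groupNumber];
  if (range === undefined) {
    return null; // Destructuring undefined would throw a TypeError
  }
  const [start, end] = range;
  return source.slice(start, end);
}

const optionalSource = 'b';
const optionalMatch = /(a)?(b)?/d.exec(optionalSource);

console.log(optionalMatch.indices.length);                // 3
console.log(groupText(optionalMatch, optionalSource, 1)); // null
console.log(groupText(optionalMatch, optionalSource, 2)); // b
```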
Match Indices and Regular Expression Modifiers
The d flag can be combined with other regular expression flags, such as g (global), i (case-insensitive), and m (multiline mode).
const regex = /^[a-z]+$/dgmi; // matchAll() throws a TypeError without the g flag
const str = 'Hello\nWorld';
const matches = [...str.matchAll(regex)];
matches.forEach(match => {
  console.log(`"${match[0]}" at index ${match.indices[0][0]}`);
});
/*
Output:
"Hello" at index 0
"World" at index 6
*/
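The positions in indices are offsets into the whole string, not line numbers. If line and column positions are wanted in multiline mode, they can be computed from the offset. The sketch below uses a hypothetical helper, toLineAndColumn, and assumes lines are separated by \n only.

```javascript
// Hypothetical helper: convert a string offset to 1-based line and column.
// Assumes "\n" is the only line separator.
function toLineAndColumn(source, offset) {
  const before = source.slice(0, offset);
  const line = before.split('\n').length;
  const column = offset - (before.lastIndexOf('\n') + 1) + 1;
  return { line, column };
}

const multiline = 'Hello\nWorld';
const positions = [...multiline.matchAll(/^[a-z]+$/dgmi)].map(match => ({
  text: match[0],
  ...toLineAndColumn(multiline, match.indices[0][0]),
}));

console.log(JSON.stringify(positions));
// [{"text":"Hello","line":1,"column":1},{"text":"World","line":2,"column":1}]
```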
Browser Compatibility and Node.js Support
As of 2023, current versions of the major browsers and of Node.js support the regular expression match indices feature. Older engines throw a SyntaxError when they encounter the d flag, so feature detection is still worthwhile when such environments must be supported.
// Check if the d flag is supported
function isRegExpIndicesSupported() {
  try {
    new RegExp('', 'd');
    return true;
  } catch (e) {
    return false;
  }
}
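Feature detection can then drive a fallback. The following is a sketch of one possible approach rather than an established pattern: execWithRange and supportsMatchIndices are hypothetical helpers, the flags argument is assumed not to contain d already, and the fallback can only recover the range of the whole match, not of capturing groups.

```javascript
// Same detection technique as above, under an illustrative name
function supportsMatchIndices() {
  try {
    new RegExp('', 'd');
    return true;
  } catch (e) {
    return false;
  }
}

// Hypothetical helper: return the matched text and its whole-match range,
// using indices when available and index plus length otherwise.
// Assumes flags does not already include d.
function execWithRange(source, flags, text) {
  const useIndices = supportsMatchIndices();
  const regex = new RegExp(source, useIndices ? flags + 'd' : flags);
  const match = regex.exec(text);
  if (match === null) {
    return null;
  }
  const range = useIndices
    ? match.indices[0]
    : [match.index, match.index + match[0].length];
  return { text: match[0], range: [range[0], range[1]] };
}

console.log(JSON.stringify(execWithRange('a+', 'i', 'bbAAab')));
// {"text":"AAa","range":[2,5]}
```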