Regular expression named capture groups
ECMAScript 2018 (ES9) introduced named capture groups for regular expressions, which make patterns easier to read and maintain. By giving capture groups names, developers can access match results by meaning rather than by position, avoiding the confusion that numeric indices cause.
Basic Syntax of Named Capture Groups
Named capture groups are defined with the (?<name>...) syntax, where name is an identifier chosen by the developer. The ?<name> prefix is placed inside the parentheses of an ordinary capture group, immediately after the opening parenthesis:
const regex = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const match = regex.exec('2023-05-15');
console.log(match.groups.year); // "2023"
console.log(match.groups.month); // "05"
console.log(match.groups.day); // "15"
Compared to traditional numeric-indexed capture groups, named capture groups provide a clearer way to access results through the groups property. At the same time, numeric indices are still preserved in the matching results:
console.log(match[1]); // "2023" (numeric index 1 corresponds to year)
console.log(match[2]); // "05" (numeric index 2 corresponds to month)
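When a text contains several matches, the same groups object is available on every result produced by String.prototype.matchAll (ES2020). The following is a minimal sketch; the helper name extractDates and the sample text are illustrative assumptions, not part of the original example.

```javascript
// Hypothetical helper: collect every ISO-style date found in a text.
function extractDates(text) {
  // matchAll requires the g flag; each result carries its own groups object.
  const dateRegex = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/g;
  return Array.from(text.matchAll(dateRegex), (m) => ({
    year: m.groups.year,
    month: m.groups.month,
    day: m.groups.day,
    index: m.index, // position of the match in the input
  }));
}

const dates = extractDates('Released 2023-05-15, patched 2023-06-01.');
console.log(dates.map((d) => `${d.day}.${d.month}.${d.year}`));
// [ '15.05.2023', '01.06.2023' ]
```

Copying the fields into a plain object keeps the result independent of the match array itself.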
Named Capture Groups and Destructuring Assignment
When combined with ES6 destructuring assignment, named capture groups make the code even more concise:
const { groups: { year, month, day } } = regex.exec('2023-05-15');
console.log(year, month, day); // "2023" "05" "15"
This syntax is particularly useful for extracting specific fields from complex regular expressions:
const urlRegex = /(?<protocol>https?):\/\/(?<host>[^/]+)\/(?<path>.*)/;
const { groups: { protocol, host, path } } = urlRegex.exec('https://example.com/posts/123');
console.log(protocol); // "https"
console.log(host); // "example.com"
console.log(path); // "posts/123"
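Destructuring directly from exec throws a TypeError when the pattern does not match, because exec returns null in that case. A minimal sketch of a guard, assuming a hypothetical parseUrlParts helper and optional chaining (ES2020):

```javascript
// Hypothetical helper: returns the named parts, or null when nothing matches.
function parseUrlParts(url) {
  const urlRegex = /(?<protocol>https?):\/\/(?<host>[^/]+)\/(?<path>.*)/;
  // exec returns null on failure, so ?. avoids reading .groups of null.
  const groups = urlRegex.exec(url)?.groups;
  if (!groups) return null;
  const { protocol, host, path } = groups;
  return { protocol, host, path };
}

const parts = parseUrlParts('https://example.com/posts/123');
console.log(parts.protocol, parts.host, parts.path); // https example.com posts/123
console.log(parseUrlParts('not a url')); // null
```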
Named References in Replacement Strings
In string replacement operations, named capture groups can be referenced using the $<name> syntax:
const dateRegex = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const newDate = '2023-05-15'.replace(dateRegex, '$<day>/$<month>/$<year>');
console.log(newDate); // "15/05/2023"
For more complex replacement logic, a function can be used as the replacement argument, with access to named capture groups through the parameters:
const result = '2023-05-15'.replace(dateRegex, (...args) => {
  const { year, month, day } = args[args.length - 1]; // The last argument is the groups object
  return `${day.padStart(2, '0')}-${month}-${year}`;
});
console.log(result); // "15-05-2023"
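The same replacer technique extends to global replacements, where the function runs once per match. The sketch below is illustrative only: the helper humanizeDates, the MONTH_NAMES table, and the output format are assumptions rather than part of the original example.

```javascript
// Hypothetical helper: rewrite every YYYY-MM-DD date in a text as "D Mon YYYY".
const MONTH_NAMES = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
  'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'];

function humanizeDates(text) {
  const isoDate = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/g;
  return text.replace(isoDate, (...args) => {
    // With named groups present, the groups object is the final argument.
    const { year, month, day } = args[args.length - 1];
    const name = MONTH_NAMES[Number(month) - 1];
    // Leave the text untouched when the month is out of range (e.g. "13").
    if (!name) return args[0];
    return `${Number(day)} ${name} ${year}`;
  });
}

console.log(humanizeDates('From 2023-05-15 to 2023-12-01'));
// "From 15 May 2023 to 1 Dec 2023"
```

Returning args[0], the full matched text, is a simple way to skip a match without altering the input.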
Backreferences to Named Capture Groups
Within a regular expression, named capture groups can be referenced using the \k<name> syntax:
const duplicateRegex = /^(?<word>[a-z]+) \k<word>$/;
console.log(duplicateRegex.test('hello hello')); // true
console.log(duplicateRegex.test('hello world')); // false
This syntax is especially useful for matching repeated patterns, such as paired HTML tags:
const htmlTagRegex = /<(?<tag>[a-z][a-z0-9]*)\b[^>]*>.*?<\/\k<tag>>/;
console.log(htmlTagRegex.test('<div>content</div>')); // true
console.log(htmlTagRegex.test('<div>content</span>')); // false
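A common practical use of \k<name> is spotting accidentally doubled words in prose. The following is a small sketch under assumed names (findRepeatedWords is hypothetical) and it only handles unaccented Latin letters:

```javascript
// Hypothetical helper: list words that appear twice in a row.
function findRepeatedWords(text) {
  // \b keeps the match on whole words; the i flag lets "The the" count too.
  const repeated = /\b(?<word>[a-z]+)\s+\k<word>\b/gi;
  return Array.from(text.matchAll(repeated), (m) => m.groups.word.toLowerCase());
}

console.log(findRepeatedWords('This is is a test of the the parser'));
// [ 'is', 'the' ]
console.log(findRepeatedWords('the theory')); // []
```

The trailing \b matters: without it, "the theory" would be reported as a repeated "the".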
Named Capture Groups and Unicode Property Escapes
ECMAScript 2018 also introduced Unicode property escapes (\p{...}, available with the u flag), which can be combined with named capture groups:
const unicodeRegex = /(?<letter>\p{L}+)\s+(?<number>\p{N}+)/u;
const unicodeMatch = unicodeRegex.exec('日本語 123');
console.log(unicodeMatch.groups.letter); // "日本語"
console.log(unicodeMatch.groups.number); // "123"
This combination is particularly powerful when working with multilingual text, allowing precise matching of specific Unicode character categories.
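As a rough illustration of the multilingual case, the sketch below pulls word and digit pairs out of text in several scripts. The helper name parseLabelledNumbers and the sample input are assumptions; \p{Nd} is used instead of \p{N} to restrict the match to decimal digits.

```javascript
// Hypothetical helper: split "<word> <digits>" pairs in text of any script.
function parseLabelledNumbers(text) {
  // \p{L} = any letter, \p{Nd} = any decimal digit; both need the u flag.
  const pair = /(?<label>\p{L}+)\s+(?<digits>\p{Nd}+)/gu;
  // Spread copies the named groups into a plain object.
  return Array.from(text.matchAll(pair), (m) => ({ ...m.groups }));
}

const pairs = parseLabelledNumbers('日本語 123, Привет 45, room 7');
console.log(pairs.map((p) => `${p.label}=${p.digits}`));
// [ '日本語=123', 'Привет=45', 'room=7' ]
```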
Default Values and Optional Named Capture Groups
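A named capture group has no default value of its own, but it can be made optional with the ? quantifier. When the group does not participate in a match, its entry in the groups object is undefined. In the pattern below, the prefix and the whitespace after it are wrapped together in an optional non-capturing group, so that input without a prefix still matches:
const optionalRegex = /(?:(?<prefix>Mr|Ms|Mrs)\s+)?(?<name>\w+)/;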
const match1 = optionalRegex.exec('Mr Smith');
console.log(match1.groups.prefix); // "Mr"
console.log(match1.groups.name); // "Smith"
const match2 = optionalRegex.exec('Johnson');
console.log(match2.groups.prefix); // undefined
console.log(match2.groups.name); // "Johnson"
When dealing with potentially missing capture groups, always check the values in the groups object:
const { groups: { prefix = 'Unknown', name } } = optionalRegex.exec('Johnson');
console.log(prefix); // "Unknown"
console.log(name); // "Johnson"
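When many groups are optional, defaults can be applied in one place. Note that a plain object spread such as { ...defaults, ...match.groups } would not work, because non-participating groups are present with the value undefined and would overwrite the defaults. The sketch below assumes a hypothetical groupsWithDefaults helper and a titleRegex in which the prefix and its trailing whitespace are optional together:

```javascript
// Hypothetical helper: merge defaults into a match's groups.
function groupsWithDefaults(match, defaults) {
  if (!match) return null;
  const result = { ...defaults };
  for (const [name, value] of Object.entries(match.groups)) {
    // Groups that did not participate are undefined; keep the default then.
    if (value !== undefined) result[name] = value;
  }
  return result;
}

const titleRegex = /(?:(?<prefix>Mr|Ms|Mrs)\s+)?(?<name>\w+)/;
const withDefault = groupsWithDefaults(titleRegex.exec('Johnson'), { prefix: 'Unknown' });
console.log(withDefault); // { prefix: 'Unknown', name: 'Johnson' }
```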
Performance Considerations
Named capture groups perform nearly identically to regular capture groups, as modern JavaScript engines optimize them. However, in extremely performance-sensitive scenarios, benchmarking can be used to compare:
// Test named capture group performance
console.time('named');
for (let i = 0; i < 1000000; i++) {
  /(?<value>\d+)/.exec('123');
}
console.timeEnd('named');
// Test regular capture group performance
console.time('unnamed');
for (let i = 0; i < 1000000; i++) {
  /(\d+)/.exec('123');
}
console.timeEnd('unnamed');
In practice the difference is typically negligible, though exact timings vary by engine and input, so the choice to use named capture groups should be based on code readability rather than performance.
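A slightly more controlled variant of this comparison reuses one precompiled regex per run and returns the elapsed time instead of printing it. This is only a sketch: timeExec is a hypothetical helper, it assumes a global performance.now() (browsers and recent Node.js versions), and micro-benchmarks like this remain sensitive to warm-up and engine optimizations.

```javascript
// Hypothetical helper: time repeated exec calls on a precompiled regex.
function timeExec(regex, input, iterations = 100000) {
  const start = performance.now();
  let matched = 0;
  for (let i = 0; i < iterations; i++) {
    // Counting matches gives the loop an observable result.
    if (regex.exec(input) !== null) matched++;
  }
  return { ms: performance.now() - start, matched };
}

const namedRun = timeExec(/(?<value>\d+)/, '123');
const unnamedRun = timeExec(/(\d+)/, '123');
console.log(`named: ${namedRun.ms.toFixed(2)} ms, unnamed: ${unnamedRun.ms.toFixed(2)} ms`);
```

Running each variant several times and comparing medians gives a more trustworthy picture than a single run.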
Browser Compatibility and Transpilation
While modern browsers widely support named capture groups, compatibility must be considered for older environments. Transpilers like Babel can convert named capture groups to traditional syntax:
Original code:
const regex = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
Transpiled code:
var regex = /(\d{4})-(\d{2})-(\d{2})/;
The line above shows only the rewritten pattern. The actual transpiled output also includes additional logic that rebuilds the groups object from the numeric indices. Named references in replacement operations ($<name>) are likewise converted to numeric reference forms.
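The idea behind that additional logic can be sketched as a table that maps each group name to its numeric index. The code below is a simplified illustration with a hypothetical execWithNames function; it is not Babel's actual runtime helper.

```javascript
// Hypothetical sketch of a transpiler-style fallback, not Babel's real helper.
function execWithNames(regex, nameToIndex, input) {
  const match = regex.exec(input);
  if (!match) return null;
  const groups = Object.create(null);
  for (const [name, index] of Object.entries(nameToIndex)) {
    groups[name] = match[index]; // look the value up by its numeric index
  }
  return { match, groups };
}

const emulated = execWithNames(
  /(\d{4})-(\d{2})-(\d{2})/, // named groups rewritten as plain groups
  { year: 1, month: 2, day: 3 }, // name-to-index table recorded at build time
  '2023-05-15'
);
console.log(emulated.groups.year, emulated.groups.month, emulated.groups.day);
// 2023 05 15
```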
Practical Use Cases
Named capture groups are particularly useful for parsing structured text, such as log file analysis:
const logRegex = /\[(?<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] \[(?<level>\w+)\] (?<message>.*)/;
const logLine = '[2023-05-15 14:30:00] [ERROR] Database connection failed';
const { groups: { timestamp, level, message } } = logRegex.exec(logLine);
console.log(`At ${timestamp}, ${level} occurred: ${message}`);
// "At 2023-05-15 14:30:00, ERROR occurred: Database connection failed"
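Real log files usually contain many lines, some of which do not follow the expected format. A minimal sketch of batch parsing with a level filter, where filterLogs and the sample lines are assumptions made for illustration:

```javascript
// Hypothetical helper: parse log lines and keep only the requested level.
const LOG_LINE = /^\[(?<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] \[(?<level>\w+)\] (?<message>.*)$/;

function filterLogs(lines, level) {
  const entries = [];
  for (const line of lines) {
    const groups = LOG_LINE.exec(line)?.groups;
    if (!groups) continue; // skip lines that do not follow the format
    if (groups.level === level) entries.push({ ...groups });
  }
  return entries;
}

const sampleLines = [
  '[2023-05-15 14:30:00] [ERROR] Database connection failed',
  '[2023-05-15 14:30:05] [INFO] Retrying connection',
  'garbage line',
  '[2023-05-15 14:30:09] [ERROR] Retry limit reached',
];
console.log(filterLogs(sampleLines, 'ERROR').map((e) => e.message));
// [ 'Database connection failed', 'Retry limit reached' ]
```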
Another typical scenario is handling international phone numbers:
const phoneRegex = /^\+(?<country>\d{1,3})[- ]?(?<area>\d{1,4})[- ]?(?<local>\d{4,10})$/;
const phoneNumbers = [
  '+1 415 5552671',
  '+44 20 71234567',
  '+81312345678'
];
phoneNumbers.forEach(phone => {
  const { groups } = phoneRegex.exec(phone) || {};
  if (groups) {
    console.log(`Country code: ${groups.country}, Area code: ${groups.area}`);
  }
});
Note that when the separators are omitted, the split is ambiguous: the greedy quantifiers divide '+81312345678' into country code 813 and area code 1234, although Japan's country code is 81. Reliable parsing of unseparated numbers needs a table of known country codes.
Interaction with Other Regex Features
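Named capture groups also work alongside the other regex features added in ECMAScript 2018, such as dotAll mode (the s flag). Because . also matches line breaks in this mode, the value group below uses a lazy quantifier so that each match stops at the end of its own line, and the header group excludes line breaks:
const multilineRegex = /^(?<header>[^:\n]+):(?<value>.*?)$/gms;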
const text = `
Content-Type: text/html
Content-Length: 1024
`;
let match;
while ((match = multilineRegex.exec(text)) !== null) {
  console.log(`${match.groups.header}: ${match.groups.value.trim()}`);
}
// "Content-Type: text/html"
// "Content-Length: 1024"
They can also be combined with lookbehind assertions:
const priceRegex = /(?<=\$)(?<dollars>\d+)\.(?<cents>\d{2})/;
const { groups: { dollars, cents } } = priceRegex.exec('The price is $42.99');
console.log(`${dollars} dollars and ${cents} cents`); // "42 dollars and 99 cents"
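Combined with the g flag, the lookbehind version can pick out every price in a text while ignoring bare numbers. The following sketch assumes a hypothetical totalCents helper and prices written with exactly two decimal places:

```javascript
// Hypothetical helper: sum every "$d.cc" amount in a text, in cents.
function totalCents(text) {
  // (?<=\$) asserts a preceding "$" without making it part of the match.
  const price = /(?<=\$)(?<dollars>\d+)\.(?<cents>\d{2})/g;
  let total = 0;
  for (const { groups } of text.matchAll(price)) {
    total += Number(groups.dollars) * 100 + Number(groups.cents);
  }
  return total;
}

console.log(totalCents('Coffee $3.50, cake $4.25, tip 1.00')); // 775
```

Working in integer cents avoids the rounding surprises of adding floating-point dollar amounts.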
Common Pitfalls and Best Practices
When using named capture groups, be aware of the following issues:
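- Duplicate group names: two groups that can both take part in the same match cannot share a name. (ECMAScript 2025 permits reusing a name only across different | alternatives.)
// Bad example
const invalidRegex = /(?<group>\d+) (?<group>\d+)/; // SyntaxError
- Invalid group names: group names must follow identifier naming rules, so they cannot start with a digit.
// Bad example
const invalidNameRegex = /(?<1group>\d+)/; // SyntaxError
- Legacy environment compatibility: in environments without support, a regex literal containing a named group is a SyntaxError at parse time, which a try/catch in the same script cannot intercept. Build the pattern with the RegExp constructor to detect support at run time.
try {
  const oldRegex = new RegExp('(?<value>\\d+)');
  console.log(oldRegex.exec('123').groups.value); // "123"
} catch (e) {
  console.error('Environment does not support named capture groups');
}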
Best practices include:
- Always using named forms for important capture groups
- Providing default values for potentially missing capture groups
- Including compatibility solutions in library code
- Using meaningful group names instead of generic ones
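For the compatibility point in particular, library code can detect support once and fall back to numeric indices. This is a minimal sketch under stated assumptions: createDateParser is a hypothetical factory, and the fallback pattern must be kept in sync with the named one by hand.

```javascript
// Hypothetical helper: build a date parser that degrades gracefully.
function createDateParser() {
  let named = null;
  try {
    // The RegExp constructor throws at run time, where try/catch can see it;
    // a regex literal would fail earlier, while the script is being parsed.
    named = new RegExp('(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})');
  } catch (e) {
    named = null;
  }
  const plain = /(\d{4})-(\d{2})-(\d{2})/;

  return function parse(text) {
    const m = (named || plain).exec(text);
    if (!m) return null;
    // Prefer the groups object; fall back to numeric indices otherwise.
    return m.groups
      ? { year: m.groups.year, month: m.groups.month, day: m.groups.day }
      : { year: m[1], month: m[2], day: m[3] };
  };
}

const parseDate = createDateParser();
console.log(parseDate('2023-05-15')); // { year: '2023', month: '05', day: '15' }
```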
Advanced Pattern Matching Techniques
For complex parsing needs, multiple named capture groups can be combined:
const complexRegex =
/^(?<protocol>\w+):\/\/(?<host>[^/:]+)(?::(?<port>\d+))?(?<path>\/[^?]*)?(?:\?(?<query>.*))?$/;
const urls = [
  'https://example.com:8080/path?query=string',
  'ftp://files.example.com',
  'http://localhost/path'
];
urls.forEach(url => {
  const { groups } = complexRegex.exec(url) || {};
  if (groups) {
    console.log(`Protocol: ${groups.protocol}, Host: ${groups.host}, Port: ${groups.port || 'default'}`);
  }
});
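This pattern splits a URL into protocol, host, port, path, and query string while handling the optional parts. It does not separate out credentials or fragments, so a trailing #fragment ends up inside the query or path group.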
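The optional groups can be normalized in one step by combining the pattern with destructuring defaults. In the sketch below, parseUrl is a hypothetical helper and the chosen defaults ('/' for the path, an empty query, null for a missing port) are assumptions:

```javascript
// Hypothetical helper built on the pattern above; field defaults are assumptions.
const URL_PARTS =
  /^(?<protocol>\w+):\/\/(?<host>[^/:]+)(?::(?<port>\d+))?(?<path>\/[^?]*)?(?:\?(?<query>.*))?$/;

function parseUrl(url) {
  const groups = URL_PARTS.exec(url)?.groups;
  if (!groups) return null;
  // Defaults apply only to groups that did not participate (undefined).
  const { protocol, host, port, path = '/', query = '' } = groups;
  return {
    protocol,
    host,
    port: port === undefined ? null : Number(port), // null = not specified
    path,
    query,
  };
}

const full = parseUrl('https://example.com:8080/path?query=string');
console.log(full.port, full.path, full.query); // 8080 /path query=string
const bare = parseUrl('ftp://files.example.com');
console.log(bare.port, bare.path); // null /
```

For production code, the built-in URL class is generally a safer choice than a hand-written pattern.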