Regular expression named capture groups

Author: Chuan Chen | Views: 52492 | Category: JavaScript

ECMAScript 2018 (ES9) introduced named capture groups in regular expressions, which make regex patterns easier to read and maintain. By giving each capture group a name, developers can access match results by meaning rather than by position, avoiding the confusion that numeric indices cause when a pattern changes.

Basic Syntax of Named Capture Groups

Named capture groups are written as (?<name>...), where name is an identifier chosen by the developer. The ?<name> part goes inside the parentheses of an ordinary capture group, immediately after the opening parenthesis:

const regex = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const match = regex.exec('2023-05-15');
console.log(match.groups.year);  // "2023"
console.log(match.groups.month); // "05"
console.log(match.groups.day);   // "15"

Compared to traditional numeric-indexed capture groups, named capture groups provide a clearer way to access results through the groups property. At the same time, numeric indices are still preserved in the matching results:

console.log(match[1]); // "2023" (numeric index 1 corresponds to year)
console.log(match[2]); // "05"   (numeric index 2 corresponds to month)
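As a small sketch of how the two access styles coexist (the patterns and variable names below are illustrative, not taken from the examples above): named and unnamed groups share one left-to-right numbering, and the groups property is undefined when a pattern contains no named group at all.

```javascript
// Named and unnamed groups are numbered together, left to right.
const mixedRegex = /(\d{4})-(?<month>\d{2})/;
const mixedMatch = mixedRegex.exec('2023-05');

console.log(mixedMatch[1]);           // "2023" (unnamed group, index 1)
console.log(mixedMatch[2]);           // "05"   (the named group still gets index 2)
console.log(mixedMatch.groups.month); // "05"

// A pattern without any named group leaves `groups` undefined.
const plainMatch = /(\d+)/.exec('42');
console.log(plainMatch.groups);       // undefined
```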

Named Capture Groups and Destructuring Assignment

When combined with ES6 destructuring assignment, named capture groups make the code even more concise:

const { groups: { year, month, day } } = regex.exec('2023-05-15');
console.log(year, month, day); // "2023" "05" "15"

This syntax is particularly useful for extracting specific fields from complex regular expressions:

const urlRegex = /(?<protocol>https?):\/\/(?<host>[^/]+)\/(?<path>.*)/;
const { groups: { protocol, host, path } } = urlRegex.exec('https://example.com/posts/123');
console.log(protocol); // "https"
console.log(host);     // "example.com"
console.log(path);     // "posts/123"
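One caveat with the destructuring style: exec() returns null when the input does not match, and destructuring null throws a TypeError. A minimal sketch of a guarded version follows; parseIsoDate is a hypothetical helper name introduced only for this illustration.

```javascript
// Hypothetical helper: returns the named parts of an ISO-style date, or null.
function parseIsoDate(input) {
  const match = /^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/.exec(input);
  if (match === null) {
    return null; // exec() returns null when nothing matches
  }
  // Safe to destructure now that a match is known to exist.
  const { year, month, day } = match.groups;
  return { year: Number(year), month: Number(month), day: Number(day) };
}

console.log(parseIsoDate('2023-05-15')); // { year: 2023, month: 5, day: 15 }
console.log(parseIsoDate('not a date')); // null
```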

Named References in Replacement Strings

In string replacement operations, named capture groups can be referenced using the $<name> syntax:

const dateRegex = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const newDate = '2023-05-15'.replace(dateRegex, '$<day>/$<month>/$<year>');
console.log(newDate); // "15/05/2023"

For more complex replacement logic, a function can be used as the replacement argument, with access to named capture groups through the parameters:

const result = '2023-05-15'.replace(dateRegex, (...args) => {
  const { year, month, day } = args[args.length - 1]; // The last argument is the groups object
  return `${day.padStart(2, '0')}-${month}-${year}`;
});
console.log(result); // "15-05-2023"
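The same references also work with the g flag, in which case every match is rewritten. The sketch below uses made-up sample text; the second call spells out the full replacer parameter list to show where the groups object sits when a pattern has three capture groups.

```javascript
const isoDateGlobal = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/g;
const schedule = 'Start 2023-05-15, end 2023-06-01.';

// $<name> references are resolved separately for each match.
const european = schedule.replace(isoDateGlobal, '$<day>/$<month>/$<year>');
console.log(european); // "Start 15/05/2023, end 01/06/2023."

// Replacer parameters: full match, one value per capture group,
// the match offset, the whole input string, and finally the groups object.
const compact = schedule.replace(
  isoDateGlobal,
  (fullMatch, p1, p2, p3, offset, input, groups) =>
    `${groups.year}${groups.month}${groups.day}`
);
console.log(compact); // "Start 20230515, end 20230601."
```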

Backreferences to Named Capture Groups

Within a regular expression, named capture groups can be referenced using the \k<name> syntax:

const duplicateRegex = /^(?<word>[a-z]+) \k<word>$/;
console.log(duplicateRegex.test('hello hello')); // true
console.log(duplicateRegex.test('hello world')); // false

This syntax is especially useful for matching repeated patterns, such as paired HTML tags:

const htmlTagRegex = /<(?<tag>[a-z][a-z0-9]*)\b[^>]*>.*?<\/\k<tag>>/;
console.log(htmlTagRegex.test('<div>content</div>')); // true
console.log(htmlTagRegex.test('<div>content</span>')); // false
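Another common use of a named backreference is matching a quoted string whose closing quote must be the same character as the opening one. This is a minimal sketch with illustrative names; it does not handle escaped quotes inside the string.

```javascript
// The closing quote must repeat whatever the `quote` group captured.
const quotedRegex = /(?<quote>['"])(?<content>.*?)\k<quote>/;

const doubleQuoted = quotedRegex.exec('say "it\'s fine" now');
console.log(doubleQuoted.groups.content); // "it's fine"

const singleQuoted = quotedRegex.exec("say 'hi' now");
console.log(singleQuoted.groups.content); // "hi"

// An opening " closed by ' is not accepted.
const mismatched = quotedRegex.exec('say "oops\' now');
console.log(mismatched); // null
```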

Named Capture Groups and Unicode Property Escapes

ECMAScript 2018 also introduced Unicode property escapes (\p{...}), which require the u flag and can be combined with named capture groups:

const unicodeRegex = /(?<letter>\p{L}+)\s+(?<number>\p{N}+)/u;
const unicodeMatch = unicodeRegex.exec('日本語 123');
console.log(unicodeMatch.groups.letter); // "日本語"
console.log(unicodeMatch.groups.number); // "123"

This combination is particularly powerful when working with multilingual text, allowing precise matching of specific Unicode character categories.
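As a further sketch of category-based matching, the pattern below checks for a capitalized word in any script by using the Lu (uppercase letter) and Ll (lowercase letter) categories. The pattern and sample words are assumptions chosen for illustration.

```javascript
// One uppercase letter followed by lowercase letters, in any script.
const capitalizedRegex = /^(?<initial>\p{Lu})(?<rest>\p{Ll}+)$/u;

const nameMatch = capitalizedRegex.exec('\u00C9lodie'); // "Élodie"
console.log(nameMatch.groups.initial); // "É"
console.log(nameMatch.groups.rest);    // "lodie"

console.log(capitalizedRegex.test('Ольга'));        // true  (Cyrillic)
console.log(capitalizedRegex.test('\u00E9lodie'));  // false (starts with lowercase "é")
```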

Default Values and Optional Named Capture Groups

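A named capture group can be made optional with the ? quantifier, just like any other group. When an optional group does not participate in the match, its entry in groups is undefined. Here the prefix and the whitespace after it are wrapped together in an optional non-capturing group, so that a bare name still matches:

const optionalRegex = /(?:(?<prefix>Mr|Ms|Mrs)\s+)?(?<name>\w+)/;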
const match1 = optionalRegex.exec('Mr Smith');
console.log(match1.groups.prefix); // "Mr"
console.log(match1.groups.name);   // "Smith"

const match2 = optionalRegex.exec('Johnson');
console.log(match2.groups.prefix); // undefined
console.log(match2.groups.name);   // "Johnson"

A group that did not participate in the match is undefined in the groups object, so destructuring defaults are a convenient way to supply fallback values:

const { groups: { prefix = 'Unknown', name } } = optionalRegex.exec('Johnson');
console.log(prefix); // "Unknown"
console.log(name);   // "Johnson"
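Destructuring defaults apply only to undefined, which matters when a group can also match the empty string. The sketch below makes the difference visible; readOption and its key=value input format are hypothetical and exist only for this example.

```javascript
// "key" alone, or "key=value" where the value may be empty.
const optionRegex = /^(?<key>\w+)(?:=(?<value>\w*))?$/;

// Hypothetical helper for illustration.
function readOption(input) {
  const match = optionRegex.exec(input);
  if (match === null) {
    return null;
  }
  // The default is used only when `value` is undefined (group did not participate).
  const { key, value = 'true' } = match.groups;
  return { key, value };
}

console.log(readOption('debug'));   // { key: 'debug', value: 'true' }
console.log(readOption('debug=')); // { key: 'debug', value: '' } (matched an empty string)
console.log(readOption('level=3')); // { key: 'level', value: '3' }
```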

Performance Considerations

Named capture groups perform nearly identically to regular capture groups, as modern JavaScript engines optimize them. However, in extremely performance-sensitive scenarios, benchmarking can be used to compare:

// Test named capture group performance
console.time('named');
for (let i = 0; i < 1000000; i++) {
  /(?<value>\d+)/.exec('123');
}
console.timeEnd('named');

// Test regular capture group performance
console.time('unnamed');
for (let i = 0; i < 1000000; i++) {
  /(\d+)/.exec('123');
}
console.timeEnd('unnamed');

In practice the difference is typically negligible, so the decision to use named capture groups should rest on code readability rather than performance.
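If a measurement is needed, a slightly more careful harness can reuse precompiled patterns and run a warm-up pass before timing. This is only a sketch under those assumptions: timeExec is a hypothetical helper, the iteration counts are arbitrary, and the numbers it reports will vary by engine and machine.

```javascript
// Hypothetical timing helper: runs exec() repeatedly and reports elapsed time.
function timeExec(regex, input, iterations) {
  let matched = 0;
  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    if (regex.exec(input) !== null) {
      matched++; // use the result so the loop body is not trivially dead
    }
  }
  return { elapsedMs: performance.now() - start, matched };
}

const namedPattern = /(?<value>\d+)/;
const unnamedPattern = /(\d+)/;

// Warm-up pass so both patterns are compiled before the timed runs.
timeExec(namedPattern, '123', 1000);
timeExec(unnamedPattern, '123', 1000);

const namedResult = timeExec(namedPattern, '123', 100000);
const unnamedResult = timeExec(unnamedPattern, '123', 100000);
console.log(`named:   ${namedResult.elapsedMs.toFixed(2)} ms`);
console.log(`unnamed: ${unnamedResult.elapsedMs.toFixed(2)} ms`);
```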

Browser Compatibility and Transpilation

While modern browsers widely support named capture groups, compatibility must be considered for older environments. Transpilers like Babel can convert named capture groups to traditional syntax:

Original code:

const regex = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;

Transpiled code (simplified to show only the rewritten pattern):

var regex = /(\d{4})-(\d{2})-(\d{2})/;

In the real output, the rewritten pattern is wrapped in a runtime helper that records which numeric index each name maps to, so that the groups object keeps working. Named references in replacement strings ($<name>) are likewise mapped to their numeric forms.
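To illustrate the kind of bookkeeping such compatibility logic has to do, here is a hand-written sketch. It is not Babel's actual helper: execWithNames and its names parameter are hypothetical, and the function simply maps a list of names onto numeric capture indices.

```javascript
// Hypothetical stand-in for a compatibility helper:
// builds a groups-like object from numeric captures.
function execWithNames(regex, names, input) {
  const match = regex.exec(input);
  if (match === null) {
    return null;
  }
  const groups = {};
  names.forEach((name, i) => {
    groups[name] = match[i + 1]; // capture indices start at 1
  });
  return groups;
}

const legacyDateRegex = /(\d{4})-(\d{2})-(\d{2})/;
const legacyGroups = execWithNames(legacyDateRegex, ['year', 'month', 'day'], '2023-05-15');
console.log(legacyGroups.year);  // "2023"
console.log(legacyGroups.month); // "05"
console.log(legacyGroups.day);   // "15"
```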

Practical Use Cases

Named capture groups are particularly useful for parsing structured text, such as log file analysis:

const logRegex = /\[(?<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] \[(?<level>\w+)\] (?<message>.*)/;
const logLine = '[2023-05-15 14:30:00] [ERROR] Database connection failed';

const { groups: { timestamp, level, message } } = logRegex.exec(logLine);
console.log(`At ${timestamp}, ${level} occurred: ${message}`);
// "At 2023-05-15 14:30:00, ERROR occurred: Database connection failed"
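For a whole log rather than a single line, the same pattern can be anchored per line with the m flag and run over the text with String.prototype.matchAll (ES2020), which requires the g flag. The sample log text below is made up for this sketch.

```javascript
// ^ and $ match at line boundaries because of the m flag.
const logLineRegex =
  /^\[(?<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] \[(?<level>\w+)\] (?<message>.*)$/gm;

const logText = [
  '[2023-05-15 14:30:00] [ERROR] Database connection failed',
  'this line is not in the expected format',
  '[2023-05-15 14:30:05] [INFO] Retrying connection',
].join('\n');

// Copy each match's groups into a plain object; malformed lines are skipped.
const entries = Array.from(logText.matchAll(logLineRegex), (m) => ({ ...m.groups }));

console.log(entries.length);     // 2
console.log(entries[1].level);   // "INFO"
console.log(entries[1].message); // "Retrying connection"

const errors = entries.filter((entry) => entry.level === 'ERROR');
console.log(errors.length);      // 1
```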

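Another typical scenario is handling international phone numbers. Note that when a number contains no separators, the greedy quantifiers decide where each group ends, so '+81312345678' is split into country code 813 and area code 1234: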

const phoneRegex = /^\+(?<country>\d{1,3})[- ]?(?<area>\d{1,4})[- ]?(?<local>\d{4,10})$/;
const phoneNumbers = [
  '+1 415 5552671',
  '+44 20 71234567',
  '+81312345678'
];

phoneNumbers.forEach(phone => {
  const { groups } = phoneRegex.exec(phone) || {};
  if (groups) {
    console.log(`Country code: ${groups.country}, Area code: ${groups.area}`);
  }
});

Interaction with Other Regex Features

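Named capture groups work with the other regex flags as well, such as multiline mode (m flag) and dotAll mode (s flag). Note that under the s flag, . also matches line terminators, so in the example below the header class excludes \n and the value uses a lazy quantifier (.*?); otherwise the first match would swallow the rest of the text:

const multilineRegex = /^(?<header>[^:\n]+):(?<value>.*?)$/gms;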
const text = `
Content-Type: text/html
Content-Length: 1024
`;

let match;
while (match = multilineRegex.exec(text)) {
  console.log(`${match.groups.header}: ${match.groups.value.trim()}`);
}
// "Content-Type: text/html"
// "Content-Length: 1024"

They can also be combined with lookbehind assertions:

const priceRegex = /(?<=\$)(?<dollars>\d+)\.(?<cents>\d{2})/;
const { groups: { dollars, cents } } = priceRegex.exec('The price is $42.99');
console.log(`${dollars} dollars and ${cents} cents`); // "42 dollars and 99 cents"

Common Pitfalls and Best Practices

When using named capture groups, be aware of the following issues:

  1. Duplicate Group Names: A group name cannot be repeated within the same alternative of a regex (ES2025 relaxes this only for names that appear in different | alternatives)

    // Bad example
    const invalidRegex = /(?<group>\d+) (?<group>\d+)/; // SyntaxError
    
  2. Invalid Group Names: Group names must follow identifier naming rules

    // Bad example
    const invalidNameRegex = /(?<1group>\d+)/; // SyntaxError
    
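  3. Legacy Environment Compatibility: In an environment without support, a regex literal containing (?<name>...) is a SyntaxError at parse time, so the whole script fails to load and a surrounding try/catch cannot help. To detect support at runtime, build the pattern with the RegExp constructor instead

    let supportsNamedGroups = true;
    try {
      new RegExp('(?<value>\\d+)'); // throws SyntaxError in unsupported environments
    } catch (e) {
      supportsNamedGroups = false;
      console.error('Environment does not support named capture groups');
    }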
    

Best practices include:

  • Always using named forms for important capture groups
  • Providing default values for potentially missing capture groups
  • Including compatibility solutions in library code
  • Using meaningful group names instead of generic ones
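The first two pitfalls can be observed at runtime by compiling pattern sources with the RegExp constructor, which is also how a user-supplied pattern could be validated. A minimal sketch follows; describePattern is a hypothetical helper name.

```javascript
// Hypothetical helper: reports whether a pattern source compiles.
function describePattern(source) {
  try {
    new RegExp(source);
    return 'ok';
  } catch (error) {
    return error instanceof SyntaxError ? 'SyntaxError' : 'other error';
  }
}

console.log(describePattern('(?<first>\\d+) (?<second>\\d+)')); // "ok"
console.log(describePattern('(?<group>\\d+) (?<group>\\d+)'));  // "SyntaxError" (duplicate name)
console.log(describePattern('(?<1group>\\d+)'));                // "SyntaxError" (invalid name)
```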

Advanced Pattern Matching Techniques

For complex parsing needs, multiple named capture groups can be combined:

const complexRegex = 
  /^(?<protocol>\w+):\/\/(?<host>[^/:]+)(?::(?<port>\d+))?(?<path>\/[^?]*)?(?:\?(?<query>.*))?$/;

const urls = [
  'https://example.com:8080/path?query=string',
  'ftp://files.example.com',
  'http://localhost/path'
];

urls.forEach(url => {
  const { groups } = complexRegex.exec(url) || {};
  if (groups) {
    console.log(`Protocol: ${groups.protocol}, Host: ${groups.host}, Port: ${groups.port || 'default'}`);
  }
});

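This pattern decomposes the protocol, host, port, path, and query of typical URLs while handling the optional parts. It does not cover every URL form: user info and IPv6 hosts are not matched, and a fragment is not split out from the path or query.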
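Building on the same pattern, the optional groups can be normalized into a single result object. The sketch below is one possible shape rather than a definitive parser: parseUrl and the defaultPorts table are assumptions introduced for illustration.

```javascript
const urlPartsRegex =
  /^(?<protocol>\w+):\/\/(?<host>[^/:]+)(?::(?<port>\d+))?(?<path>\/[^?]*)?(?:\?(?<query>.*))?$/;

// Assumed fallback ports for a few well-known protocols.
const defaultPorts = { http: 80, https: 443, ftp: 21 };

// Hypothetical helper: returns normalized URL parts, or null if nothing matches.
function parseUrl(url) {
  const match = urlPartsRegex.exec(url);
  if (match === null) {
    return null;
  }
  // Optional groups that did not participate are undefined, so defaults apply.
  const { protocol, host, port, path = '/', query = '' } = match.groups;
  return {
    protocol,
    host,
    port: port !== undefined ? Number(port) : (defaultPorts[protocol] ?? null),
    path,
    query,
  };
}

console.log(parseUrl('https://example.com:8080/path?query=string'));
// { protocol: 'https', host: 'example.com', port: 8080, path: '/path', query: 'query=string' }
console.log(parseUrl('ftp://files.example.com'));
// { protocol: 'ftp', host: 'files.example.com', port: 21, path: '/', query: '' }
```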
