Regular Expression Extensions

Author: Chuan Chen | Category: JavaScript

ECMAScript 6 (ES2015) and the editions that followed significantly enhanced regular expressions. ES2015 added the u and y modifiers, the flags property, and extensions to the RegExp constructor; ES2018 added Unicode property escapes, named capture groups, lookbehind assertions, and the s modifier; ES2022 added match indices. Together, these improvements make regular expressions more powerful and flexible when handling complex text.

Unicode Property Escapes

ES2018 introduced the \p{...} and \P{...} syntax for matching characters by Unicode property (\P{...} is the negated form). Both require the u modifier:

// Match all Greek letters
const greekRegex = /\p{Script=Greek}/u;
console.log(greekRegex.test('π')); // true

// Match any Unicode punctuation character (ASCII punctuation included)
const punctuationRegex = /\p{P}/u;
console.log(punctuationRegex.test('!')); // true

Commonly used properties include:

  • Script: Classified by writing system (e.g., Han, Latin)
  • General_Category: Classified by character type (e.g., Letter, Number)
  • White_Space: A binary property that matches whitespace characters
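The three kinds of property listed above can be combined in one pass over a string. The following is a minimal sketch, assuming a hypothetical helper named classify that is not part of the original article; it only counts how many characters fall under each property.

```javascript
// Hypothetical helper: count characters by a few Unicode properties.
function classify(text) {
  const counts = { han: 0, letter: 0, number: 0, space: 0 };
  for (const ch of text) { // for...of iterates by code point, not by UTF-16 unit
    if (/\p{Script=Han}/u.test(ch)) counts.han++;
    if (/\p{General_Category=Letter}/u.test(ch)) counts.letter++;
    if (/\p{Number}/u.test(ch)) counts.number++; // General_Category values may be written without the prefix
    if (/\p{White_Space}/u.test(ch)) counts.space++;
  }
  return counts;
}

console.log(classify('正则 regex 42'));
// { han: 2, letter: 7, number: 2, space: 2 }
```

Note that the two Han characters are counted both as Han and as Letter, because Script and General_Category are independent properties.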

Named Capture Groups

Traditional capture groups are accessed via numeric indices. ES2018 allows capture groups to be named with the (?<name>...) syntax and read from the groups property of the match result:

const dateRegex = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const match = dateRegex.exec('2023-05-21');

console.log(match.groups.year);  // "2023"
console.log(match.groups.month); // "05"
console.log(match.groups.day);   // "21"

In replacement strings, named groups can be referenced via $<name>:

'2023-05-21'.replace(dateRegex, '$<day>/$<month>/$<year>');
// Returns "21/05/2023"
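Named groups are also available in two other places: as the last argument passed to a replacer function, and inside the pattern itself through the \k<name> backreference. The sketch below illustrates both; the function name toUsDate and the sample strings are assumptions made for this example.

```javascript
const isoDate = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/g;

// With a replacer function, the groups object arrives as the last argument.
function toUsDate(text) {
  return text.replace(isoDate, (...args) => {
    const { year, month, day } = args[args.length - 1];
    return `${Number(month)}/${Number(day)}/${year}`; // Number() drops leading zeros
  });
}

console.log(toUsDate('From 2023-05-21 to 2023-06-01'));
// "From 5/21/2023 to 6/1/2023"

// \k<word> must match the same text that the group named "word" captured.
const repeatedWord = /\b(?<word>\w+) \k<word>\b/;
console.log(repeatedWord.test('it is is wrong')); // true
console.log(repeatedWord.test('it is wrong'));    // false
```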

Lookbehind Assertions

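ES2018 added positive lookbehind (?<=...) and negative lookbehind (?<!...) assertions. They check the text immediately before the current position without including it in the match: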

// Match amounts preceded by a dollar sign
const amountRegex = /(?<=\$)\d+(\.\d{2})?/;
console.log(amountRegex.exec('Price: $42')[0]); // "42"

// Match numbers not preceded by a dollar sign
const noDollarRegex = /(?<!\$)\d+/;
console.log(noDollarRegex.exec('€100')[0]); // "100"
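A negative lookbehind only inspects the single position where the match starts, which can produce surprising results once the g modifier is involved. The following sketch uses an invented sample string to show the pitfall and one possible way around it.

```javascript
const priceText = 'Items: $42.50, €100, $7';

// Positive lookbehind: every dollar amount, without the sign itself.
console.log(priceText.match(/(?<=\$)\d+(?:\.\d{2})?/g));
// ["42.50", "7"]

// Pitfall: the "2" in "$42" is preceded by "4", not by "$", so it still matches.
console.log(priceText.match(/(?<!\$)\d+/g));
// ["2", "50", "100"]

// One possible fix: also forbid a preceding digit or decimal point,
// so a match can only begin at the start of a number.
console.log(priceText.match(/(?<![$\d.])\d+(?:\.\d{2})?/g));
// ["100"]
```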

dotAll Mode

The s modifier (ES2018) makes the dot . match any character, including line terminators such as \n:

const dotAllRegex = /foo.bar/s;
console.log(dotAllRegex.test('foo\nbar')); // true
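To make the effect of the s modifier concrete, the sketch below compares the same pattern with and without it, reads the dotAll property, and shows the [\s\S] idiom that was commonly used before the modifier existed. The sample string is invented for illustration.

```javascript
const html = '<p>first\nsecond</p>';

const withoutS = /<p>(.*)<\/p>/;
const withS = /<p>(.*)<\/p>/s;

console.log(withoutS.test(html)); // false: "." stops at the newline
console.log(withS.test(html));    // true
console.log(withS.exec(html)[1] === 'first\nsecond'); // true

// The dotAll property reports whether the s modifier is set.
console.log(withS.dotAll, withoutS.dotAll); // true false

// Older workaround: a character class that matches everything.
const workaround = /<p>([\s\S]*)<\/p>/;
console.log(workaround.test(html)); // true
```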

Sticky Matching

The y modifier enables sticky matching: a match must begin exactly at the index stored in lastIndex, and the engine does not scan ahead for a later match:

const stickyRegex = /a+/y;
let str = 'aaa_aaa';

stickyRegex.lastIndex = 0;
console.log(stickyRegex.exec(str)[0]); // "aaa"

stickyRegex.lastIndex = 4;
console.log(stickyRegex.exec(str)[0]); // "aaa"

stickyRegex.lastIndex = 3; // Index 3 is "_", so no match can start there
console.log(stickyRegex.exec(str)); // null
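Sticky matching is a natural fit for tokenizers, because each token has to start exactly where the previous one ended. The following is a minimal sketch under an assumed toy grammar (numbers, identifiers, a few operators); the names TOKEN and tokenize are hypothetical.

```javascript
// Hypothetical toy grammar: optional whitespace, then exactly one token.
const TOKEN = /\s*(?:(?<number>\d+)|(?<ident>[A-Za-z_]\w*)|(?<op>[-+*\/()=]))/y;

function tokenize(source) {
  const tokens = [];
  TOKEN.lastIndex = 0;
  while (TOKEN.lastIndex < source.length) {
    const start = TOKEN.lastIndex; // a failed sticky match resets lastIndex to 0
    const m = TOKEN.exec(source);
    if (m === null) {
      if (/^\s*$/.test(source.slice(start))) break; // only trailing whitespace left
      throw new SyntaxError(`Unexpected input at index ${start}`);
    }
    // Exactly one of the named groups is defined for each match.
    const [type, value] = Object.entries(m.groups).find(([, v]) => v !== undefined);
    tokens.push({ type, value });
  }
  return tokens;
}

console.log(tokenize('x = 42 + y1'));
// [
//   { type: "ident", value: "x" },
//   { type: "op", value: "=" },
//   { type: "number", value: "42" },
//   { type: "op", value: "+" },
//   { type: "ident", value: "y1" }
// ]
```

With the g modifier instead of y, the engine would silently skip over unrecognized characters; the sticky version reports them as an error.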

flags Property

ES6 added the flags property, which returns the modifiers of a regular expression as a string. The modifiers are listed in alphabetical order, not in the order they were written:

const re = /foo/ig;
console.log(re.flags); // "gi"

RegExp Constructor Extensions

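In ES6, the RegExp constructor accepts an existing regex object together with a flags argument, and the new flags replace the original ones (ES5 threw a TypeError in this case):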

const re1 = /foo/i;
const re2 = new RegExp(re1, 'g');
console.log(re2.toString()); // "/foo/g"
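The flags property and the extended constructor work well together when a modifier needs to be added to a regex without losing the ones it already has. The sketch below assumes a hypothetical helper named withFlag.

```javascript
// Hypothetical helper: return a copy of `re` that also has `flag` set.
function withFlag(re, flag) {
  return re.flags.includes(flag)
    ? new RegExp(re)                   // already set: plain copy
    : new RegExp(re, re.flags + flag); // keep existing flags, add the new one
}

const base = /foo/i;
const globalCopy = withFlag(base, 'g');

console.log(globalCopy.flags);  // "gi"
console.log(globalCopy.source); // "foo"
console.log(base.flags);        // "i" (the original is unchanged)
console.log('Foo foo'.replace(globalCopy, 'bar')); // "bar bar"
```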

String Matching Method Adjustments

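Since ES6, the string methods match, replace, search, and split delegate to the methods named by Symbol.match, Symbol.replace, Symbol.search, and Symbol.split on their argument, so any object that implements these methods can be passed in place of a regular expression: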

class CustomMatcher {
  [Symbol.match](string) {
    return string.includes('foo') ? ['foo'] : null;
  }
}

console.log('barfoo'.match(new CustomMatcher())); // ["foo"]
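The same delegation applies to the other three symbols. As a rough sketch, the hypothetical class SeparatorRuns below implements Symbol.split, Symbol.search, and Symbol.replace for a single separator character; the behavior chosen for each method is an assumption made for illustration.

```javascript
// Hypothetical class: treats a single character as a separator.
class SeparatorRuns {
  constructor(ch) {
    this.ch = ch;
  }
  // Called by String.prototype.split: drop the empty parts between repeated separators.
  [Symbol.split](string) {
    return string.split(this.ch).filter(part => part !== '');
  }
  // Called by String.prototype.search: index of the first separator.
  [Symbol.search](string) {
    return string.indexOf(this.ch);
  }
  // Called by String.prototype.replace: replace every separator.
  [Symbol.replace](string, replacement) {
    return string.split(this.ch).join(replacement);
  }
}

const commas = new SeparatorRuns(',');
console.log('a,,b,c'.split(commas));        // ["a", "b", "c"]
console.log('a,,b,c'.search(commas));       // 1
console.log('a,,b,c'.replace(commas, ';')); // "a;;b;c"
```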

Unicode Case Folding

In u mode, case-insensitive matching uses Unicode case folding, so characters outside ASCII can match their ASCII counterparts:

console.log(/[a-z]/i.test('\u212A')); // false
console.log(/[a-z]/iu.test('\u212A')); // true (U+212A KELVIN SIGN folds to "k")
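Case folding is only one effect of the u modifier. It also makes the engine treat the string as a sequence of code points rather than UTF-16 code units, which matters for characters outside the Basic Multilingual Plane. The sketch below uses 𠮷 (U+20BB7) as an example character.

```javascript
const astral = '𠮷'; // U+20BB7, stored as two UTF-16 code units

console.log(astral.length);         // 2
console.log(/^.$/.test(astral));    // false: "." matches one code unit
console.log(/^.$/u.test(astral));   // true: "." matches one code point

// The \u{...} code point escape is only recognized in u mode.
console.log(/\u{20BB7}/u.test(astral)); // true

// Quantifiers apply to the whole character in u mode.
console.log(/^𠮷{2}$/.test('𠮷𠮷'));  // false: {2} applies to the trailing surrogate only
console.log(/^𠮷{2}$/u.test('𠮷𠮷')); // true
```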

Regular Expression Subclassing

Create custom regex classes by extending RegExp:

class MyRegExp extends RegExp {
  exec(str) {
    const result = super.exec(str);
    if (result) result.push('extra');
    return result;
  }
}

const myRe = new MyRegExp('\\d+');
console.log(myRe.exec('123')); // ["123", "extra"]

Match Indices (ES2022)

The d modifier adds an indices array to the match result, recording the start index (inclusive) and end index (exclusive) of the whole match and of each capture group:

const re = /(a+)(b+)/d;
const match = re.exec('aaabb');

console.log(match.indices[0]); // [0, 5]
console.log(match.indices[1]); // [0, 3]
console.log(match.indices[2]); // [3, 5]
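When the pattern has named groups, their positions are also exposed under indices.groups. The following sketch, using an invented key=value string, reads the span of one named group and uses it to underline the matched text.

```javascript
const keyValue = /(?<key>\w+)=(?<value>\w+)/d;
const setting = 'mode=fast';
const found = keyValue.exec(setting);

// indices.groups mirrors match.groups, but holds [start, end] pairs.
const [start, end] = found.indices.groups.value;
console.log(start, end);                 // 5 9
console.log(setting.slice(start, end));  // "fast"

// Underline the captured value, e.g. for an error message.
console.log(setting + '\n' + ' '.repeat(start) + '^'.repeat(end - start));
// mode=fast
//      ^^^^

// The hasIndices property reports whether the d modifier is set.
console.log(keyValue.hasIndices); // true
```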

Practical Examples

Processing complex log formats:

const logRegex = /^(?<time>\d{2}:\d{2}:\d{2}) \[(?<level>\w+)\] (?<message>.+?)(?: \((?<file>.+?):(?<line>\d+)\))?$/;
const logLine = '14:35:22 [ERROR] Failed to load module (app.js:42)';

const { groups } = logRegex.exec(logLine);
console.log(groups);
// {
//   time: "14:35:22",
//   level: "ERROR",
//   message: "Failed to load module",
//   file: "app.js",
//   line: "42"
// }
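Destructuring the result of exec directly throws a TypeError when a line does not match, and the optional file and line groups are undefined when the location suffix is absent. A slightly more defensive variant might look like the sketch below; the function name parseLogLine and the choice of null as the fallback value are assumptions.

```javascript
const LOG_LINE = /^(?<time>\d{2}:\d{2}:\d{2}) \[(?<level>\w+)\] (?<message>.+?)(?: \((?<file>.+?):(?<line>\d+)\))?$/;

// Hypothetical wrapper: returns null for non-matching lines instead of throwing.
function parseLogLine(text) {
  const m = LOG_LINE.exec(text);
  if (m === null) return null;
  // Groups that did not participate in the match are undefined.
  const { time, level, message, file = null, line } = m.groups;
  return { time, level, message, file, line: line === undefined ? null : Number(line) };
}

console.log(parseLogLine('14:35:22 [ERROR] Failed to load module (app.js:42)'));
// { time: "14:35:22", level: "ERROR", message: "Failed to load module", file: "app.js", line: 42 }

console.log(parseLogLine('09:00:01 [INFO] Server started'));
// { time: "09:00:01", level: "INFO", message: "Server started", file: null, line: null }

console.log(parseLogLine('not a log line')); // null
```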

Extracting Markdown links:

function extractLinks(markdown) {
  const linkRegex = /\[(?<text>[^\]]+)\]\((?<url>[^)]+)\)/g;
  return [...markdown.matchAll(linkRegex)].map(m => m.groups);
}

const links = extractLinks('See [Google](https://google.com) or [GitHub](https://github.com)');
console.log(links);
// [
//   {text: "Google", url: "https://google.com"},
//   {text: "GitHub", url: "https://github.com"}
// ]
