in reply to regular expressions in unicode
You must first decide what counts as a 'word'. In my opinion it is any sequence of letters, digits, apostrophes, and hyphens, but your definition may differ.
My regex matches only "TEST" at the beginning of the string in the example.

    #!/usr/bin/perl
    use warnings;
    use strict;

    my $test = "TEST TESt TE'st T 12TE";
    while ($test =~ /(?<![\pL\pN\'-])  # NOT a hyphen, apostrophe, letter or number before
                     (\p{Lu}{2,})      # two or more uppercase letters
                     (?![\pL\pN\'-])   # NOT a hyphen, apostrophe, letter or number after
                    /xg) {
        print "$1\n";
    }
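Since the thread is about Unicode, here is a minimal sketch of the same pattern applied to non-ASCII text. It assumes the script file is saved as UTF-8 (hence `use utf8`) and that output should be UTF-8 encoded. The helper name `uppercase_words` and the sample string are made up for illustration and are not part of the original question.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use utf8;                              # string literals below contain non-ASCII characters
binmode(STDOUT, ':encoding(UTF-8)');   # encode printed text as UTF-8

# Hypothetical helper: returns every all-uppercase "word" of two or
# more letters, using the same definition of a word as the regex above.
sub uppercase_words {
    my ($text) = @_;
    my @found;
    while ($text =~ /(?<![\pL\pN\'-])   # no word character before
                     (\p{Lu}{2,})       # two or more uppercase letters
                     (?![\pL\pN\'-])    # no word character after
                    /xg) {
        push @found, $1;
    }
    return @found;
}

# Sample input chosen for illustration only.
my @words = uppercase_words("ÄÖÜ Übung ÉCOLE-x ŽIVOT Č");
print "$_\n" for @words;
```

With this input only "ÄÖÜ" and "ŽIVOT" should be printed: "Übung" has a single uppercase letter, "ÉCOLE" is followed by a hyphen, and "Č" is too short. If your text comes from a file or STDIN instead of a literal, it has to be decoded first (for example with an `:encoding(UTF-8)` layer), otherwise `\p{Lu}` sees bytes rather than characters.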