http://qs1969.pair.com?node_id=1190933


in reply to \b in Unicode regex

G'day Arik123,

Two passages from perlrebackslash are worth noting.

From the "Character classes" section:

"\w is a character class that matches any single word character (letters, digits, Unicode marks, and connector punctuation (like the underscore))." [my emphasis]

From the "Assertions" section:

"\b ... matches at any place between a word (something matched by \w) and a non-word character" [my emphasis again]

In your reply with actual data, you're effectively trying to match "XXXXX", which occurs in your string as "_XXXXX.". Both '_' and 'X' match "\w", so "\b" does not match between '_' and 'X'.
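Here's a minimal sketch of that behaviour. The string 'abc_XXXXX.def' and both sub names are made up for illustration; they are not your real data or code. The second sub shows one possible workaround, assuming you want '_' treated as a separator: it uses lookarounds on [^\W_] (a word character other than underscore) instead of "\b".

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical stand-in for the real data: 'XXXXX' preceded by '_'.
my $string = 'abc_XXXXX.def';

# Plain \b: fails here because '_' and 'X' both match \w,
# so there is no word boundary between them.
sub matches_with_b {
    my ($str) = @_;
    return $str =~ /\bXXXXX\b/ ? 1 : 0;
}

# Possible workaround (an assumption about what is wanted):
# treat '_' as a non-word character by asserting that no
# "word character other than underscore" is adjacent.
sub matches_ignoring_underscore {
    my ($str) = @_;
    return $str =~ /(?<![^\W_])XXXXX(?![^\W_])/ ? 1 : 0;
}

printf "%-30s %s\n", '\b version:',
    matches_with_b($string) ? 'match' : 'no match';
printf "%-30s %s\n", 'underscore-as-separator version:',
    matches_ignoring_underscore($string) ? 'match' : 'no match';
```

Whether the second pattern is appropriate depends on whether '_' should count as part of a word in your data.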

As already demonstrated twice[1,2], there is no Unicode issue here.

— Ken