Re: About \d \w and \s



by Corion (Patriarch)

on Oct 18, 2009 at 16:14 UTC (id://801880)


in reply to About \d \w and \s

Personally, I've avoided relying on Unicode or charset semantics in regular expressions. Most of the input I deal with is either Latin-1 or some other "near IBM-ASCII" single-byte encoding, and so is my source code. I've made my regular expressions lenient in the sense that I use a dot wherever I expect an umlaut.
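To illustrate the "dot where I expect an umlaut" approach, here is a minimal sketch. The name "Müller" and the helper `matches_name` are made up for illustration and are not from the post; the input is assumed to be a single-byte (Latin-1) string.

```perl
use strict;
use warnings;

# Hypothetical helper: the dot stands in for the umlaut, so the match
# does not depend on how \w classifies byte 0xFC.
sub matches_name {
    my ($name) = @_;
    return $name =~ /^M.ller$/ ? 1 : 0;
}

# "M\xFCller" is "Müller" as Latin-1 bytes; "Muller" has the umlaut dropped.
for my $name ("M\xFCller", "Muller", "Miller", "Mller") {
    printf "%-8s %s\n",
        ($name eq "M\xFCller" ? "M<FC>ller" : $name),
        (matches_name($name) ? "match" : "no match");
}
```

The price of the leniency is visible in the third case: "Miller" matches as well, because the dot accepts any single character. In UTF-8 encoded byte input the umlaut takes two bytes, so a single dot would no longer be enough there.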

Of course, if I were stricter about the encodings of my input data, or if Perl were smarter about guessing them (which is hard without carrying a dictionary of likenesses), I could write my source code and my regular expressions in Unicode, and then it would be cool if \w used Unicode semantics.

I have no opinion on \d, as German uses only 0-9 as digits anyway, and so does my input data.

Re^2: About \d \w and \s
by demerphq (Chancellor) on Oct 18, 2009 at 16:19 UTC

I believe that /u would provide sane matching for German or other Latin-1 languages, as it would make Perl match according to the Unicode rules even when the string and pattern weren't themselves Unicode.

---
$world=~s/war/peace/g
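A /u modifier along these lines later shipped in Perl 5.14, together with /a for ASCII-only matching. The sketch below assumes Perl 5.14 or newer, no `use locale`, and no `unicode_strings` feature in scope; the helper `classify` is hypothetical and only there to show the effect on a Latin-1 byte string.

```perl
use strict;
use warnings;

# Hypothetical helper: report whether one character matches \w under
# the default rules, under /u, and under /a (Perl 5.14+).
sub classify {
    my ($char) = @_;
    return {
        default => ($char =~ /^\w$/  ? 1 : 0),
        unicode => ($char =~ /^\w$/u ? 1 : 0),
        ascii   => ($char =~ /^\w$/a ? 1 : 0),
    };
}

# "\xE4" is a-umlaut as a single Latin-1 byte, not a UTF-8 flagged string.
my $result = classify("\xE4");
print "default: $result->{default}\n";   # 0: byte 0xE4 is not a word character
print "/u:      $result->{unicode}\n";   # 1: Unicode rules apply regardless
print "/a:      $result->{ascii}\n";     # 0: only ASCII counts as \w
```

With /u the pattern no longer depends on whether the string happens to be stored as UTF-8 internally, which is the "sane matching" for Latin-1 input described above.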
