\s and non-breaking spaces

Ratazong has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

Today I played around a bit with HTML::Tree and its parts, and was surprised that the following expression did not work (as intended) on a string extracted from a webpage with as_trimmed_text():

$name =~ s/\sx\s\d+//;  # remove trailing " x 3" (and similar)

After a lot of searching I found the culprit: the blanks were encoded as 0xA0 (non-breaking spaces), and \s does not match them. Is there a better way to handle them than my rather ugly solution below?

$name =~ s/[\s\xA0]x[\s\xA0]\d+//;

Or another workaround?
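One more workaround, not suggested in the thread, may be worth a look: the \h escape (horizontal whitespace, Perl 5.10 and later) includes U+00A0 NO-BREAK SPACE, and unlike \s it matches the same characters whether or not the string carries the UTF-8 flag. A minimal sketch, where the sample string is a hypothetical stand-in for what as_trimmed_text() returned:

```perl
use strict;
use warnings;

# Hypothetical sample: 0xA0 (non-breaking space) instead of ordinary blanks.
my $name = "Widget\xA0x\xA03";

# \h matches space, tab, 0xA0 and other horizontal blanks (but not newlines),
# independent of the string's internal encoding, so no feature pragma is needed.
$name =~ s/\hx\h\d+//;    # remove trailing " x 3" (and similar)

print "$name\n";          # prints "Widget"
```

Note that \h is a little broader than [\s\xA0] in one direction (it also accepts other Unicode space characters) and narrower in another (it never matches \n or \r).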

Rata

Update: Thanks a lot, rovf, Eliya and LanX: I have added all of your solutions to my code for future reference, and all of them work like a charm :-) Now that I have installed v5.14, I will keep the solution with use feature "unicode_strings"; active, because I like its elegance.
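Why the pragma helps can be seen by compiling the same pattern with the feature off and on. This is a minimal sketch assuming Perl 5.14 or later and a string without the UTF-8 flag; the feature is lexically scoped, so each block only affects the patterns compiled inside it:

```perl
use strict;
use warnings;

my $name = "foo\xA0x\xA03";    # byte string containing two 0xA0 characters

my $without;
{
    no feature 'unicode_strings';
    # Native rules for a non-UTF-8 string: 0xA0 is not considered \s.
    $without = $name =~ /\sx\s\d+/ ? 1 : 0;
}

my $with;
{
    use feature 'unicode_strings';
    # Unicode rules: U+00A0 NO-BREAK SPACE counts as whitespace.
    $with = $name =~ /\sx\s\d+/ ? 1 : 0;
}

print "without: $without, with: $with\n";    # prints "without: 0, with: 1"
```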

And thanks for the link to the interesting blog post, shawnhcorey!


Re: \s and non-breaking spaces by Eliya (Vicar) on Feb 29, 2012 at 13:15 UTC
`use feature "unicode_strings"` (>= v5.14) may help with that:

use feature qw(say unicode_strings);
my $name = "foo\xa0x\xa03";
$name =~ s/\sx\s\d+//;
say $name;   # "foo"
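If turning the feature on for a whole lexical scope is more than you need, Perl 5.14 also added the /u pattern modifier, which requests Unicode rules for a single match or substitution. A minimal sketch, reusing the sample string from the reply above:

```perl
use strict;
use warnings;

my $name = "foo\xA0x\xA03";

# /u (Perl 5.14+) applies Unicode rules to this one substitution only,
# so \s matches 0xA0 even though $name is not UTF-8 flagged.
$name =~ s/\sx\s\d+//u;

print "$name\n";    # prints "foo"
```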
Re: \s and non-breaking spaces by rovf (Priest) on Feb 29, 2012 at 13:14 UTC
If you apply several regexes to your HTML text, you could first translate all nbsp characters to real spaces, i.e.

$text =~ tr/\xA0/ /;

-- Ronald Fischer <ynnor@mm.st>
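A short sketch of that normalise-once approach, assuming the text has already been extracted (the sample string is hypothetical). After the single tr///, every later pattern can rely on plain \s or a literal space:

```perl
use strict;
use warnings;

# Hypothetical extracted text with non-breaking spaces (0xA0).
my $text = "Blue\xA0Widget\xA0x\xA03";

# Copy, then turn every 0xA0 into an ordinary space in one pass.
(my $clean = $text) =~ tr/\xA0/ /;

$clean =~ s/\sx\s\d+//;              # remove trailing " x 3"
my @words = split /\s+/, $clean;     # further patterns need no special case

print "$clean\n";                    # prints "Blue Widget"
print scalar(@words), "\n";          # prints 2
```

The trade-off is that the distinction between breaking and non-breaking spaces is lost, which is usually harmless once the text has left the HTML.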
Re: \s and non-breaking spaces by shawnhcorey (Friar) on Feb 29, 2012 at 14:57 UTC
brian d foy has written an article on pattern matching white space in his blog, The Effective Perler.
Re: \s and non-breaking spaces by LanX (Saint) on Feb 29, 2012 at 13:18 UTC
IMHO the simplest (and most generally useful) workaround is to put complex sub-patterns into variables and to use the /x option for readability:

DB<101> $_=" x 3 "
 => " x 3 "
DB<102> $s='[\s\xA0]'
 => "[\\s\\xA0]"
DB<103> s/$s (x) $s/$1/x
 => 1
DB<104> $_
 => "x3 "

Cheers Rolf
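The same idea can also be written with a compiled qr// object instead of a plain string. This is a variation, not the reply's own code, and the sample string is hypothetical. The qr// object keeps its own flags when interpolated, and under /x the literal spaces in the outer pattern are ignored:

```perl
use strict;
use warnings;

# Define "blank" once; qr// compiles it and it can be reused in many patterns.
my $blank = qr/[\s\xA0]/;

my $name = "Widget\xA0x\xA03";

# /x lets the outer pattern be spaced out for readability.
$name =~ s/ $blank x $blank \d+ //x;    # remove trailing " x 3"

print "$name\n";    # prints "Widget"
```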