Ratazong has asked for the wisdom of the Perl Monks concerning the following question:
Dear Monks,
Today I played around a bit with HTML::Tree and its parts, and was surprised that the following expression was not working (as intended) on a string extracted from a webpage using as_trimmed_text():

    $name =~ s/\sx\s\d+//; # remove trailing " x 3" (and similar)

After a lot of searching I found the culprit: the blanks were coded as 0xA0 (non-breaking spaces), and \s does not match them. Is there a better way to handle those besides my rather ugly solution below?

    $name =~ s/[\s\xA0]x[\s\xA0]\d+//;

Or another workaround?
Rata
Update: Thanks a lot rovf, Eliya and LanX: I added all of your solutions to my code for future reference - all work like a charm :-) Now that I installed v5.14, I will keep the solution with use feature "unicode_strings"; active - I like its elegance.
And thanks for the link to the interesting blog-post, shawnhcorey!
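For future readers, here is a minimal sketch of how the options mentioned in this post might look side by side. The replies' exact code is not reproduced here, so this is not necessarily what rovf, Eliya or LanX suggested. The sample string and the variable names are made up for illustration, and the /u variant is an addition that does not come from the thread. It assumes Perl 5.14 or later and a plain byte string (no UTF-8 flag), which is the case where \s fails to match 0xA0.

```perl
use strict;
use warnings;

# Hypothetical sample value: "Widget", NBSP, "x", NBSP, "3".
# It is a plain byte string (no UTF-8 flag), as in the question.
my $raw = "Widget\xA0x\xA03";

# 1. Explicit character class, as in the original workaround.
my $by_class = $raw;
$by_class =~ s/[\s\xA0]x[\s\xA0]\d+//;

# 2. The /u modifier (Perl 5.14+) asks for Unicode rules on this one
#    regex, so \s also matches U+00A0.
my $by_flag = $raw;
$by_flag =~ s/\sx\s\d+//u;

# 3. The unicode_strings feature (regex support since Perl 5.14) does
#    the same for every regex compiled inside its lexical scope.
my $by_feature = $raw;
{
    use feature 'unicode_strings';
    $by_feature =~ s/\sx\s\d+//;
}

print "$_\n" for $by_class, $by_flag, $by_feature;
```

All three variants should leave just "Widget" in this sample. The feature is switched on inside a bare block here only to show that it is lexically scoped; putting it at the top of the file would apply it to every regex in that file.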
Replies are listed 'Best First'.

Re: \s and non-breaking spaces
by Eliya (Vicar) on Feb 29, 2012 at 13:15 UTC

Re: \s and non-breaking spaces
by rovf (Priest) on Feb 29, 2012 at 13:14 UTC

Re: \s and non-breaking spaces
by shawnhcorey (Friar) on Feb 29, 2012 at 14:57 UTC

Re: \s and non-breaking spaces
by LanX (Saint) on Feb 29, 2012 at 13:18 UTC