in reply to getting the first n printable words from a string of HTML
*UPDATE* It occurred to me that my code will cause a runtime error or weird behaviour if any element in @list contains regex metachars, as these will be interpolated into the eat-it-up regex. To fix this we need to escape all these chars. Here is the patched code.
tachyon
    my $html = "<h1>F(oo</h1><p>Bar</p><p>Some more text here</p>";
    my @list = ('F(oo','Bar');

    # you need to make the elements in @list regex
    # friendly by backslashing all the metachars
    # comment out this line to see this script choke
    # on the ( in F(oo
    s/([\$\^\*\(\)\+\{\[\\\|\.\?])/\\$1/g for @list;

    # eat up the bits in @list
    $html =~ m/$_/gc for @list;

    #use \G to match the rest
    ($rest) = $html =~ m/\G(.*)$/;
    print $rest;
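For anyone wondering how the eat-it-up trick works, here is a minimal sketch of the /gc and \G mechanics on their own. It assumes plain words with no metachars ('Foo' and 'Bar' are just illustrative), so no escaping is needed yet, and it prints pos() after each match to show where the next match will start.

```perl
use strict;
use warnings;

my $html = "<h1>Foo</h1><p>Bar</p><p>Some more text here</p>";

# A scalar-context m//g match leaves pos($html) at the end of the match.
# The /c flag keeps that position even if a later match fails.
for my $word ('Foo', 'Bar') {
    if ($html =~ m/$word/gc) {
        printf "after %s, pos = %d\n", $word, pos($html);
    }
}

# \G anchors this match at the saved position, so (.*) captures
# everything that follows the last word we ate.
my ($rest) = $html =~ m/\G(.*)$/;
print "$rest\n";
```

Without /c, a failed match would reset pos() to the start of the string and \G would then hand back the whole of $html.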
*Update* added \) which slipped through the net. Caught by chipmunk. chipmunk also points out that quotemeta is a good solution but my pride won't allow me to use it because it is both shorter and more elegant!
    # s/([\$\^\*\(\)\+\{\[\\\|\.\?])/\\$1/g for @list;
    $_ = quotemeta $_ for @list;
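As a rough sketch of the quotemeta route, the script below shows what quotemeta does to 'F(oo' and then uses \Q...\E, which applies the same escaping inline at interpolation time. The @escaped array is only there for illustration and is not part of the original code. This is one way to do it, not the only one.

```perl
use strict;
use warnings;

my $html = "<h1>F(oo</h1><p>Bar</p><p>Some more text here</p>";
my @list = ('F(oo', 'Bar');

# quotemeta backslashes every non-word character,
# so 'F(oo' comes back as 'F\(oo' and 'Bar' is unchanged.
my @escaped = map { quotemeta } @list;
print "$_\n" for @escaped;

# \Q...\E escapes the interpolated value in place,
# so @list itself never needs to be modified.
$html =~ m/\Q$_\E/gc for @list;

# use \G to match the rest, as before
my ($rest) = $html =~ m/\G(.*)$/;
print "$rest\n";
```

The \Q...\E form has the small advantage that @list keeps its original values, which matters if the words get printed or reused later.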
Re: Re: getting the first n printable words from a string of HTML
by kiz (Monk) on May 30, 2001 at 19:04 UTC