Perl regex

axl163 has asked for the wisdom of the Perl Monks concerning the following question:

Re: Perl regex by Fletch (Bishop) on May 02, 2006 at 16:51 UTC
A) `tr///` isn't what you want (even if you'd used a syntactically correct trailing slash); `s///` is the operator for substitutions. B) Unless you can guarantee very strict formatting in the HTML you're operating on, you don't want to use a regexp to manipulate HTML. Use HTML::TokeParser or HTML::TreeBuilder or the like.
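To make the `tr///` versus `s///` distinction concrete, here is a minimal sketch. The sample string and variable names are made up for illustration and are not from the original question.

```perl
use strict;
use warnings;

my $text = "perl regex";

# tr/// maps single characters to single characters; it has no notion
# of words or patterns. Here every lowercase letter is upcased.
( my $transliterated = $text ) =~ tr/a-z/A-Z/;

# s/// replaces whatever the pattern matches with a replacement string,
# which is what wrapping a word in markup needs.
( my $substituted = $text ) =~ s{(regex)}{<b>$1</b>};

print "$transliterated\n";    # PERL REGEX
print "$substituted\n";       # perl <b>regex</b>
```

Using `s{...}{...}` delimiters instead of slashes avoids having to escape the `/` in closing tags.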
Re: Perl regex by ikegami (Patriarch) on May 02, 2006 at 16:52 UTC
[ The OP silently updated his question. This post is now obsolete. ]
Re: Perl regex by ptum (Priest) on May 02, 2006 at 19:11 UTC
So, let me get this straight. You have a user-entered search phrase and you want to highlight HTML content where it matches those words.

First, let me recommend that when you change your node, you mark it as Update: and either use ~~strike~~ notation or simply post your updated material in a separate paragraph, leaving your original post content alone.

Second, you want to parse out a search phrase into words and put them in an array; use split to accomplish this.

Third, you'll want to step through that array, using a construct like this (untested):

`for my $word (@search_words) { $html =~ s/($word)/<b><u>$1<\/u><\/b>/g; }`

This acts on the HTML in $html that you are evaluating and replaces each match of $word with a highlighted version of itself (that's what the $1 accomplishes). It acts on every match in $html, not just the first, because of the /g modifier on the regex.

Fourth, if you want to parse out sections or tags of HTML, applying your substitution to some while ignoring others, you'll probably want to use a CPAN module to do that. I've used HTML::TreeBuilder for such things before, but you may want to search around a little for something that suits your needs.

Update: Ah, I see that Fletch already recommended this. Well, now you've heard it from two people! :)

No good deed goes unpunished. -- (attributed to) Oscar Wilde
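The split-then-substitute steps above can be sketched as follows. This is only an illustration under a few assumptions: `highlight_words` is a hypothetical helper name, the phrase is assumed to be whitespace-separated, and the `\Q...\E` escaping is an addition that the untested one-liner does not have.

```perl
use strict;
use warnings;

# highlight_words() is a hypothetical helper, not code from the thread.
sub highlight_words {
    my ( $html, $phrase ) = @_;

    # split ' ' breaks on runs of whitespace and skips leading whitespace.
    my @search_words = split ' ', $phrase;

    for my $word (@search_words) {
        # \Q...\E escapes regex metacharacters in the user's input,
        # so a search for "a.c" does not also match "abc".
        $html =~ s{(\Q$word\E)}{<b><u>$1</u></b>}g;
    }
    return $html;
}

print highlight_words( "Perl monks love Perl.", "Perl love" ), "\n";
# <b><u>Perl</u></b> monks <b><u>love</u></b> <b><u>Perl</u></b>.
```

Note that this still treats the HTML as a plain string: a search word such as `b` or `href` would match inside tags too, including the `<b>` tags added by an earlier pass of the loop. That is the reason for the advice to use a parser.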
Re: Perl regex by graff (Chancellor) on May 03, 2006 at 01:37 UTC
This is the sort of case where HTML::TokeParser::Simple really is simple:

```perl
use strict;
use HTML::TokeParser::Simple;
use Getopt::Std;

my $Usage = "Usage: $0 -f word file.html > highlighted.html\n";
my %opts;
( getopts( 'f:', \%opts ) and $opts{f} and @ARGV == 1 ) or die $Usage;

# Allow a comma-separated word list: "foo,bar" becomes the alternation "foo|bar"
$opts{f} =~ tr/,/|/;

# Parse the HTML file named on the command line
my $p = HTML::TokeParser::Simple->new( $ARGV[0] )
    or die "Can't open $ARGV[0]: $!\n";

# This applies a regex substitution to the text part
# and leaves the tags unmodified:
while ( my $token = $p->get_token ) {
    if ( $token->is_text ) {
        $_ = $token->as_is;
        s{($opts{f})}{<b><u>$1</u></b>}g;
        print;
    }
    else {
        print $token->as_is;
    }
}
```

(update: forgot to include the assignment to $_)