axl163 has asked for the wisdom of the Perl Monks concerning the following question:

This node falls below the community's threshold of quality. You may see it by logging in.

Re: Perl regex
by Fletch (Bishop) on May 02, 2006 at 16:51 UTC

    A) tr/// isn't what you want (even if you'd used a syntactically correct trailing slash). s/// is for substitutions.

    B) Unless you can guarantee very strict formatting in the HTML you're operating on, you don't want to use a regexp to manipulate HTML. Use HTML::TokeParser, HTML::TreeBuilder, or the like.
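
    To make point A concrete, here is a minimal sketch of the difference between the two operators. The sample string and variable names are made up for illustration; they are not from the original question.

```perl
use strict;
use warnings;

# tr/// maps single characters to single characters; it has no notion of patterns.
(my $translit = 'perl regex') =~ tr/a-z/A-Z/;

# s/// replaces whatever a pattern matches; $1 holds the captured text.
(my $subst = 'perl regex') =~ s/(regex)/<b>$1<\/b>/;

print "$translit\n";   # PERL REGEX
print "$subst\n";      # perl <b>regex</b>
```

    So wrapping a matched word in tags is a job for s///, not tr///.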

Re: Perl regex
by ikegami (Patriarch) on May 02, 2006 at 16:52 UTC

    [ The OP silently updated his question. This post is now obsolete. ]

Re: Perl regex
by ptum (Priest) on May 02, 2006 at 19:11 UTC

    So, let me get this straight. You have a user-entered search phrase and you want to highlight HTML content where it matches those words.

    First, let me recommend that when you change your node, you mark it as Update: and either use strike notation or simply post your updated material in a separate paragraph, leaving your original post content alone.

    Second, you want to parse out a search phrase into words and put them in an array -- use split to accomplish this.

    Third, you'll want to step through that array, using a construct like this (untested):

    for my $word (@search_words) {
        # \Q...\E makes any regex metacharacters in the user's word match literally
        $html =~ s/(\Q$word\E)/<b><u>$1<\/u><\/b>/g;
    }

    This acts on the HTML in $html and replaces each match of $word with a highlighted version of itself (the $1 is the text captured by the parentheses). Because of the /g modifier, every occurrence in $html is replaced, not just the first.

    Fourth, if you want to parse out sections or tags of HTML, applying your substitution to some while ignoring others, you'll probably want to use a CPAN module to do that. I've used HTML::TreeBuilder for such things before, but you may want to search around a little for something that suits your needs.

    Update: Ah, I see that Fletch already recommended this. Well, now you've heard it from two people! :)
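
    The split-and-substitute steps above might be put together roughly as follows. This is only a sketch: $phrase and the sample $html are hypothetical, the /i flag is an assumption about what the OP wants, and the words are joined into one alternation so that a later word cannot match inside markup inserted for an earlier one.

```perl
use strict;
use warnings;

# Hypothetical inputs, for illustration only.
my $phrase = 'perl  regex';
my $html   = 'Learning Perl regex basics';

# split ' ' splits on runs of whitespace and ignores leading whitespace.
my @search_words = split ' ', $phrase;

# Quote each word so characters such as "." or "+" match literally,
# then join the words into a single alternation.
my $pattern = join '|', map { quotemeta } @search_words;

# One pass: /g replaces every occurrence, /i ignores case (an assumption).
(my $highlighted = $html) =~ s{($pattern)}{<b><u>$1</u></b>}gi;

print "$highlighted\n";
```

    Note that this still treats the HTML as plain text, so a search word that also appears inside a tag or attribute would be wrapped there too. That is the case the parser modules are meant to handle.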


    No good deed goes unpunished. -- (attributed to) Oscar Wilde
Re: Perl regex
by graff (Chancellor) on May 03, 2006 at 01:37 UTC
    This is the sort of case where HTML::TokeParser::Simple really is simple:

    use strict;
    use HTML::TokeParser::Simple;
    use Getopt::Std;

    my $Usage = "Usage: $0 -f word file.html > highlighted.html\n";
    my %opts;
    ( getopts( 'f:', \%opts ) and $opts{f} and @ARGV == 1 ) or die $Usage;

    # A comma-separated -f value becomes a regex alternation (foo,bar -> foo|bar):
    $opts{f} =~ tr/,/\|/;

    # Assumed step (not shown in the post): parse the file named on the command line.
    my $p = HTML::TokeParser::Simple->new( file => $ARGV[0] );

    # This applies a regex substitution to the text part
    # and leaves the tags unmodified:
    while ( my $token = $p->get_token ) {
        if ( $token->is_text ) {
            $_ = $token->as_is;
            s{($opts{f})}{<b><u>$1</u></b>}g;
            print;
        }
        else {
            print $token->as_is;
        }
    }

    (update: forgot to include the assignment to $_)
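
    For anyone puzzled by the tr line: it appears to be there so that a comma-separated -f value can be used directly as a regex alternation. A small sketch of just that step, with a made-up option value and text (no HTML parsing involved):

```perl
use strict;
use warnings;

# Hypothetical value for the -f option.
my $find = 'foo,bar';
$find =~ tr/,/|/;    # now "foo|bar", i.e. "foo or bar" as a regex

# Hypothetical text, standing in for one text token.
my $text = 'foo and bar, but not baz';
(my $out = $text) =~ s{($find)}{<b><u>$1</u></b>}g;

print "$out\n";
```

    Because the value is interpolated as a regex, any metacharacters typed after -f keep their special meaning.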