Looks like I spoke too soon.

For anyone else attempting something similar: the code above had a flaw, illustrated by the last line I added to the $doc test string below (ASP code inside an attribute value). The fix essentially amounts to using a more benign format for the temporary replacement string that hides the ASP code from HTML::Parser. The ASP content is entity-encoded and wrapped in [[asp_pp"..."asp_pp]], so the placeholder contains no raw angle brackets:

#!/usr/bin/perl

use HTML::PullParser;
use HTML::Entities;

my %options = ();

my $doc = <<'EOF';
<input value="abc" />
abc &nbsp; abc
<% abc ( '<input value="abc" /> abc &nbsp; abc <span> %\>', $abc ); %>
<input value="<%= $abc %>" abc />
EOF

# Hide each ASP block behind a placeholder whose content is entity-encoded.
$doc =~ s/<%(.*?)%>/
    my $content = $1;
    HTML::Entities::encode_entities ( $content );
    qq'[[asp_pp"$content"asp_pp]]'
/gsex;

# Ask for the same three fields for text events and for everything else.
foreach ( qw{ text default } ) {
    $options{$_} = "event, text, is_cdata";
}

my $p = HTML::PullParser->new ( doc => $doc, %options );
my $output = "";

while ( my $token = $p->get_token() ) {
    my $text = $token->[1];
    # Bold "abc" in plain (non-CDATA) text only; placeholders pass through untouched.
    $text =~ s/(\[\[asp_pp"[^"]+"asp_pp\]\])|(abc)/$1?$1:"<b>$2<\/b>"/gse
        if $token->[0] eq 'text' and ! $token->[2];
    $output .= $text;
}

# Restore the original ASP blocks.
$output =~ s/\[\[asp_pp"([^"]+)"asp_pp\]\]/
    "<%" . HTML::Entities::decode_entities ( $1 ) . "%>"/gse;

print $output;

Output:

<input value="abc" />
<b>abc</b> &nbsp; <b>abc</b>
<% abc ( '<input value="abc" /> abc &nbsp; abc <span> %\>', $abc ); %>
<input value="<%= $abc %>" abc />
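If it helps, here is the hide/restore round trip on its own, without the parser in between. This is only a sketch under my own assumptions: the helper names hide_asp and restore_asp are made up for illustration, and it relies on nothing beyond HTML::Entities. As far as I can tell it shows why the placeholder is safe: once the content is encoded, no angle brackets are left for the parser to trip over.

```perl
use strict;
use warnings;
use HTML::Entities ();

# Hypothetical helper: swap each <% ... %> block for a placeholder whose
# payload is entity-encoded, so it holds no <, >, &, " or ' characters.
sub hide_asp {
    my ($html) = @_;
    $html =~ s{<%(.*?)%>}{
        my $content = $1;
        '[[asp_pp"' . HTML::Entities::encode_entities($content) . '"asp_pp]]'
    }gsex;
    return $html;
}

# Hypothetical helper: turn the placeholders back into the ASP blocks.
sub restore_asp {
    my ($html) = @_;
    $html =~ s{\[\[asp_pp"([^"]+)"asp_pp\]\]}{
        my $content = $1;
        '<%' . HTML::Entities::decode_entities($content) . '%>'
    }gse;
    return $html;
}

my $line   = q{<% abc ( '<span> %\>', $abc ); %>};
my $hidden = hide_asp($line);

print "$hidden\n";
print restore_asp($hidden), "\n";
```

One caveat with this format: the restore pattern uses [^"]+, so an empty <%%> block would be hidden but not restored.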

In reply to Re^3: Parsing ASP files by thewebsi
in thread Parsing ASP files by thewebsi
