Ah well. Why do you use such a complicated templating scheme in the first place? Reach for Template Toolkit, HTML::Template or some such.

But if you absolutely have to stick with such a beast, don't even try to use regexes to transform your template! See "Why this simple regex freeze my computer?" for an example of the horrors you might run into with that approach.

You could use HTML::Parser to achieve what you want. That module tokenizes your HTML and provides you with callbacks for comments, opening tags, closing tags, plain text and much more. In those callbacks, you can track the state of your opening/closing tags depending on whether there's content found to be substituted.

But first, some sanitizing:

<!-- headerStandard.siteFeedbackLink.label --!>

should be

<!-- headerStandard.siteFeedbackLink.label -->

to be a well-formed comment. But if you stick with that, remove at least the last '!' so that each (invalid) comment pair becomes a single comment:

<!-- foo.label --!>text<!-- foo.label -->

I've cranked out an example as a starting point. It does the job for the examples given, but it has its rough edges and doesn't treat nested stuff well, e.g.

<a href="<!--foo.label --!>foo<!-- foo.label -->"> blah blah <img src="bar.jpg" alt="<!-- bar.label --!>bar<!-- bar.label -->" /> </a>

which can be solved by keeping a stack of replacement links and using $pending (in the code below) as a pointer into it. But you should really, really switch to a seasoned templating system!
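A rough sketch of that stack idea, in plain Perl and deliberately without HTML::Parser: the hand-written event list stands in for the parser callbacks, and the names render, @pending and the bracketed link strings are made up for illustration. It uses push/pop on an array rather than $pending as an index, so take it as one possible shape, not a drop-in fix.

```perl
use strict;
use warnings;

# Hypothetical event stream: [type, value, link].
# A start event carries the link produced by substituting its attributes.
my @events = (
    [ start => 'div',  '[div-link]'  ],
    [ text  => 'x' ],
    [ start => 'span', '[span-link]' ],
    [ text  => 'y' ],
    [ end   => 'span' ],
    [ end   => 'div' ],
);

sub render {
    my @stream = @_;
    my @pending;    # one entry per open element: the link to emit after it closes
    my $out = '';
    for my $ev (@stream) {
        my ($type, $value, $link) = @$ev;
        if ($type eq 'start') {
            # Remember this element's link ('' if it has none).
            push @pending, defined $link ? $link : '';
            $out .= "<$value>";
        }
        elsif ($type eq 'end') {
            # Close the element, then emit the link that belongs to it.
            $out .= "</$value>";
            $out .= pop @pending if @pending;
        }
        else {
            $out .= $value;
        }
    }
    return $out;
}

print render(@events), "\n";
# prints: <div>x<span>y</span>[span-link]</div>[div-link]
```

Each element gets its own slot, so an inner substitution no longer overwrites the link of the enclosing element. Elements without an end tag (such as a self-closing img) would need separate handling, which this sketch does not attempt.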

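use strict;
use warnings;
use HTML::Parser;

# $pending: true while an element is open whose link has not been printed yet.
# $link:    the replacement link produced by the last substitution.
my ($pending, $link) = (0, '');

my $p = HTML::Parser->new(
    api_version => 3,
    start_h     => [ \&start, 'tagname, attr, attrseq, text' ],
    end_h       => [ \&end,   'tagname, text' ],
    comment_h   => [ \&comm,  'text' ],
    default_h   => [ sub { print shift }, 'text' ],
);
$p->unbroken_text(1);

my $file = shift or die "usage: $0 file.html\n";
$p->parse_file($file) or die "Cannot parse $file: $!\n";

# Opening tag: substitute comment pairs inside attribute values, then re-emit the tag.
sub start {
    my ($tag, $attr, $attrseq, $text) = @_;
    for my $k (keys %$attr) {
        if ($attr->{$k} =~ /!/) {
            ($attr->{$k}, $link) = transform_comments($attr->{$k});
        }
    }
    $pending++;
    my $attrs = join ' ',
        map { $_ eq '/' ? $_ : qq{$_="$attr->{$_}"} } @$attrseq;
    print "<$tag", $attrs ? " $attrs>" : '>';
}

# Closing tag: emit it, then the link held back for the open element, if any.
sub end {
    my ($tag, $text) = @_;
    print $text;
    if ($pending) {
        print $link;
        $pending = 0;
        $link    = '';
    }
}

# Comment outside a tag: substitute it and emit the link right away
# unless an element is still open.
sub comm {
    my $text;
    ($text, $link) = transform_comments($_[0]);
    print $text;
    print $link unless $pending;
}

# Returns (replacement text, link) for a "<!-- key --!>text<!-- key" pair,
# or the unchanged string and an empty link if there is no such pair.
sub transform_comments {
    my $str = shift;
    if ($str =~ /(\S+) --!>([^<]+?)<!-- \1/) {
        my ($key, $text) = ($1, $2);
        # Placeholder: look up the real translation and link for $key here.
        my $val  = "fake-$text-translated";
        my $link = q{<a href='foo'>foo</a>};
        return ($val, $link);
    }
    return ($str, '');
}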

Update: seeing your answer to moritz above - well, then at least it isn't your fault ;-)

For the above code to work, you'll need to sanitize the comments with e.g.

perl -pi -e 's/(<!-- \S+ --!>[^<]+?<!-- \S+ --)!>/$1>/g' $file

since HTML::Parser won't recognize the comments otherwise.
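Before running that one-liner with -i over real files, the same substitution can be tried on a sample string. A minimal sketch: the sub name sanitize and the sample markup are made up for illustration, and the regex is copied unchanged from the command above.

```perl
use strict;
use warnings;

# Same substitution as the one-liner, wrapped in a sub so it can be tried on a string.
sub sanitize {
    my ($html) = @_;
    # Drop the '!' from the closing half of each comment pair only.
    $html =~ s/(<!-- \S+ --!>[^<]+?<!-- \S+ --)!>/$1>/g;
    return $html;
}

my $in = '<p><!-- foo.label --!>text<!-- foo.label --!></p>';
print sanitize($in), "\n";
# prints: <p><!-- foo.label --!>text<!-- foo.label --></p>
```

Note that the pattern expects a space after "<!--" and before "--", so comment pairs written without those spaces are left untouched.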


In reply to Re: Regex within html by shmem
in thread Regex within html by ropey
