Ah well. Why do you use such a complicated templating scheme in the first place? Reach for Template Toolkit, HTML::Template or some such.
But if you absolutely have to stick with such a beast, don't even try to use regexes to transform your template! See "Why this simple regex freeze my computer?" for an example of the horrors you might run into with that approach.
You could use HTML::Parser to achieve what you want. That module tokenizes your HTML and provides callbacks for comments, opening tags, closing tags, plain text and much more. In those callbacks, you can track the state of your opening/closing tags depending on whether content to be substituted has been found.
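To illustrate the callback interface, here is a minimal sketch (not part of the original answer) that registers start, end, comment and text handlers and records the order in which HTML::Parser fires them. The sample markup and the greeting.label key are made up for the example.

```perl
use strict;
use warnings;
use HTML::Parser;

# Collect [event, detail] pairs so the order of the callbacks is visible.
my @events;
my $p = HTML::Parser->new(
    api_version => 3,
    start_h   => [ sub { push @events, [ start   => $_[0] ] }, 'tagname' ],
    end_h     => [ sub { push @events, [ end     => $_[0] ] }, 'tagname' ],
    comment_h => [ sub { push @events, [ comment => $_[0] ] }, 'text' ],
    text_h    => [ sub { push @events, [ text    => $_[0] ] }, 'dtext' ],
);
$p->unbroken_text(1);    # deliver each run of text in one piece
$p->parse('<p><!-- greeting.label -->Hello</p>');
$p->eof;                 # flush anything still buffered

print "$_->[0]: $_->[1]\n" for @events;
```

The events come out in document order (start, comment, text, end). With the 'text' argspec, the comment handler gets the comment's source text, which is where a template key can be matched.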
But first, some sanitizing:
<!-- headerStandard.siteFeedbackLink.label --!>
should be
<!-- headerStandard.siteFeedbackLink.label -->
to be a well-formed comment. But if you stick with that, at least remove the last '!' so that each (invalid) comment pair becomes a single comment:
<!-- foo.label --!>text<!-- foo.label -->
I've cranked out an example as a starter. It does the job for the examples given, but it has its rough edges and doesn't treat nested stuff well, e.g.
<a href="<!--foo.label --!>foo<!-- foo.label -->"> blah blah <img src="bar.jpg" alt="<!-- bar.label --!>bar<!-- bar.label -->" /> </a>
which could be solved by keeping a stack of replacement links, with $pending below acting as a pointer into it. But you should really, really switch to a seasoned templating system!
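Here is a rough sketch of that stack idea, kept independent of HTML::Parser so it can be run on its own. The token format and the place_links name are hypothetical stand-ins for the parser callbacks. One design choice is an assumption: links are deferred until the outermost element carrying a link is closed, so no link ends up inside another link's element.

```perl
use strict;
use warnings;

# Hypothetical token stream standing in for HTML::Parser events:
#   [ 'start', $tag, $link ]  opening tag; optional $link taken from its attributes
#   [ 'empty', $tag, $link ]  self-closing tag such as <img />
#   [ 'end',   $tag ]         closing tag
#   [ 'text',  $string ]      plain text
sub place_links {
    my @tokens = @_;
    my @stack;       # one frame per open element that carries a link
    my $depth = 0;   # current element nesting depth
    my @out;

    for my $t (@tokens) {
        my ($type, $val, $link) = @$t;
        if ($type eq 'start') {
            $depth++;
            push @out, "<$val>";
            push @stack, { depth => $depth, links => [$link] } if defined $link;
        }
        elsif ($type eq 'empty') {
            push @out, "<$val />";
            next unless defined $link;
            if (@stack) { push @{ $stack[-1]{links} }, $link }   # defer to enclosing frame
            else        { push @out, $link }                      # nothing pending: emit now
        }
        elsif ($type eq 'end') {
            push @out, "</$val>";
            # Only the element that opened the top frame may close it.
            if (@stack && $stack[-1]{depth} == $depth) {
                my $frame = pop @stack;
                if (@stack) { push @{ $stack[-1]{links} }, @{ $frame->{links} } }
                else        { push @out, @{ $frame->{links} } }
            }
            $depth--;
        }
        else {
            push @out, $val;   # plain text
        }
    }
    return join '', @out;
}

# The nested case from above: an <a> with a link containing an <img /> with a link.
print place_links(
    [ start => 'a', '[link-a]' ],
    [ text  => ' blah blah ' ],
    [ empty => 'img', '[link-img]' ],
    [ end   => 'a' ],
), "\n";
```

This sketch assumes well-nested input; a real version would also have to cope with unclosed tags.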
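use strict;
use warnings;
use HTML::Parser;

# State shared by the handlers; declared before parsing starts.
my ($pending, $link) = (0, '');

my $p = HTML::Parser->new(
    api_version => 3,
    start_h   => [ \&start, 'tagname, attr, attrseq, text' ],
    end_h     => [ \&end,   'tagname, text' ],
    comment_h => [ \&comm,  'text' ],
    default_h => [ sub { print shift }, 'text' ],
);
$p->unbroken_text(1);

my $file = shift or die "usage: $0 file.html\n";
$p->parse_file($file) or die "Can't parse $file: $!\n";

# Opening tag: translate template comments inside attribute values,
# remember the link and re-emit the tag.
sub start {
    my ($tag, $attr, $attrseq, $text) = @_;
    for my $k (keys %$attr) {
        if ($attr->{$k} =~ /!/) {
            ($attr->{$k}, $link) = transform_comments($attr->{$k});
        }
    }
    $pending++;
    my $attrs = join ' ',
        map { $_ eq '/' ? $_ : "$_=\"$attr->{$_}\"" } @$attrseq;
    print "<$tag", $attrs ? " $attrs>" : '>';
}

# Closing tag: emit it, then any link collected from the opening tag.
sub end {
    my ($tag, $text) = @_;
    print $text;
    if ($pending) {
        print $link;
        $pending = 0;
        $link    = '';
    }
}

# Comment: translate it; print its link right away unless a tag is pending.
sub comm {
    my $text;
    ($text, $link) = transform_comments($_[0]);
    print $text;
    print $link unless $pending;
}

# Placeholder translation of <!-- key --!>text<!-- key -->.
sub transform_comments {
    my $str = shift;
    if ($str =~ /(\S+) --!>([^<]+?)<!-- \1/) {
        my ($key, $text) = ($1, $2);
        # return value of hash and link
        my $val  = "fake-$text-translated";
        my $link = '<a href=\'foo\'>foo</a>';
        return ($val, $link);
    }
    return ($str, '');    # not a template comment: pass through, no link
}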
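The "fake-...-translated" value above is only a placeholder for a hash lookup. A sketch of what that lookup might look like follows; the %translations and %links tables, their keys and the /feedback URL are hypothetical, and unknown keys fall back to the text found between the comments.

```perl
use strict;
use warnings;

# Hypothetical lookup tables; real code would load these from the
# application's resource bundle.
my %translations = ( 'foo.label' => 'Feedback' );
my %links        = ( 'foo.label' => '/feedback' );

# Returns (replacement text, link HTML) for <!-- key --!>text<!-- key -->,
# or the input unchanged plus an empty link for any other comment.
sub transform_comments {
    my $str = shift;
    if ($str =~ /(\S+) --!>([^<]+?)<!-- \1/) {
        my ($key, $default) = ($1, $2);
        my $val  = exists $translations{$key} ? $translations{$key} : $default;
        my $link = exists $links{$key} ? qq{<a href="$links{$key}">$key</a>} : '';
        return ($val, $link);
    }
    return ($str, '');
}

my ($label, $link_html) =
    transform_comments('<!-- foo.label --!>Send feedback<!-- foo.label -->');
print "$label $link_html\n";
```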
Update: seeing your answer to moritz above - well, then at least it isn't your fault ;-)
For the above code to work, you'll need to sanitize the comments with e.g.
perl -pi -e 's/(<!-- \S+ --!>[^<]+?<!-- \S+ --)!>/$1>/g' $file
since HTML::Parser won't recognize the comments otherwise.
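To see what that substitution does, here it is wrapped in a sub (the name sanitize_comment_pairs is made up) so it can be applied to a string instead of editing a file in place. It turns the closing '--!>' of each pair into '-->' and leaves already-sanitized pairs alone.

```perl
use strict;
use warnings;

# Same regex as the one-liner above, applied to a string.
sub sanitize_comment_pairs {
    my ($html) = @_;
    $html =~ s/(<!-- \S+ --!>[^<]+?<!-- \S+ --)!>/$1>/g;
    return $html;
}

print sanitize_comment_pairs('<!-- foo.label --!>Feedback<!-- foo.label --!>'), "\n";
```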
In reply to Re: Regex within html by shmem, in thread Regex within html by ropey