PerlMonks  

Regex for removing Template::Toolkit comments?

by bliako (Monsignor)
on Aug 24, 2018 at 16:49 UTC ( [id://1221039]=perlquestion )

bliako has asked for the wisdom of the Perl Monks concerning the following question:

Venerable Monks,

Has any of you come across a regex to strip Template::Toolkit comments, or can you figure one out?

The comments look like this:

[% # this is a comment to the end of line
   foo = 'bar'
%]

<p>bw, bliako</p>

[%# placing the '#' immediately inside the directive
    tag comments out the entire directive
%]

Re: Regex for removing Template::Toolkit comments?
by Corion (Patriarch) on Aug 24, 2018 at 17:04 UTC

    I would do it in a two-step approach: first split the template into literal parts and TT code, then strip out the TT comments, maybe like this:

    use strict;
    use warnings;
    use Data::Dumper;
    $Data::Dumper::Useqq = 1;

    my $tt = <<'TT';
    [% # this is a comment to the end of line
       foo = 'bar'
    %]
    <p>You might have an array in your TT</p>
    [% foo = bar[5]; %]
    <p>bw, bliako</p>
    [%# placing the '#' immediately inside the directive
        tag comments out the entire directive
    %]
    TT

    my @parts = ($tt =~ /\G(
          (?:[^\\\[]+)     # not a template, not a backslash
        |(?:[\\].)         # an escaped whatever
        |(?:[\[][^%])      # not a template, [ followed by whatever
        |(?:\[%.*?%\])     # within a TT template
    )
    /msgx);

    @parts = map { s!\s+#.*$!!gm; $_ }   # comments up to EOL
             map { /^\[%#/ ? "" : $_ }   # TT comments
             @parts;

    warn Dumper \@parts;

    This will not deal well with templates whose TT code contains a literal %]. So, don't do that.
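    To make that limitation concrete, here is a small sketch (the template and variable names are made up for illustration). The naive non-greedy branch stops at the first %], even when it sits inside a quoted string. The quote-aware pattern is only an assumption-laden alternative: it assumes TT strings are delimited by ' or " with backslash escapes, and it is still not a TT parser, since an apostrophe inside a # comment (e.g. "# it's") would be mistaken for the start of a string.

```perl
use strict;
use warnings;

# Hypothetical template: a quoted string inside the directive contains %]
my $tt = q{[% x = 'a %] b' %]};

# Naive non-greedy match, like the "within a TT template" branch:
# it stops at the first %], which here is inside the quoted string.
my ($naive) = $tt =~ /(\[%.*?%\])/s;

# Quote-aware sketch (assumption: strings use ' or " with backslash
# escapes). Quoted strings are consumed whole, so a %] inside them
# cannot end the directive.
my $directive = qr{
    \[%
    (?: '(?:[^'\\]|\\.)*'     # single-quoted string
      | "(?:[^"\\]|\\.)*"     # double-quoted string
      | (?!%\]) [^'"]         # any other char that does not close the tag
    )*
    %\]
}xs;
my ($aware) = $tt =~ /($directive)/;

print "naive: $naive\n";    # naive: [% x = 'a %]
print "aware: $aware\n";    # aware: [% x = 'a %] b' %]
```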

      Thanks.

      An additional problem is nested comments, unless they are forbidden. Wouldn't I be better off converting the comment literals to C comment literals (/* */) and using a nested-comment regex from there?

        I don't understand what you mean by nested comments being an additional problem. Can you show example input data where my regular expression fails?

        If you have a regular expression for nested C comment literals, I'm quite sure that it can be trivially converted to a regular expression matching nested TT comments by changing /* to [%# and */ to %]. But I really doubt that TT allows for nested comments anyway.
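        For illustration, here is a sketch of such a nested-comment regex written directly for TT delimiters, using Perl's named recursion (?&NAME), available since Perl 5.10. The helper name strip_nested_comments is hypothetical, and whether TT accepts nested directives inside a [%# ... %] comment at all is not established here. Only whole-directive comments are removed; "[% # ..." end-of-line comments are left alone.

```perl
use strict;
use warnings;

# Hypothetical helper: remove whole-directive comments [%# ... %], even if
# they contain nested [% ... %] directives (which TT itself may not allow).
# Uses named recursion (?&body), available since Perl 5.10.
sub strip_nested_comments {
    my ($text) = @_;
    $text =~ s{
        \[%\#                       # comment opener; the hash is escaped under /x
        (?<body>
            (?: (?!\[%|%\]) .       # any char that does not start a tag
              | \[% (?&body) %\]    # or a nested [% ... %], recursively
            )*
        )
        %\]                         # the matching closer
    }{}gxs;
    return $text;
}

my $tt = "before [%# a [% inner %] b %] [% keep %] after";
print strip_nested_comments($tt), "\n";   # before  [% keep %] after
```

        The two alternatives inside body are mutually exclusive (one refuses to start at "[%", the other requires it), which keeps backtracking under control on unbalanced input.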

Re: Regex for removing Template::Toolkit comments?
by tybalt89 (Monsignor) on Aug 24, 2018 at 21:05 UTC

    Here's a first try at a little recursive parser that will strip nested items.

    If you have a better test case (or a counter-example) please let me know.

    #!/usr/bin/perl

    # https://perlmonks.org/?node_id=1221039

    use strict;
    use warnings;

    $_ = <<END;
    before
    [% # this is a comment to the end of line
       foo = 'bar'
    %]
    <p>bw, bliako</p>
    [%# placing the '#' immediately inside the directive
        tag comments out the entire directive
    %]
    [% outside %]
    [%# placing the '#' immediately inside the directive
        tag comments out the entire directive
        [% inside %]
    %]
    after
    END

    print stripcomments();

    sub stripcomments
      {
      my $answer = '';
      $answer .=
        /\G\[\%#/gc ? stripcomments() x 0 :
        /\G\[\%/gc  ? '[%' . stripcomments() =~ s/#.*//gr . '%]' :
        /\G\%\]/gc  ? return $answer :
        /\G./gcs    ? $& :
        return $answer while 1;
      }

    Outputs:

    before
    [%
       foo = 'bar'
    %]
    <p>bw, bliako</p>

    [% outside %]

    after
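    The parser above relies on Perl's \G ... /gc scanning idiom: \G anchors each match at pos(), /g advances pos() on success, and /c keeps pos() on failure so the next alternative is tried at the same spot. The standalone sketch below (scan_tags is a hypothetical name, and it only tokenizes, it does not strip anything) shows that idiom in isolation.

```perl
use strict;
use warnings;

# Illustration of the \G ... /gc scanning idiom: each test starts exactly
# at pos(); on failure /c leaves pos() unchanged so the next branch can
# try the same position.
sub scan_tags {
    my ($s) = @_;
    my @tokens;
    for ($s) {                      # alias $_ to our copy of the string
        while (1) {
            if    (/\G\[%/gc)          { push @tokens, 'OPEN' }
            elsif (/\G%\]/gc)          { push @tokens, 'CLOSE' }
            elsif (/\G([^\[%]+|.)/gcs) { push @tokens, "TEXT($1)" }
            else                       { last }   # end of string
        }
    }
    return @tokens;
}

print join(' ', scan_tags("a[% b %]c")), "\n";
# prints: TEXT(a) OPEN TEXT( b ) CLOSE TEXT(c)
```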

      tybalt89, giveth with one hand and taketh away with the other...

      I am struggling to convert it to a function that takes an input parameter... Getting there (in a biblical sense)...

        #!/usr/bin/perl

        # https://perlmonks.org/?node_id=1221039

        use strict;
        use warnings;

        my $someTTstring = <<END;
        before
        [% # this is a comment to the end of line
           foo = 'bar'
        %]
        <p>bw, bliako</p>
        [%# placing the '#' immediately inside the directive
            tag comments out the entire directive
        %]
        [% outside %]
        [%# placing the '#' immediately inside the directive
            tag comments out the entire directive
            [% inside %]
        %]
        after
        END

        print stripcomments($someTTstring);

        sub stripcomments
          {
          @_ and local $_ = shift;
          my $answer = '';
          $answer .=
            /\G\[\%#/gc ? stripcomments() x 0 :
            /\G\[\%/gc  ? '[%' . stripcomments() =~ s/#.*//gr . '%]' :
            /\G\%\]/gc  ? return $answer :
            /\G./gcs    ? $& :
            return $answer while 1;
          }

        Like this?

Re: Regex for removing Template::Toolkit comments?
by LanX (Saint) on Aug 24, 2018 at 17:33 UTC
    I've never used TT!

    ... but after browsing through the docs, I wouldn't be surprised if there were a way to process a template into another template and filter the content by hooking in.

    HTH! :)

    Update:

    (from another thread) "you could subclass Template::Parser / Template::Directive"

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery FootballPerl is like chess, only without the dice

      > filter the content by hooking in

      Out of curiosity I looked into this a bit, and it turns out that hacking/hooking into Template::Parser (via Template::Directive, Template::Grammar, or even Parser.yp) is difficult, because it looks like Template::Parser::_parse drops the original source text and doesn't pass it into the handlers. But as a first step, all that's needed are the tokens, which Template::Parser::split_text can provide. Be careful with the following, though: I haven't tested it with many different cases yet, so there may be token types it doesn't handle.

      #!/usr/bin/env perl
      use warnings;
      use strict;
      use Data::Dump qw/dd pp/;
      use Template::Parser;

      my $text = <<'END';
      before
      [% # this is a comment to the end of line
         foo = 'bar'
      %]
      <p>bw, bliako</p>
      [%# placing the '#' immediately inside the directive
          tag comments out the entire directive
      %]
      [% outside %]
      after
      END

      my $parser = Template::Parser->new();
      my $tokens = $parser->split_text($text);
      #dd $tokens; # Debug

      my $o = '';
      for (my $i=0; $i<@$tokens; $i++) {
          if (ref $tokens->[$i]) {
              my $text = $tokens->[$i][0];
              #dd $text; # Debug
              $o .= "[% $text %]";
          }
          elsif ($tokens->[$i] eq 'TEXT') {
              my $text = $tokens->[++$i];
              #dd $text; # Debug
              $o .= $text;
          }
          else { die pp($i,$tokens->[$i]) }
      }
      print $o;

      __END__
      before
      [% # this is a comment to the end of line
         foo = 'bar' %]
      <p>bw, bliako</p>

      [% outside %]
      after

        Great, I think you just found an answer to "Tidy for Template Toolkit Files" =)

        Anyway ... please stop doing things "out of curiosity" and concentrate on running Perl in the browser! ;-)

        Cheers Rolf

        Some edits, made 2 seconds later, below...

        Thanks, your code is great. Doing what LanX proposed looked too scary to me (it meant switching to another task and reading their manuals; feel free to downvote my human ingredients). And using regexes is too fragile without knowing the full TT spec, e.g. whether nested comments are allowed and how to deal with [%, [%# and [% # inside strings, as Corion said. So letting TT do it seems the right way to me.

        Edit2: To be fair to the regex solutions: it was me who asked for a regex in the first place.

      Thanks, though that looks like a big investment. Plus, I have my own semantics for language switching on top of TT, so if I let TT parse that, it may freak out...

        > have my own semantics ... it may freak out

        If TT may freak out, how do you expect us to provide a regex doing it ? :)

        Good luck! ;)

        Cheers Rolf

Approved by Corion