r.joseph has asked for the wisdom of the Perl Monks concerning the following question:

I write CGI - a lot of it. It's muh job. And so, to make that job easier, I have written a little template program that is very simple: take the word that is inside the <~...~> tags and eval it as a variable. The code I use is as simple as this:

while (<TEMPLATE>) { s/<~([^>]*)~>/eval "\$$1"/egi; print; }
Now, I KNOW there is a better way! And you want to know HOW I know there is a better way? Because if I ever have more than one of these cute little tags on a line, very odd things start to happen: sometimes it totally deletes the line, sometimes it only prints one, sometimes little elven creatures crawl out of my computer and start biting at my ankles (that's the worst).

Anyway, I was wondering if any of the wisest in the land (that would be you guys) had any ideas on how I might make this little function of mine better, more useful, and LESS BUGGY!! Thanks so much, oh honorable ones.

Thanks much,
R.Joseph

Replies are listed 'Best First'.
Re: Regex and HTML Question
by maverick (Curate) on Dec 20, 2000 at 11:11 UTC
    Check out HTML::Template. It is a HTML templating package that I've used and been quite happy with.

    If you really want to make your own templating system, check out HTML::Parser. In it is a subclass called HTML::TokeParser (if I remember correctly) that will cleanly separate all the HTML tags, so you can just iterate over them with a foreach loop and not have to worry about spacing and the like.

    Regexes to match HTML-type tags can be pretty tricky. I'd be tempted to go with a pre-built package that already works correctly.

    /\/\averick

Re: Regex and HTML Question
by repson (Chaplain) on Dec 20, 2000 at 11:21 UTC
    Now, first of all, I'll calmly say that you should be using the Template Toolkit or HTML::Template for all your HTML templates.

    Now here is how I would write that snippet of code if I was crazy enough to try to write my own template system.

    while (<TEMPLATE>) { s/<~([^~]*)~>/'$' . $1/eeg; print; }
    The first evaluation builds the string '$varname' from the captured name; the second evaluates that string as Perl to produce the variable's contents. I don't see what the /i flag is doing, since none of the characters <, > and ~ has more than one case. And because of the [^~], it shouldn't have a problem with multiple instances on a line. (Yours would read all the non-> characters up to the next >, including the '~', THEN look for '~>', and then backtrack, so it would probably still work, but not as quickly or clearly.)
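
    To check the point about multiple tags on one line, here is a minimal sketch that runs both character classes over the same line and prints what each one captures. The captures() helper and the sample line are made up for illustration; they are not part of either snippet above.

```perl
use strict;
use warnings;

# Hypothetical helper: return every tag name a pattern captures from a line.
sub captures {
    my ($pattern, $line) = @_;
    my @found;
    push @found, $1 while $line =~ /$pattern/g;
    return @found;
}

# Made-up sample line with two tags on it.
my $line = 'Hello <~name~>, welcome to <~site~>!';

my @original = captures(qr/<~([^>]*)~>/, $line);   # class from the question
my @tighter  = captures(qr/<~([^~]*)~>/, $line);   # class from this reply

print "original: @original\n";
print "tighter:  @tighter\n";
```

    On this input both patterns capture name and site, which fits the guess that the original class still works and only costs some extra backtracking.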

Re: Regex and HTML Question
by mirod (Canon) on Dec 20, 2000 at 16:01 UTC

    I would definitely advise you to use a template module (I use Text::Template myself): it will work out-of-the-box, won't have bugs and give you way more features than what you would code yourself. Plus you would not have to craft the kind of regexp you will see below.

    Of course that said, I can't resist the temptation to give you a proper regexp (all solutions so far will fail for <~hash{foo~bar}~>):

    s/<~(([^~]*(~(?!>))?)*)~>/expand( $1)/eg

    I am totally unable to explain the regexp clearly, so try it, look at the regexp docs (perlre), and above all do yourself a favor and buy a copy of Mastering Regular Expressions, in which Jeff Friedl does a great job of explaining this sort of thing.

    update: OK let's try to explain it anyway:

    s/<~(
          (
            [^~]*        # match anything but a ~
            (~(?!>))?    # match a ~ NOT followed by a >
                         # the non-> char is not consumed, so
                         # it is still available for matching
          )*             # match this sequence again
        )~>/expand($1)/egx;

    The regexp captures all characters up to a ~.
    If the ~ is followed by a >, it stops there.
    If the tilde is NOT followed by a >, then the optional block matches it. The lookahead does not consume the following (non->) character, so the outer ()* starts with that character and starts matching again until the next ~.

    Re-update: the expanded regexp was missing a closing paren, I fixed it
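
    To see the lookahead earn its keep, here is a minimal sketch that compares the plain [^~]* pattern with a lookahead pattern on a tag containing a tilde. It uses an alternation, (?:[^~]|~(?!>))*, which expresses the same idea as the regexp above but is not the exact pattern from this post. The %values table, the expand() helper and the sample line are assumptions made up for the demo.

```perl
use strict;
use warnings;

# Hypothetical lookup table; one key deliberately contains a tilde.
my %values = ( 'hash{foo~bar}' => 'tilde-safe', name => 'Mark' );

# Hypothetical expand(): return the value, or leave unknown tags visible.
sub expand {
    my ($token) = @_;
    return exists $values{$token} ? $values{$token} : "<~$token~>";
}

my $line = 'A: <~name~> B: <~hash{foo~bar}~>';

# Plain class: cannot get past the tilde inside the second tag.
(my $simple = $line) =~ s/<~([^~]*)~>/expand($1)/eg;

# Lookahead form: a tilde is allowed as long as no > follows it.
(my $lookahead = $line) =~ s/<~((?:[^~]|~(?!>))*)~>/expand($1)/eg;

print "simple:    $simple\n";
print "lookahead: $lookahead\n";
```

    The plain pattern leaves the second tag untouched, while the lookahead version expands both.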

Re: Regex and HTML Question
by extremely (Priest) on Dec 20, 2000 at 13:29 UTC
    To save myself nightmares, I always worked out of a hash and skipped the extra eval and the dollar signs... Others have already noted your match issue. Me, I was even more careful and set rules on myself. I'm a pretty shifty fellow and I have to keep a close eye on me to keep from starting trouble. =)
    #!/usr/bin/perl -w
    use strict;
    my %h = ( name=>"Mark", monk=>"extremely", typo=>"Weaknesses");
    while (<DATA>) {
        s/<~([a-z][a-z0-9]*)~>/$h{$1}/g;
        print;
    }
    __DATA__
    [<~monk~>|<~name~>] also wrote it like this:
    s/<~([a-z][a-z0-9]*)~>/exists $h{$1} ? $h{$1} : "<~$1~>"/eg;
    to expose <~typos-> and survive "-w" and "use strict". Try both lines!

    --
    $you = new YOU;
    honk() if $you->love(perl)

Re: Regex and HTML Question
by dws (Chancellor) on Dec 20, 2000 at 12:48 UTC
    If you really insist on rolling your own, call a subroutine to get the value to expand, rather than poking directly into the symbol table. You're going to be a lot happier. It'll be easier to debug templates.
    my $debugtemplate = 1;
    
    my %values = (
      foo => "Don't forget: use strict;",
      bar => "-w is your friend"
    );
    
    while ( <TEMPLATE> ) {
        s/<~([^~]*)~>/expand($1)/eg;
        print;
    }
    
    sub expand {
       my $token = shift;
       return $values{$token} if exists $values{$token};
       return $debugtemplate ? "<~$token~>" : "";
    }
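
    A self-contained way to try the snippet above, as a sketch: the TEMPLATE filehandle is replaced by an in-memory filehandle over a made-up three-line template (both are assumptions for the demo), and the result is collected in $output before printing. The third tag has no entry in %values, so it shows what the debug flag does.

```perl
use strict;
use warnings;

my $debugtemplate = 1;

my %values = (
    foo => "Don't forget: use strict;",
    bar => "-w is your friend",
);

# Look a token up in %values; unknown tokens stay visible in debug mode.
sub expand {
    my $token = shift;
    return $values{$token} if exists $values{$token};
    return $debugtemplate ? "<~$token~>" : "";
}

# Made-up template text standing in for the TEMPLATE file.
my $template = "Tip 1: <~foo~>\nTip 2: <~bar~>\nTip 3: <~baz~>\n";
open my $fh, '<', \$template or die "Cannot open in-memory template: $!";

my $output = '';
while ( my $line = <$fh> ) {
    $line =~ s/<~([^~]*)~>/expand($1)/eg;
    $output .= $line;
}
close $fh;

print $output;
```

    With $debugtemplate set to 0, the unknown <~baz~> tag would be replaced by an empty string instead of being left in the output.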
    
Re: Regex and HTML Question
by Maclir (Curate) on Dec 21, 2000 at 02:01 UTC