Regex and HTML Question

r.joseph has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Regex and HTML Question by maverick (Curate) on Dec 20, 2000 at 11:11 UTC
Check out HTML::Template. It is an HTML templating package that I've used and been quite happy with. If you really want to make your own templating system, check out HTML::Parser. In it is a subclass called HTML::TokeParser (if I remember correctly) that will cleanly separate all the HTML tags, so you can just iterate over them with a loop and not have to worry about spacing and the like. Regexes to match HTML-type tags can be pretty tricky. I'd be tempted to go with a pre-built package that already works correctly.

/\/\averick
Re: Regex and HTML Question by repson (Chaplain) on Dec 20, 2000 at 11:21 UTC
First of all, I'll calmly say that you should be using the Template Toolkit or HTML::Template for all your HTML templates. That said, here is how I would write that snippet of code if I were crazy enough to try to write my own template system:

```perl
while (<TEMPLATE>) {
    s/<~([^~]*)~>/$1/eeg;
    print;
}
```

This evaluates $1 to produce $varname, then evaluates $varname to produce its contents. I don't see what s///i is doing, since none of the characters <>~ comes in more than one case. And because of the `[^~]`, it shouldn't have a problem with multiple instances on the line. (Yours would read all non->'s on the line, including '~', THEN look for '~>', and then backtrack, so it would probably still work, but not as quickly or clearly.)
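To make the double evaluation concrete, here is a minimal sketch, assuming the template writes its variables with the dollar sign, as in `<~$name~>`. The variable `$name` and the helper `render_ee` are hypothetical names used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical template variable; 'our' keeps it visible to the string eval.
our $name = "Mark";

# Expand every <~...~> tag by evaluating its contents as Perl code.
sub render_ee {
    my ($text) = @_;
    # First /e:  evaluates the expression $1, giving the tag text (e.g. '$name').
    # Second /e: evaluates that text as Perl, giving the variable's value.
    $text =~ s/<~([^~]*)~>/$1/eeg;
    return $text;
}

print render_ee('Hello <~$name~>!'), "\n";   # Hello Mark!
print render_ee('Sum: <~1+2~>'), "\n";       # Sum: 3
```

The second call shows the catch: the tag body is run as arbitrary Perl, so this approach is only reasonable when you fully trust whoever writes the templates.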
Re: Regex and HTML Question by mirod (Canon) on Dec 20, 2000 at 16:01 UTC
I would definitely advise you to use a template module (I use Text::Template myself): it will work out of the box, won't have bugs, and will give you way more features than what you would code yourself. Plus, you would not have to craft the kind of regexp you will see below.

That said, I can't resist the temptation to give you a proper regexp (all solutions so far will fail for <~hash{foo~bar}~>):

```perl
s/<~(([^~]*(~(?!>))?)*)~>/expand($1)/eg
```

I am totally unable to explain the regexp clearly, so try it, look at the regexp doc and, above all, do yourself a favor and buy a copy of Mastering Regular Expressions, in which Jeff Friedl does a great job of explaining this sort of thing.

Update: OK, let's try to explain it anyway:

```perl
s/<~(
      (
        [^~]*        # match anything but a ~
        (~(?!>))?    # match a ~ NOT followed by a >
                     # the character after the ~ is not consumed, so
                     # it is still available for matching
      )*             # match this sequence again
    )~>/expand($1)/xeg;
```

The regexp captures all characters up to a ~. If the ~ is followed by a >, it stops there. If the tilde is NOT followed by a >, then the optional block is matched; the following (non->) character has not been consumed, so the outer ()* starts with this character and matches again until the next ~.

Re-update: the expanded regexp was missing a closing paren; I fixed it.
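As a quick check of the <~hash{foo~bar}~> case, here is a small sketch that runs the simple `[^~]*` pattern and the tilde-aware pattern over the same line. The `expand` below is a hypothetical stand-in that just brackets the captured token so you can see what was matched; `fill_simple` and `fill_tilde_aware` are likewise names made up for this example.

```perl
use strict;
use warnings;

# Hypothetical stand-in for expand(): brackets the token so the capture is visible.
sub expand {
    my ($token) = @_;
    return "[$token]";
}

# Simple pattern: the tag body may not contain any ~ at all.
sub fill_simple {
    my ($text) = @_;
    $text =~ s/<~([^~]*)~>/expand($1)/eg;
    return $text;
}

# Tilde-aware pattern: a ~ is allowed in the body unless a > follows it.
sub fill_tilde_aware {
    my ($text) = @_;
    $text =~ s/<~(([^~]*(~(?!>))?)*)~>/expand($1)/eg;
    return $text;
}

my $line = 'a <~hash{foo~bar}~> b <~x~>';
print fill_simple($line), "\n";       # a <~hash{foo~bar}~> b [x]
print fill_tilde_aware($line), "\n";  # a [hash{foo~bar}] b [x]
```

The simple pattern silently skips the tag containing a tilde, while the lookahead version captures it whole and still stops at the first `~>`, so two tags on one line stay separate.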
Re: Regex and HTML Question by extremely (Priest) on Dec 20, 2000 at 13:29 UTC
To save myself nightmares, I always worked out of a hash and skipped the extra eval and the dollar signs... Others already noted your match issue. Me, I was even more careful and set rules on myself. I'm a pretty shifty fellow and I have to keep a close eye on me to keep from starting trouble. =)

```perl
#!/usr/bin/perl -w
use strict;

# All template values live in one hash; tag names are plain identifiers.
my %h = ( name => "Mark", monk => "extremely", typo => "Weaknesses" );

while (<DATA>) {
    # Only lowercase names made of letters and digits count as tags.
    s/<~([a-z][a-z0-9]*)~>/$h{$1}/g;
    print;
}
__DATA__
[<~monk~>\|<~name~>] also wrote it like this:
s/<~([a-z][a-z0-9]*)~>/exists $h{$1} ? $h{$1} : "<~$1~>"/eg;
to expose <~typos-> and survive "-w" and "use strict".
Try both lines!
```

-- $you = new YOU; honk() if $you->love(perl)
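For anyone who would rather see the difference between the two substitution lines without editing the script, here is a rough side-by-side sketch. The helper names `fill_plain` and `fill_checked` and the misspelled tag `<~nmae~>` are made up for the example; the hash is a cut-down copy of the one above.

```perl
use strict;
use warnings;

my %h = ( name => "Mark", monk => "extremely" );

# Plain lookup: an unknown key interpolates undef (empty string plus a warning).
sub fill_plain {
    my ($text) = @_;
    $text =~ s/<~([a-z][a-z0-9]*)~>/$h{$1}/g;
    return $text;
}

# Checked lookup: an unknown key is put back verbatim so the typo stays visible.
sub fill_checked {
    my ($text) = @_;
    $text =~ s/<~([a-z][a-z0-9]*)~>/exists $h{$1} ? $h{$1} : "<~$1~>"/eg;
    return $text;
}

print fill_plain('<~monk~> is <~name~>'), "\n";
# extremely is Mark
print fill_checked('<~nmae~> and <~$name~> and <~name~>'), "\n";
# <~nmae~> and <~$name~> and Mark
```

Because the character class only admits lowercase letters and digits, a tag such as `<~$name~>` never matches at all, so nothing in the template can reach outside the hash.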
Re: Regex and HTML Question by dws (Chancellor) on Dec 20, 2000 at 12:48 UTC
If you really insist on rolling your own, call a subroutine to get the value to expand, rather than poking directly into the symbol table. You're going to be a lot happier. It'll be easier to debug templates.

```perl
my $debugtemplate = 1;
my %values = (
    foo => "Don't forget: use strict;",
    bar => "-w is your friend",
);

while ( <TEMPLATE> ) {
    s/<~([^~]*)~>/expand($1)/eg;
    print;
}

sub expand {
    my $token = shift;
    return $values{$token} if exists $values{$token};
    return $debugtemplate ? "<~$token~>" : "";
}
```
Re: Regex and HTML Question by Maclir (Curate) on Dec 21, 2000 at 02:01 UTC
An alternative to HTML::Template is Embperl. Try the following location: http://perl.apache.org/embperl/index.html. It serves a slightly different purpose, but is worth looking at.