Monks' Expression

sinan has asked for the wisdom of the Perl Monks concerning the following question:

(jeffa) Re: Monks' Expression by jeffa (Bishop) on Jan 21, 2001 at 22:44 UTC
How about: `my $str = '[id=4590]blahblah[/id]'; $str =~ m/^\[id=(\d+)\](.+)\[/; print '<a href="/cgi-bin/coolcode.pl?id=' . "$1\">$2</a>\n";` or . . . `$str =~ s{^\[id=(\d+)\](.+)\[/id]}{<a href="/cgi-bin/coolcode.pl?id=$1">$2</a>}; print "$str\n";` I should mention that you probably will want to take out the caret ^ in both regexes: it tells the engine to start the match at the BEGINNING of the string. Since you are probably going to be extracting this from somewhere in the middle of the document, you don't want this. Also, don't forget to add the 'g' modifier if you want to get multiple occurrences of the regex. Update: Big thanks to kudra for reminding me to add $2 !!
Jeff
L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
F--F--F--F--F--F--F--F--
(the triplet paradiddle)
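A minimal sketch of jeffa's two follow-up suggestions applied together (no ^ anchor, plus the g modifier), assuming a made-up sample document with two tags in the middle of the text. It loops over matches with m//g in scalar context instead of using a single print, and it swaps his greedy (.+) for a non-greedy (.+?) for the reason ColonelPanic gives in his reply.

```perl
use strict;
use warnings;

# Hypothetical sample: two tags somewhere in the middle of the text.
my $doc = "See [id=4590]blahblah[/id] and also [id=17]more text[/id].";

# No ^ anchor, so a match may start anywhere. In scalar context, m//g
# resumes where the previous match ended, so each iteration finds one tag.
# (.+?) is non-greedy, so each capture stops at the nearest [/id].
my @links;
while ($doc =~ m{\[id=(\d+)\](.+?)\[/id\]}g) {
    push @links, qq{<a href="/cgi-bin/coolcode.pl?id=$1">$2</a>};
}
print "$_\n" for @links;
```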
Re: Re: Monks' Expression by eg (Friar) on Jan 22, 2001 at 00:07 UTC
Instead of trying to do everything with a single regular expression, consider breaking it up into two: `$str =~ s/\[id=(\d+)\]/<a href='...$1'>/g; $str =~ s/\[\/id\]/<\/a>/g; print $str;` This will deal more gracefully with links that span multiple lines (which often happens when people write in an auto-wrapping editor).
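A small sketch of the two-pass idea on a link whose text wraps onto a second line. The sample string is made up, and the URL is the one from the original question rather than anything from eg's elided href. Because neither pattern uses `.`, the line break inside the link text never gets in the way.

```perl
use strict;
use warnings;

# Hypothetical input: the link text wraps across a line break.
my $str = "Read [id=327]a very\ncool document[/id] today.";

# Pass 1: each opening tag becomes an opening anchor.
$str =~ s{\[id=(\d+)\]}{<a href="/cgi-bin/coolcode.pl?id=$1">}g;
# Pass 2: each closing tag becomes </a>.
$str =~ s{\[/id\]}{</a>}g;

print "$str\n";
```

The trade-off is that the two passes never check that each opening tag has a matching closing tag.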
Re: Re: Monks' Expression by ColonelPanic (Friar) on Jan 22, 2001 at 04:20 UTC
This regex won't work if you have more than one id tag. The reason is the greedy behavior of `.+`: it will grab everything from the very first opening tag to the last closing tag. When's the last time you used duct tape on a duct? --Larry Wall
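To see the greedy problem concretely, here is a short sketch (made-up input with two tags) that runs the same pattern with a greedy `.+` and a non-greedy `.+?` and compares the captures.

```perl
use strict;
use warnings;

my $str = "[id=1]one[/id] and [id=2]two[/id]";

# Greedy: (.+) runs to the end of the string, then backtracks only as far
# as the LAST "[/id]", swallowing the first closing tag and the second tag.
my ($greedy) = $str =~ m{\[id=\d+\](.+)\[/id\]};

# Non-greedy: (.+?) stops at the FIRST "[/id]" that lets the match succeed.
my ($lazy) = $str =~ m{\[id=\d+\](.+?)\[/id\]};

print "greedy: $greedy\n";   # greedy: one[/id] and [id=2]two
print "lazy:   $lazy\n";     # lazy:   one
```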
Re: Monks' Expression by sierrathedog04 (Hermit) on Jan 22, 2001 at 02:23 UTC
My regular expression should take `[id=###]blahblah[/id]` and replace it with `<a href="/cgi-bin/coolcode.pl?id=###">blahblah</a>`. The following should work: `my $text = "[id=###]blahblah[/id]"; my $targetURL = "/cgi-bin/coolcode.pl?"; $text =~ s/.[(id=.)](.*)[\/id]/<a href=\"$targetURL\1\">\2<\/a>/gs;` The backslash escapes the quote and forward slash characters so that they are inserted literally into the replacement. The expressions between parentheses are captured as \1 and \2 (in the replacement part, Perl prefers $1 and $2). The /g modifier replaces every occurrence, and /s lets . match newlines, so a tag can span line breaks.
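To make the /g and /s modifiers concrete, here is a small sketch that runs the same substitution with and without /s on a made-up string whose first tag spans a line break. It uses the non-greedy `.*?` form rather than the regex above, so that only the modifiers differ between the two runs.

```perl
use strict;
use warnings;

my $text = "[id=1]a\nb[/id] [id=2]c[/id]";

# Without /s, '.' does not match "\n", so the first tag (which spans a line
# break) cannot match and is left alone; /g still converts every tag it can.
(my $no_s = $text) =~ s{\[id=(\d+)\](.*?)\[/id\]}{<a href="/cgi-bin/coolcode.pl?id=$1">$2</a>}g;

# With /s, '.' also matches "\n", so both tags are converted.
(my $with_s = $text) =~ s{\[id=(\d+)\](.*?)\[/id\]}{<a href="/cgi-bin/coolcode.pl?id=$1">$2</a>}gs;

print "without /s: $no_s\n";
print "with /s:    $with_s\n";
```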
Re: Re: Monks' Expression by I0 (Priest) on Jan 22, 2001 at 02:43 UTC
`$text =~ s/\[(id=.*?)](.*?)\[\/id]/<a href=\"$targetURL$1\">$2<\/a>/gs;`
Re: Re: Re: Monks' Expression by repson (Chaplain) on Jan 22, 2001 at 04:18 UTC
`$text =~ s#\[(id=[^\]]*)]([^\[]*)\[/id]#<a href="$targetURL$1">$2</a>#gs;`
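A sketch of the negated-character-class version on made-up input with two tags, one of them wrapped across a line. A negated class such as `[^\[]*` also matches newlines, so /s is not needed, and it cannot run past the next `[`, so no non-greedy quantifier is needed either. The catch, as eg notes further down, is that this assumes `[` never appears inside the link text.

```perl
use strict;
use warnings;

my $targetURL = "/cgi-bin/coolcode.pl?";
my $text = "[id=1]first\nlink[/id] and [id=2]second[/id]";

# [^\]]* takes everything up to the next ']', and [^\[]* everything up to
# the next '[', so each match stays inside a single tag pair.
$text =~ s{\[(id=[^\]]*)\]([^\[]*)\[/id\]}{<a href="$targetURL$1">$2</a>}g;

print "$text\n";
```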
Re: Re: Re: Re: Monks' Expression by I0 (Priest) on Jan 22, 2001 at 06:58 UTC
Re: Re: Re: Re: Re: Monks' Expression by merlyn (Sage) on Jan 22, 2001 at 07:00 UTC
Re: Re: Re: Monks' Expression by sierrathedog04 (Hermit) on Jan 22, 2001 at 03:12 UTC
I want to thank I0 for correcting some bugs: '[' is a metacharacter in Perl REs (it starts a character class), so it needed to be escaped. The ? in .*? turns off greedy matching, so each .*? stops at the first point where the rest of the pattern can match: the id capture ends at the first ']' and the link text ends at the first '[/id]'.
Re: Monks' Expression by Segfault (Scribe) on Jan 22, 2001 at 11:06 UTC
I've only got a moderate amount of experience working with Perl, so my coding isn't always the best, but here's one approach to it: my $text = "Here is [id=327]a very cool document[/id] for you"; $text =~ /\[id=([0-9]*)\](.*)\[\/id\]/i; my $tag = "<A HREF=\"/cgi-bin/coolcode.pl?id=$1\">$2</A>"; my $before = $`; my $after = $'; print "$before$tag$after\n"; If you actually put this in a file and run it with Perl, you'll get: `Here is <A HREF="/cgi-bin/coolcode.pl?id=327">a very cool document</A> for you` Just as you'd expect and (hopefully) want. :-)
Re: Re: Monks' Expression by eg (Friar) on Jan 22, 2001 at 13:34 UTC
Interesting. As has been mentioned before, `.*` here is bad because it's greedy and will prevent you from finding two matches on a line. Furthermore, `.*` is usually pretty inefficient: the regexp has to match everything to the end of the line and then start backtracking to try to match whatever follows the `.*`. (`.*?` is better but still less than ideal because it needs to look ahead one character at every step. If '`[`' is only allowed in tags, then `[^[]*` is the best.) Also, why use $before and $after rather than $` and $' directly? Of course, if at all possible, you shouldn't be using $` and $' at all (see "Why does using $&, $`, or $' slow my program down?" in perlfaq6). Cheers, mate!
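One way to get the text before and after a match without touching $` and $' is the @- and @+ arrays: $-[0] and $+[0] hold the start and end offsets of the whole match, and substr() can cut the string at those points. A minimal sketch on Segfault's sample sentence, using the non-greedy `.*?` discussed above:

```perl
use strict;
use warnings;

my $text = "Here is [id=327]a very cool document[/id] for you";

my $out = $text;
if ($text =~ /\[id=(\d+)\](.*?)\[\/id\]/) {
    # $-[0] is where the whole match starts and $+[0] where it ends,
    # so these two substr() calls recover what $` and $' would hold.
    my $before = substr($text, 0, $-[0]);
    my $after  = substr($text, $+[0]);
    $out = qq{$before<A HREF="/cgi-bin/coolcode.pl?id=$1">$2</A>$after};
}
print "$out\n";
```

For a single in-place replacement, a plain s/// would be simpler still; the offsets approach is mainly useful when you want to keep the surrounding text around for later matches.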
Re: Re: Re: Monks' Expression by Segfault (Scribe) on Jan 22, 2001 at 23:51 UTC
I used $before and $after because, I've found, I often end up using multiple regexps in one block of code, and I need to easily get back to those values from prior matches. Thanks for the tips, though; I was hoping for some advice on my regexps. :-)
(jptxs)Re: Monks' Expression by jptxs (Curate) on Jan 22, 2001 at 20:22 UTC
I'm a little late here, but I'd just like to add that when I find myself thinking about using regexen to process web content, what I really want is usually a template system of some kind. That may not be the case for you, but it almost always is for me. See HTML::Template or the many alternatives for some good starting points. Also try searching for 'template toolkit' here in the monastery for some good discussion of this. `"A man's maturity -- consists in having found again the seriousness one had as a child, at play." --Nietzsche`