(jeffa) Re: Monks' Expression
by jeffa (Bishop) on Jan 21, 2001 at 22:44 UTC
|
my $str = '[id=4590]blahblah[/id]';
$str =~ m/^\[id=(\d+)\](.+)\[/;
print qq{<a href="/cgi-bin/coolcode.pl?id=$1">$2</a>\n};
or . . .
$str =~ s{^\[id=(\d+)\](.+)\[/id\]}{<a href="/cgi-bin/coolcode.pl?id=$1">$2</a>};
print "$str\n";
I should mention that you will probably want to take out
the caret ^ in both regexes: it anchors the match to the
BEGINNING of the string. Since you are probably going to be
extracting this from somewhere in the middle of a document,
you don't want that.
Also, don't forget to add the 'g' modifier if you want
to replace multiple occurrences of the pattern.
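As a rough sketch of that advice (the input string is made up, and using [^\[]+ for the link text assumes it never contains a '['), dropping the caret and adding 'g' could look like this:

```perl
use strict;
use warnings;

# Hypothetical input: two [id=...] tags in the middle of a sentence.
my $str = 'See [id=4590]this node[/id] and [id=12]that one[/id] too.';

# No leading ^, so a match may start anywhere; /g replaces every tag.
# [^\[]+ keeps each match inside a single tag (assumes no '[' in link text).
(my $out = $str) =~ s{\[id=(\d+)\]([^\[]+)\[/id\]}{<a href="/cgi-bin/coolcode.pl?id=$1">$2</a>}g;

print "$out\n";
```

With the caret left in, nothing would be replaced here, because the string does not start with a tag.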
Update: Big thanks to kudra for reminding me to add
$2 !!
Jeff
L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
F--F--F--F--F--F--F--F--
(the triplet paradiddle)
| [reply] [d/l] [select] |
|
$str =~ s/\[id=(\d+)\]/<a href='...$1'>/g;
$str =~ s/\[\/id\]/<\/a>/g;
print $str;
This will deal more gracefully with links that span multiple lines (which often happens when people write in an auto-wrapping editor).
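A small sketch of the two-pass idea on a link that wraps across lines. The input string is invented, and the URL prefix is borrowed from the original question as an assumption, since the code above elides it:

```perl
use strict;
use warnings;

# Hypothetical input whose link text wraps onto a second line.
my $str = "Read [id=4590]a link that\nwraps[/id] here.";

# Assumed URL prefix (taken from the question, not from the reply above).
$str =~ s{\[id=(\d+)\]}{<a href="/cgi-bin/coolcode.pl?id=$1">}g;   # opening tags
$str =~ s{\[/id\]}{</a>}g;                                          # closing tags

print "$str\n";
```

Neither pattern contains a '.', so the newline inside the link text never matters. The trade-off is that opening and closing tags are converted independently, so an unbalanced tag gets converted too.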
| [reply] [d/l] |
|
This regex won't work if you have more than one id tag. The reason is the greedy behavior of .+, which will grab everything from the very first opening tag to the last closing tag.
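To make the difference visible, here is a minimal sketch on a made-up string with two tags, comparing .+ with its non-greedy form .+? (the patterns are simplified for illustration):

```perl
use strict;
use warnings;

my $str = '[id=1]one[/id] and [id=2]two[/id]';

# Greedy: .+ runs to the LAST [/id], so both tags collapse into one match.
my ($greedy) = $str =~ m{\[id=\d+\](.+)\[/id\]};

# Non-greedy: .+? stops at the first [/id] it can reach; /g collects each tag.
my @lazy = $str =~ m{\[id=\d+\](.+?)\[/id\]}g;

print "greedy: $greedy\n";   # one[/id] and [id=2]two
print "lazy:   @lazy\n";     # one two
```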
When's the last time you used duct tape on a duct? --Larry Wall
| [reply] [d/l] |
Re: Monks' Expression
by sierrathedog04 (Hermit) on Jan 22, 2001 at 02:23 UTC
|
My regular expression should take [id=###]blahblah[/id] and replace it with <a href="/cgi-bin/coolcode.pl?id=###">blahblah</a>.
The following should work:
my $text = "[id=###]blahblah[/id]";
my $targetURL = "/cgi-bin/coolcode.pl?";
$text =~ s/.*[(id=.*)](.*)[\/id]/<a href=\"$targetURL\1\">\2<\/a>/gs;
- The backslash character escapes the quote and forward slash characters to make sure they are inserted literally into the target expression.
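- The text matched by each pair of parentheses is captured as \1 and \2 (in the replacement part, $1 and $2 are the preferred spelling).
- The g modifier replaces every occurrence rather than just the first; the s modifier lets . match newlines, so a tag may span line breaks.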
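What the g and s modifiers each contribute can be checked in isolation. This is only a sketch: the input and the LINK(...) replacement are invented for readability, and the pattern uses escaped brackets and a non-greedy .*? so that the modifiers are the only thing varying.

```perl
use strict;
use warnings;

# Hypothetical input: the first tag wraps across a line break.
my $text = "[id=1]first\nlink[/id] [id=2]second[/id]";

# Without /s, '.' will not cross the newline, so only the second tag matches.
(my $no_s = $text) =~ s{\[id=(\d+)\](.*?)\[/id\]}{LINK($1,$2)}g;

# Without /g, only the first match is replaced.
(my $no_g = $text) =~ s{\[id=(\d+)\](.*?)\[/id\]}{LINK($1,$2)}s;

# With both, every tag is replaced, including the one spanning two lines.
(my $both = $text) =~ s{\[id=(\d+)\](.*?)\[/id\]}{LINK($1,$2)}gs;

print "no /s: $no_s\n";
print "no /g: $no_g\n";
print "both:  $both\n";
```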
| [reply] [d/l] [select] |
|
$text =~ s/\[(id=.*?)](.*?)\[\/id]/<a href=\"$targetURL\1\">\2<\/a>/gs;
| [reply] [d/l] |
|
$text =~ s#\[(id=[^\]]*)]([^\[]*)\[/id]#<a href="$targetURL$1">$2</a>#gs;
| [reply] [d/l] |
|
I want to thank I0 for correcting some bugs:
- '[' is a metacharacter in Perl REs (it opens a character class), so it needed to be escaped.
- The ? in .*? turns off greedy matching. Thus everything after id= will be matched, but only up to the first ']' it encounters.
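A quick way to convince yourself of the first point is to test the unescaped fragment on its own. The test strings below are made up for illustration:

```perl
use strict;
use warnings;

# Unescaped, [(id=.*)] is a character class: it matches exactly ONE
# character out of the set  ( i d = . * )
my $class_hits_d   = ('d'     =~ /\A[(id=.*)]\z/) ? 1 : 0;   # 1: 'd' is in the set
my $class_hits_tag = ('id=42' =~ /\A[(id=.*)]\z/) ? 1 : 0;   # 0: more than one character

# Escaped, \[ and \] are literal brackets and (...) is a capture group.
my ($captured) = '[id=42]' =~ /\A\[(id=.*?)\]\z/;

print "class matches 'd':     $class_hits_d\n";
print "class matches 'id=42': $class_hits_tag\n";
print "captured: $captured\n";
```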
| [reply] |
Re: Monks' Expression
by Segfault (Scribe) on Jan 22, 2001 at 11:06 UTC
|
I've only got a moderate amount of experience working with perl, so my coding isn't always the best, but here's one approach to it...
my $text = "Here is [id=327]a very cool document[/id] for you";
$text =~ /\[id=([0-9]*)\](.*)\[\/id\]/i;
my $tag = "<A HREF=\"/cgi-bin/coolcode.pl?id=$1\">$2</A>";
my $before = $`;
my $after = $';
print "$before$tag$after\n";
If you actually put this in a file and run it with Perl, you'll get:
Here is <A HREF="/cgi-bin/coolcode.pl?id=327">a very cool document</A> for you
Just as you'd expect and (hopefully) want. :-)
| [reply] [d/l] [select] |
|
Interesting. As has been mentioned before, .* here is bad because it's greedy and will prevent you from finding two matches on a line. Furthermore, .* is usually pretty inefficient: the regexp engine first matches everything to the end of the line and then has to backtrack to match whatever follows the .*. (.*? is better but still less than ideal, because it needs to look ahead one character at every step. If '[' is only allowed in tags, then [^[]* is the best.)
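The caveat about '[' matters in practice. The sketch below (invented inputs, patterns precompiled with qr for brevity) shows that the two forms agree on clean input and part ways as soon as the link text contains a stray '[':

```perl
use strict;
use warnings;

my $re_lazy  = qr{\[id=(\d+)\](.*?)\[/id\]}s;       # non-greedy dot
my $re_class = qr{\[id=(\d+)\]([^\[]*)\[/id\]};     # negated character class

# When '[' appears only in tags, both find the same two links.
my $clean = '[id=1]one[/id] [id=2]two[/id]';
my @lazy  = $clean =~ /$re_lazy/g;
my @class = $clean =~ /$re_class/g;
print "lazy:  @lazy\n";    # 1 one 2 two
print "class: @class\n";   # 1 one 2 two

# A stray '[' inside the link text: the lazy form still matches,
# the negated class refuses to.
my $stray = '[id=3]see [1] below[/id]';
my ($id_lazy,  $txt_lazy)  = $stray =~ $re_lazy;
my ($id_class, $txt_class) = $stray =~ $re_class;
print "lazy on stray:  ", (defined $txt_lazy  ? $txt_lazy  : '(no match)'), "\n";
print "class on stray: ", (defined $txt_class ? $txt_class : '(no match)'), "\n";
```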
Also, why use $before and $after rather than $` and $' directly? Of course, if at all possible, you shouldn't be using $` and $' at all (see Why does using $&, $`, or $' slow my program down? in perlfaq6).
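One possible way to get the same pieces without $` and $' is to use the match offsets in the built-in arrays @- and @+, which perlvar describes as an equivalent that avoids the penalty. This is a sketch of that substitution, not the approach from the post above; it also checks that the match succeeded before using $1 and $2, and swaps .* for [^\[]* on the assumption that link text contains no '['.

```perl
use strict;
use warnings;

my $text   = "Here is [id=327]a very cool document[/id] for you";
my $result = $text;   # left unchanged if nothing matches

if ($text =~ m{\[id=([0-9]+)\]([^\[]*)\[/id\]}i) {
    my $tag = qq{<A HREF="/cgi-bin/coolcode.pl?id=$1">$2</A>};
    # $-[0] and $+[0] are the start and end offsets of the whole match.
    my $before = substr($text, 0, $-[0]);
    my $after  = substr($text, $+[0]);
    $result = "$before$tag$after";
}

print "$result\n";
```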
Cheers, mate!
| [reply] |
|
I used $before and $after because often, I've found, I end up using multiple regexps in one block of code, and I need to easily get back to those values from prior matches.
Thanks for the tips though, I was hoping for some advice on my regexps. :-)
| [reply] |
(jptxs)Re: Monks' Expression
by jptxs (Curate) on Jan 22, 2001 at 20:22 UTC
|
I'm a little late here, but I'd just like to add that when I find myself thinking about using regexen to process web content, what I really want is usually a template system of some kind. That may not be the case for you, but it almost always is for me. See HTML::Template or many others for some good starting points. Also try searching on 'template toolkit' here in the monastery for some good discussion of this.
"A man's maturity -- consists in having found again the
seriousness one had as a child, at play." --Nietzsche
| [reply] [d/l] |