Link regex

coldfingertips has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Link regex by tachyon (Chancellor) on May 21, 2004 at 00:11 UTC
This is a good reason to use an /ex block to perform the magic.

    local $/;
    my $data = <DATA>;
    $data =~ s{\[([^\]]*)\]}
              {
                my @bits = split '\|', $1;
                # drop empty fields, then trim whitespace from the rest
                @bits = map { s/^\s+|\s+$//g; $_ } grep { ! m/^\s*$/ } @bits;
                if ( @bits == 0 ) {
                    # error handling just return original
                    qq![$1]!;
                }
                else {
                    # may want to auto add
                    $bits[0] = "http://$bits[0]" unless $bits[0] =~ m/^(?:https?|ftp)/
                        or $bits[0] =~ m/[a-z][A-Z]|[A-Z][a-z]/; # camelCase
                    if ( @bits == 1 ) {
                        qq!<a href="$bits[0]">$bits[0]</a>!;
                    }
                    elsif ( @bits == 2 ) {
                        qq!<a href="$bits[0]">$bits[1]</a>!;
                    }
                    elsif ( @bits == 3 ) {
                        qq!<a href="$bits[0]" target="$bits[2]">$bits[1]</a>!;
                    }
                }
              }gex;
    print $data;

    __DATA__
    This is an error []
    This is OK [www.perlmonks.org]
    This is OK [www1.test.com|test page1]
    This is OK [www2.test.com|test page2|new]
    This is errorish [www2.test.com|test page2|]
    This is errorish [ www2.test.com | test page2 | ]
    This is camelCase [ camelCase|Local Link in Blog | new ]

This allows you to handle errors gracefully (i.e. tolerate whitespace errors), add http:// to apparently external links, and leave camelCase links alone (assuming this is for a blog), i.e.:

    This is an error []
    This is OK <a href="http://www.perlmonks.org">http://www.perlmonks.org</a>
    This is OK <a href="http://www1.test.com">test page1</a>
    This is OK <a href="http://www2.test.com" target="new">test page2</a>
    This is errorish <a href="http://www2.test.com">test page2</a>
    This is errorish <a href="http://www2.test.com">test page2</a>
    This is camelCase <a href="camelCase" target="new">Local Link in Blog</a>

cheers

tachyon
Re: Link regex by injunjoel (Priest) on May 20, 2004 at 23:51 UTC
Greetings all,

Quick and marginally tested solution. You have been warned.

    #!/usr/bin/perl -w
    use strict;

    # set up the test data
    my @potential_links = ('[url|desc|target]', '[www.test.com|test page]', '[www.test.com|test page|new]');

    # using the test data
    foreach (@potential_links) {
        # strip out the square brackets
        $_ =~ s/\[|\]//g;
        # split the remaining string represented by $_ on the pipe
        my ($url, $desc, $target) = split /\|/;
        # ternary to test if $target was set; if not, print the non-target version
        ($target) ? print qq{<a href="$url" target="$target">$desc</a>\n}
                  : print qq{<a href="$url">$desc</a>\n};
    }
    exit;

Is that close to what you are looking for?

-injunjoel
Re: Link regex by saintmike (Vicar) on May 20, 2004 at 23:53 UTC
    s/\[(.*?)\|(.*?)(?:\|(.*?))?\]/
        "<a href=\"$1\"" . ($3 ? " target=\"$3\"" : "") . ">$2<\/a>"
    /exg;
Re: Link regex by dimar (Curate) on May 21, 2004 at 06:09 UTC
... and yet another way to do it, this time using a subroutine instead of putting everything inside a single RegEx (added readability for those who like that kinda coding style).

    my @aTests = (<DATA>);
    for (@aTests) {
        s/\]|\[//gm;        ### strip off sq brackets
        s/^\s*|\s*$//gm;    ### trim whitespace
        ### put fields into array ref and spit out href
        print spitHref([split /\s*\|\s*/, $_]);
        print "\n-----\n";
    }

    sub spitHref {
        my $aFields = shift || die "required array ref missing";
        my $iFlds   = scalar @{$aFields};
        return
            ($iFlds == 0) ? "ERROR: bad data" :
            ($iFlds == 1) ? qq^<a href="$aFields->[0]">$aFields->[0]</a>^ :
            ($iFlds == 2) ? qq^<a href="$aFields->[0]">$aFields->[1]</a>^ :
            ($iFlds == 3) ? qq^<a href="$aFields->[0]" target="$aFields->[2]">$aFields->[1]</a>^ :
            'ERROR: unexpected data'
        ;
    }###end_sub

    __DATA__
    []
    [www.perlmonks.org]
    [www1.test.com|test page1]
    [www2.test.com|test page2|new]
    [www3.test.com|test page3|]
    [ www4.test.com | test page4 | ]
    [ camelCase|Local Link in Blog | new ]
Re: Re: Link regex by tachyon (Chancellor) on May 21, 2004 at 12:51 UTC
There is a significant difference between the code I presented above and this. While the tests were a simple one widget per line, I joined the lines and included freeform text. You don't understand the problem if you think you are going to get nice neat `[...]` widgets in an array. An RE as presented will process a text stream, i.e. a blog page. Your code won't.

cheers

tachyon
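To make the text-stream point concrete, here is a minimal sketch that is not taken from either post. The sample sentence and the `link_for` helper are assumptions for illustration only, and the sketch does no HTML escaping or http:// prefixing. It runs one substitution over freeform text that holds two widgets on the same line:

```perl
use strict;
use warnings;

# Hypothetical helper: turn the inside of one [url|desc|target] widget
# into an anchor tag. Empty widgets are returned unchanged.
sub link_for {
    my ($inner) = @_;
    my ($url, $desc, $target) =
        map { my $f = $_; $f =~ s/^\s+|\s+$//g; $f } split /\|/, $inner;
    return "[$inner]" unless defined $url && length $url;
    $desc = $url unless defined $desc && length $desc;
    my $attr = ( defined $target && length $target ) ? qq{ target="$target"} : '';
    return qq{<a href="$url"$attr>$desc</a>};
}

# Freeform text, widgets embedded mid-sentence (made-up sample).
my $text = "See [www1.test.com|page one] and [www2.test.com|page two|new] for details.";

# /g finds every widget in the stream, /e calls the helper for each one.
( my $html = $text ) =~ s{\[([^\]]*)\]}{ link_for($1) }ge;
print "$html\n";
```

A per-line approach that strips every bracket first would lose the boundary between the two widgets and the surrounding prose, which is the difference being described.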
Re: Re: Re: Link regex by dimar (Curate) on May 21, 2004 at 16:08 UTC
Although I did use test data almost identical to yours, any similarity between the code I posted and the code you posted is purely coincidental. It's supposed to be different, otherwise why post it? That's the whole point.

Please note I was responding to the OP, which did not specify a problem domain. Hats off to you for the additional features and assumptions you added in for free on his behalf *applause* ...

This brings us to our regularly scheduled standard disclaimer for today...

Standard Disclaimer For Posted Code:

- without warranty regarding claims to completeness, fitness or reliability
- make no assumptions about the given problem domain, unless explicitly stated in the OP
- are offered as free-of-charge food-for-thought
- may not have anything to do with what a reasonable person might expect, and might even be a total red herring, and worth less than the zero Euros you paid for it
- caveat emptor
Re: Re: Re: Link regex by dimar (Curate) on May 21, 2004 at 16:34 UTC
This is a minor philosophical consideration, totally unrelated to the OP, but relevant to tachyon's comment, hence the additional node.

If you have heard of the MVC paradigm, you are probably familiar with the basic goals of that programming model. I tend to use a paradigm that is somewhat similar to MVC. Specifically, one of the basic 'rules' is as follows:

Always make a clear separation between code that creates data fields (aka the 'M' part), and code that outputs those data fields into a 'fill in the blank' style template (aka the 'V' part).

Why? Because over time you often have to make changes to one independently of the other. Therefore, someone who follows this style is likely to create separate sections of code (e.g. separate subroutines) to handle these operations separately.

Given this approach, perhaps it becomes evident why someone would segment out code into separate subroutines, and why one subroutine would expect to get its data fields in perfectly segmented 'widgets' for spitting out a template (and a different subroutine would do the proper 'munging' to make sure they are perfectly segmented).

Then again perhaps it doesn't become evident. Oh well. FWIW. YMMV. Thanks for coming to the show. Please tip your waitress generously. Have a nice day.
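As a rough sketch of that separation (an illustration under assumptions, not code from the thread: the names `parse_widget` and `render_link` and the hash layout are hypothetical), one subroutine does the munging and another only fills in the template:

```perl
use strict;
use warnings;

# 'M' side (hypothetical name): munge raw widget text into clean fields.
sub parse_widget {
    my ($raw) = @_;
    ( my $inner = $raw ) =~ s/^\s*\[|\]\s*$//g;    # drop the enclosing brackets
    my @fields = grep { length }
                 map  { my $f = $_; $f =~ s/^\s+|\s+$//g; $f }
                 split /\|/, $inner;
    return { url => $fields[0], desc => $fields[1], target => $fields[2] };
}

# 'V' side (hypothetical name): fill a template from already-clean fields.
# It never trims or splits anything itself.
sub render_link {
    my ($f) = @_;
    return 'ERROR: bad data' unless defined $f->{url};
    my $desc   = defined $f->{desc}   ? $f->{desc}                 : $f->{url};
    my $target = defined $f->{target} ? qq{ target="$f->{target}"} : '';
    return qq{<a href="$f->{url}"$target>$desc</a>};
}

print render_link( parse_widget($_) ), "\n"
    for '[www.perlmonks.org]', '[ www2.test.com | test page2 | new ]';
```

With this split, a change to the markup touches only `render_link`, and a change to the widget syntax touches only `parse_widget`.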