Re: Regex to Truncate URLs Nicely
by fruiture (Curate) on Oct 31, 2002 at 23:49 UTC
|
Check out the URI module to do it correctly. I think that's the easiest way to parse a URL and modify it wisely.
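A minimal sketch of what that could look like, assuming the URI module from CPAN is installed. The helper name shorten_url and the 50-character default are made up for illustration; the sketch keeps only the scheme, the host and the last path segment, so any port and query string are dropped from the displayed text.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper (not part of URI): shorten a URL for display.
# The 50-character limit is an assumed default.
sub shorten_url {
    my ( $url, $max ) = @_;
    $max = 50 unless defined $max;
    return $url if length $url <= $max;

    my $uri = URI->new($url);

    # Leave anything without a host part (mailto:, relative URLs) alone
    return $url unless $uri->can('host') && defined $uri->host;

    # path_segments gives ('', 'dir1', 'dir2', 'file') for /dir1/dir2/file;
    # the query string is not part of the path, so slashes in it are harmless
    my @segments = grep { length } $uri->path_segments;
    my $last     = @segments     ? $segments[-1] : '';
    my $middle   = @segments > 1 ? '(...)/'      : '';

    return $uri->scheme . '://' . $uri->host . '/' . $middle . $last;
}

print shorten_url('http://some-shop.com/dir1/dir2/buystuff.cgi?x=1&y=2&z=3'), "\n";
# prints http://some-shop.com/(...)/buystuff.cgi
```

Because the module separates path and query before anything is split on "/", a query string that itself contains slashes does not confuse it.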
--
http://fruiture.de
| [reply] |
Re: Regex to Truncate URLs Nicely
by Enlil (Parson) on Nov 01, 2002 at 00:15 UTC
|
Since I would now guess this is an "academic" endeavour, I will give you the way I would approach it without modules. I would use two regexes. The first would reduce something like:
http://some-shop.com/dir1/dir2/buystuff.cgi?x=1&y=2&z=3 to something like http://some-shop.com/(...)/buystuff.cgi?x=1&y=2&z=3 and the second would trim the end if there is a long query string there. I would only do either if the URL is over 50 chars. (Then again, I might just use a couple of splits and some concatenation magic instead, but that would depend on what all my data looked like.) Good luck. -enlil | [reply] |
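The two-regex idea described above could be sketched roughly as follows. This is only one reading of it: the helper name truncate_url is hypothetical, the 50-character limit comes from the post, and the 20-character cutoff for a "long" query string is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper following the two-step idea from the post.
sub truncate_url {
    my ($url) = @_;
    return $url if length $url <= 50;    # only touch long URLs

    # Step 1: replace the directories between host and file with (...).
    # [^?]* keeps the match out of the query string.
    $url =~ s{^(https?://[^/?]+/)[^?]*/}{$1(...)/};

    # Step 2: cut a query string of 20 or more characters (assumed cutoff)
    $url =~ s{\?.{20,}$}{?...}s;

    return $url;
}

print truncate_url('http://some-shop.com/dir1/dir2/buystuff.cgi?x=1&y=2&z=3'), "\n";
# prints http://some-shop.com/(...)/buystuff.cgi?x=1&y=2&z=3
```

A URL with only one path segment has nothing between host and file, so step 1 leaves it unchanged.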
Re: Regex to Truncate URLs Nicely
by Cody Pendant (Prior) on Nov 01, 2002 at 10:57 UTC
|
($_='jjjuuusssttt annootthheer
pppeeerrrlll haaaccckkeer')=~y/a-z//s;print;
| [reply] [d/l] |
Re: Regex to Truncate URLs Nicely
by Revelation (Deacon) on Nov 01, 2002 at 00:40 UTC
|
print url_parse('http://www.moo.com/moo.cgi?moo=moo');
sub url_parse {
    my $url = shift;
    # Scheme and host, e.g. http://www.moo.com
    my ($base) = $url =~ m!^(http://[^/]+)!i
        or return $url;    # not an http URL: leave it alone
    # Path and query, with the base removed literally (\Q guards the dots)
    ( my $directorystruct = $url ) =~ s!^\Q$base\E!!;
    my ( undef, @directories ) = split m!/!, $directorystruct;
    return $base unless @directories;    # no path at all
    # Drop the query string from the last path segment
    $directories[-1] =~ s/\?.*//s;
    return $base . '/' . $directories[-1] if @directories <= 1;
    return $base . '/../' . $directories[-1];
}
Gyan Kapur
gyan.kapur@rhhllp.com
| [reply] [d/l] |
Re: Regex to Truncate URLs Nicely
by Wonko the sane (Curate) on Nov 01, 2002 at 01:23 UTC
|
$url =~ s!^(https?://.*?/)(?:.{20}.*)?(/[^?]*)(\?.*)*!$1(..)$2!;
Works on URLs with or without args on the end. The 20 in the middle can be adjusted to fit whatever URLs you mostly encounter.
Best Regards,
Wonko | [reply] [d/l] |
Re: Regex to Truncate URLs Nicely
by artist (Parson) on Nov 01, 2002 at 03:16 UTC
|
Hi,
Mine is not a 100% Perl solution. You may be able to use an external service such as Tiny URL, which shortens the URL itself; that takes care of the underlying link. The text shown on the display can be shortened to the website name etc., or by the methods mentioned by other monks here.
Appreciating the Tiny Art,
Artist
| [reply] |
Re: Regex to Truncate URLs Nicely
by Aristotle (Chancellor) on Nov 02, 2002 at 08:14 UTC
|
my $maxlen = 35;
s![?].*$!!; # chop query params if any
s{^(.*)(?=/[^/]/?)}{length $1 < $maxlen ? $1 : substr($1, 0, $maxlen-3)."..."}e;
Makeshifts last the longest. | [reply] [d/l] |
|
|
Neat, thank you.
One question:
s![?].*$!!;
Why is the query in brackets there?
--
($_='jjjuuusssttt annootthheer
pppeeerrrlll haaaccckkeer')=~y/a-z//s;print;
| [reply] [d/l] [select] |
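On the bracket question: inside a character class, ? loses its meaning as a quantifier, so [?] matches one literal question mark, the same as \?. A small check, with a made-up URL and variable names:

```perl
use strict;
use warnings;

my $url = 'http://some-shop.com/buystuff.cgi?x=1&y=2';

( my $with_class  = $url ) =~ s![?].*$!!;    # character class holding only ?
( my $with_escape = $url ) =~ s!\?.*$!!;     # backslash-escaped ?

print "$with_class\n";    # prints http://some-shop.com/buystuff.cgi
print "same result\n" if $with_class eq $with_escape;
```

Which spelling to use is a matter of taste; some people find the bracketed form easier to read than a backslash.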
use split?
by cebrown (Pilgrim) on Nov 01, 2002 at 00:19 UTC
|
I'm just about to brave the rush hour, so I can't post code, but I would suggest using split on "/" instead of a regex. The first few items in the split list will make up the front of the URI, and the last one can be split again on "?" to knock off the query parameters. | [reply] [d/l] |
|
|
http://host.com/some/uri/whatever?some/query/string
--
http://fruiture.de | [reply] [d/l] |
|
|
The problem with either method is that there are special cases one might miss without understanding exactly what a URL can look like (or, for that matter, any data you have to parse through). Personally, I would use a module if someone has already taken the time to do the legwork of working out what specifications a URL has to meet. I initially coded up a regex for this but didn't post it, because I don't wish to do someone else's homework. I posted the method I took instead, and I completely neglected the special case that fruiture mentions above. But I don't see a problem with using split(s). Anyhow, on to the code (no guarantees that it will work for all cases; I would use URI):
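use strict;
use warnings;
while ( my $url = <DATA> )
{
    chomp($url);
    my $dup_url = $url;
    if ( length($url) > 49 )
    {
        # Regex version: keep scheme and host plus the last path segment,
        # with or without a query string on the end.
        $url =~ s!(?:
                    (^https?://[^/]+/).*/(.*)\?.*
                  )
                  |
                  (?:
                    (^https?://[^/]+/).*/(.*)
                  )
                 !
                    ($1||$3) . '(...)/'. ($2||$4)
                 !ex;
        # Split version: take the scheme, then the host and the last path
        # segment from the part before the query string.
        my $http = (split /\/\//, $dup_url)[0];
        my ($url_start, $url_end) = (split /\//, (split /\?/, $dup_url)[0])[2,-1];
        $dup_url = "$http//$url_start/(...)/$url_end";
    }
    print "REGEX: $url\n";
    print "SPLIT: $dup_url\n\n";
}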
__DATA__
http://some-shop.com/dir1/dir2/buystuff.cgi?x=1&y=2&z=3
http://somewhere/with/a/vastly/deep/structure/virus.exe
http://host.com/some/uri/whatever?some/query/stringthatis/here
https://some-shop.com/dir1/dir2/buystuff.cgi?x=1&y=2&z=3
https://somewhere/with/a/vastly/deep/structure/virus.exe
https://host.com/some/uri/whatever?some/query/stringthatis/here
| [reply] [d/l] |
|
|
I should have said that I can't post code because I have to split.
| [reply] [d/l] |