http://host.com/some/uri/whatever?some/query/string
--
http://fruiture.de
The problem with either method is that there are special cases you might miss unless you understand exactly what a URL can look like (or, for that matter, any data you have to parse). Personally, I would use a module if someone has already done the legwork of working out which specifications a URL has to meet. I initially coded up a regex for this but didn't post it, because I don't wish to do someone else's homework; I posted the method I took instead, and I completely neglected the special case that fruiture mentions above. Still, I don't see a problem with using split(s). Anyhow, on to the code (no guarantee that it works for all cases; I would use URI):
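use strict;
use warnings;

while ( my $url = <DATA> ) {
    chomp($url);
    my $dup_url = $url;

    # Only shorten URLs longer than 49 characters.
    if ( length($url) > 49 ) {

        # Regex approach: keep scheme and host, replace the directories
        # with (...), keep the file name, and drop any query string.
        $url =~ s!(?:
                    (^https?://[^/]+/).*/(.*)\?.*
                  )
                  |
                  (?:
                    (^https?://[^/]+/).*/(.*)
                  )
                 !
                  ($1||$3) . '(...)/' . ($2||$4)
                 !ex;

        # Split approach: take the scheme before "//", then the host and
        # the last path segment of everything before the "?".
        my $http = ( split /\/\//, $dup_url )[0];
        my ( $url_start, $url_end ) =
            ( split /\//, ( split /\?/, $dup_url )[0] )[ 2, -1 ];
        $dup_url = "$http//$url_start/(...)/$url_end";
    }
    print "REGEX: $url\n";
    print "SPLIT: $dup_url\n\n";
}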
__DATA__
http://some-shop.com/dir1/dir2/buystuff.cgi?x=1&y=2&z=3
http://somewhere/with/a/vastly/deep/structure/virus.exe
http://host.com/some/uri/whatever?some/query/stringthatis/here
https://some-shop.com/dir1/dir2/buystuff.cgi?x=1&y=2&z=3
https://somewhere/with/a/vastly/deep/structure/virus.exe
https://host.com/some/uri/whatever?some/query/stringthatis/here
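
For comparison, here is a minimal sketch of the URI-based route mentioned above. It assumes the URI module from CPAN is installed and that the input is an absolute http or https URL. The names shorten_url and $max are hypothetical, chosen for illustration; they are not part of URI or of the code above.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: shorten a URL to scheme://host/(...)/file when it
# is longer than $max characters (49 mirrors the limit used above).
# Assumes an absolute URL, so that scheme and authority are defined.
sub shorten_url {
    my ( $url, $max ) = @_;
    $max = 49 unless defined $max;
    return $url if length($url) <= $max;

    my $uri = URI->new($url);

    # path_segments returns the pieces of the path only, so the query
    # string never reaches the file name, even if it contains slashes.
    my $file = ( $uri->path_segments )[-1];
    $file = '' unless defined $file;

    return $uri->scheme . '://' . $uri->authority . '/(...)/' . $file;
}

print shorten_url($_), "\n" for (
    'http://some-shop.com/dir1/dir2/buystuff.cgi?x=1&y=2&z=3',
    'http://host.com/some/uri/whatever?some/query/stringthatis/here',
);
```

Letting the module separate path and query avoids the special case of slashes inside the query string, which is the one a hand-written regex or split is most likely to miss.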
I should have said that I can't post code because I have to split.