in reply to Re: use split?
in thread Regex to Truncate URLs Nicely

The problem with either method is that there are special cases one might miss without understanding exactly what a URL can look like (and the same goes for any data you have to parse).

Personally, I would use a module if someone has already taken the time to do the legwork of working out what specifications a URL has to meet.
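For what it's worth, the module route might look roughly like the sketch below. It assumes the URI module from CPAN is installed (it is not part of core Perl). The helper name truncate_url and its $max argument are made up for illustration, and the 49-character default simply mirrors the limit discussed in this thread. Treat it as a starting point, not a tested solution for every URL shape.

```perl
use strict;
use warnings;
use URI;    # CPAN module, not part of core Perl

# Hypothetical helper: shorten a long http(s) URL to
# scheme://authority/(...)/last-path-segment.
# $max defaults to 49, the limit used elsewhere in this thread.
sub truncate_url {
    my ($url, $max) = @_;
    $max = 49 unless defined $max;
    return $url if length($url) <= $max;

    my $uri    = URI->new($url);
    my $scheme = $uri->scheme;

    # Leave anything that is not http/https untouched.
    return $url unless defined $scheme && $scheme =~ /^https?$/;

    # path_segments() looks only at the path, so a '/' inside the
    # query string cannot confuse it. Empty segments are dropped.
    my @segments = grep { length } $uri->path_segments;

    # Nothing sits between the host and the last segment: nothing to elide.
    return $url if @segments < 2;

    # authority() is the host plus any port or userinfo, as written.
    return sprintf '%s://%s/(...)/%s', $scheme, $uri->authority, $segments[-1];
}

print truncate_url($_), "\n" for (
    'http://some-shop.com/dir1/dir2/buystuff.cgi?x=1&y=2&z=3',
    'http://host.com/some/uri/whatever?some/query/stringthatis/here',
);
```

The point of the sketch is that the module separates the path from the query string before anything is split on '/', which is exactly the special case that is easy to overlook by hand.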

I initially coded up a regex for this but didn't post it, since I'd rather not do someone else's homework. I posted the method I took instead, and in doing so I completely neglected the special case that fruiture mentions above. Still, I don't see a problem with using split(s). Anyhow, on to the code (no guarantees that it will work for all cases; I would use URI):

use strict;
use warnings;

while ( my $url = <DATA> ) {
    chomp($url);
    my $dup_url = $url;

    if ( length($url) > 49 ) {
        $url =~ s!(?: (^https?://[^/]+/).*/(.*)\?.* )
                  |
                  (?: (^https?://[^/]+/).*/(.*) )
                 ! ($1||$3) . '(...)/' . ($2||$4) !ex;

        my $http = (split /\/\//, $dup_url)[0];
        my ($url_start, $url_end) =
            (split /\//, (split /\?/, $dup_url)[0])[2,-1];
        $dup_url = "$http//$url_start/(...)/$url_end";
    }

    print "REGEX: $url\n";
    print "SPLIT: $dup_url\n\n";
}

__DATA__
http://some-shop.com/dir1/dir2/buystuff.cgi?x=1&y=2&z=3
http://somewhere/with/a/vastly/deep/structure/virus.exe
http://host.com/some/uri/whatever?some/query/stringthatis/here
https://some-shop.com/dir1/dir2/buystuff.cgi?x=1&y=2&z=3
https://somewhere/with/a/vastly/deep/structure/virus.exe
https://host.com/some/uri/whatever?some/query/stringthatis/here