in reply to Regex to Truncate URLs Nicely

I'm just about to brave the rush hour, so can't post code, but I would suggest using split on "/" instead of a regex.

The first few items in the split list will make up the front of the URI, and the last one can be split again on "?" to knock off the query parameters.
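
A minimal sketch of that idea, assuming an absolute http(s) URL whose query string contains no "/". The sub name `truncate_url` and the `(...)` placeholder are illustrative choices, not something from the original question:

```perl
use strict;
use warnings;

# Hypothetical helper: keep scheme and host, elide the middle directories,
# and keep the last path element without its query string.
sub truncate_url {
    my ($url) = @_;

    # "http://host/a/b/c.cgi?x=1" splits into
    # ("http:", "", "host", "a", "b", "c.cgi?x=1")
    my @parts = split m{/}, $url;

    # Nothing to elide unless there is at least one directory
    # between the host and the last element.
    return $url if @parts < 5;

    my $front = join '/', @parts[0 .. 2];    # "http://host"
    my ($last) = split /\?/, $parts[-1];     # knock off the query parameters

    return "$front/(...)/$last";
}

print truncate_url('http://some-shop.com/dir1/dir2/buystuff.cgi?x=1&y=2&z=3'), "\n";
# prints http://some-shop.com/(...)/buystuff.cgi
```

Short URLs such as `http://host/file` come back unchanged, since there is nothing between the host and the file to elide.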

Re: use split?
by fruiture (Curate) on Nov 01, 2002 at 11:28 UTC

    Imho split() is NOT a good idea, because the query string can contain slashes too:

    http://host.com/some/uri/whatever?some/query/string
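
To make the objection concrete, here is a small sketch (variable names are illustrative) of what a split on "/" does with that URL, next to a variant that cuts at the first "?" before splitting:

```perl
use strict;
use warnings;

my $url = 'http://host.com/some/uri/whatever?some/query/string';

# Naive: split on "/" first, then strip the query from the last element.
# The last element is already a piece of the query string.
my ($naive) = split /\?/, (split m{/}, $url)[-1];

# Safer: cut at the first "?" first, then split what is left on "/".
my ($no_query) = split /\?/, $url, 2;
my $safer = (split m{/}, $no_query)[-1];

print "naive: $naive\n";    # naive: string
print "safer: $safer\n";    # safer: whatever
```

So the order of the two splits matters: the query has to come off before the path is taken apart.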
    --
    http://fruiture.de
      The problem with either method is that there are special cases one might miss without understanding exactly what a URL can look like (or, for that matter, any data you have to parse).

      Personally, I would use a module if someone has already taken the time to do the legwork of working out which specifications a URL has to meet.
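
As a rough illustration of the module route, here is a sketch using the URI module from CPAN (not part of core Perl). It assumes an absolute http(s) URL; the sub name `truncate_with_uri` and the `(...)` placeholder are made up for this example, and the port and any path parameters are ignored:

```perl
use strict;
use warnings;
use URI;    # CPAN module, needs to be installed separately

# Hypothetical helper built on URI's accessors.
sub truncate_with_uri {
    my ($url) = @_;
    my $uri = URI->new($url);

    # For an absolute path, path_segments returns an empty string first
    # (for the root slash); grep drops it. The query is kept separately
    # by URI, so slashes inside it never reach this list.
    my @segments = grep { length } $uri->path_segments;
    return $url if @segments < 2;

    return $uri->scheme . '://' . $uri->host . '/(...)/' . $segments[-1];
}

print truncate_with_uri('http://host.com/some/uri/whatever?some/query/string'), "\n";
# prints http://host.com/(...)/whatever
```

The point of the module is that the split between path and query is made by code that already follows the URI rules, so that special case does not have to be remembered by hand.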

      When I initially coded up a regex for this, I didn't post it because I don't wish to do someone else's homework. I posted the method I took instead, and I completely neglected the special case that fruiture mentions above. Still, I don't see a problem with using split(s). Anyhow, on to the code (no guarantees that it will work for all cases; I would use URI):

      use strict;
      use warnings;

      while ( my $url = <DATA> ) {
          chomp($url);
          my $dup_url = $url;
          if ( length($url) > 49 ) {
              # Regex version: keep scheme and host, elide the directories,
              # keep the last path element and drop any query string.
              $url =~ s!(?: (^https?://[^/]+/).*/(.*)\?.* )
                        |
                        (?: (^https?://[^/]+/).*/(.*) )
                       ! ($1||$3) . '(...)/' . ($2||$4) !ex;

              # Split version: cut off the query string first, then split on "/".
              my $http = (split /\/\//, $dup_url)[0];
              my ($url_start, $url_end) =
                  (split /\//, (split /\?/, $dup_url)[0])[2,-1];
              $dup_url = "$http//$url_start/(...)/$url_end";
          }
          print "REGEX: $url\n";
          print "SPLIT: $dup_url\n\n";
      }
      __DATA__
      http://some-shop.com/dir1/dir2/buystuff.cgi?x=1&y=2&z=3
      http://somewhere/with/a/vastly/deep/structure/virus.exe
      http://host.com/some/uri/whatever?some/query/stringthatis/here
      https://some-shop.com/dir1/dir2/buystuff.cgi?x=1&y=2&z=3
      https://somewhere/with/a/vastly/deep/structure/virus.exe
      https://host.com/some/uri/whatever?some/query/stringthatis/here
Missed pun opportunity!
by cebrown (Pilgrim) on Nov 01, 2002 at 00:21 UTC
    I should have said that I can't post code because I have to split.