mkurtis has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to take a URL, say http://www.foo.com/foo/index.htm, and remove the index.htm portion. It would also have to remove index.htm from http://www.foo.com/foo/foo2/index.htm; in other words, truncate to the last /. Any ideas? Thanks

Re: truncating urls
by Abigail-II (Bishop) on Feb 19, 2004 at 23:11 UTC
Re: truncating urls
by Berik (Sexton) on Feb 19, 2004 at 23:31 UTC
    The problem here is that you can't be sure the new link will still work. That depends on the server configuration and on the files present in the directory.

    For example, suppose you strip index.htm, there is also a file called default.html in the same directory, and the server is configured to look for default.html first. You'll then get default.html, not what you expected.

    Anyway, you're looking for a substitution like this:
    $url =~ s-(?<=/)index\.htm(?=(\?|$))--;

    This removes index.htm only when it is preceded by a '/' and followed by either a '?' or the end of the string.
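
    To see what that substitution does and does not touch, here is a small sketch. The helper name strip_index and the sample URLs are only for illustration; the regex itself is the one above, unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper: wraps the substitution from the post above.
sub strip_index {
    my ($url) = @_;
    # Remove "index.htm" only when it follows a "/" and is followed
    # by "?" or the end of the string.
    $url =~ s-(?<=/)index\.htm(?=(\?|$))--;
    return $url;
}

# Made-up sample URLs covering the matching and non-matching cases.
print strip_index($_), "\n" for (
    'http://www.foo.com/foo/index.htm',           # index.htm removed
    'http://www.foo.com/foo/foo2/index.htm?x=1',  # removed, query kept
    'http://www.foo.com/foo/index.html',          # unchanged (.html)
    'http://www.foo.com/foo/myindex.htm',         # unchanged (no "/" before)
);
```

    Note that it only handles a file literally named index.htm; any other file name is left alone.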

    You could also have a look at the URI module, which follows the official URL parsing rules.
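
    A rough sketch of how that might look with URI (a CPAN module, not part of core Perl). The helper name dir_url is made up for this example, and trimming the last path segment with a regex is just one possible approach, not something the module prescribes.

```perl
use strict;
use warnings;
use URI;    # from CPAN; assumed to be installed

# Hypothetical helper: returns the URL with its last path segment removed.
sub dir_url {
    my ($url) = @_;
    my $uri  = URI->new($url);
    my $path = $uri->path;        # e.g. "/foo/index.htm"
    $path =~ s{[^/]*\z}{};        # drop everything after the last "/"
    $uri->path($path);            # only the path changes; any query stays
    return $uri->as_string;
}

print dir_url('http://www.foo.com/foo/foo2/index.htm'), "\n";
```

    Working on the path component alone means a '/' inside the query string cannot confuse the truncation.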

    ---
    Berik
Re: truncating urls
by borisz (Canon) on Feb 19, 2004 at 23:12 UTC
    Use a regex.
    # remove everything in a line up to the last occurrence of /
    s!^.*/!!;
    Use Abigail's solution instead; I read the question the other way around.
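
    For comparison, a minimal sketch of the two directions. The second substitution is just one way to truncate to the last /, and not necessarily the solution referred to above; the sample URL is the one from the question.

```perl
use strict;
use warnings;

my $url = 'http://www.foo.com/foo/foo2/index.htm';

# The regex from this post: keeps only what follows the last "/".
(my $file = $url) =~ s!^.*/!!;

# The other way around: keeps everything up to and including the last "/".
(my $dir = $url) =~ s{[^/]*\z}{};

print "$file\n";    # index.htm
print "$dir\n";     # http://www.foo.com/foo/foo2/
```

    Be aware that the second form would cut a bare http://www.foo.com down to http://, so it assumes the URL has a path.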
    Boris