in reply to Splitting an url in its components

Usually it's best to use a module like Regexp::Common::URI or URI or Rose::URI to extract the path information.

If you are only interested in anything after the last slash, a regex should work fine:

if ($uri =~ m{([^/]+)\z}) { print $1, "\n"; }
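
As a rough sketch of how that regex behaves (the sub name last_segment and the example.com URLs are made up for illustration, they are not from the original question):

```perl
use strict;
use warnings;

# Hypothetical helper: return everything after the last slash,
# or undef if there is nothing after it (e.g. the URL ends in "/").
sub last_segment {
    my ($uri) = @_;
    return $uri =~ m{([^/]+)\z} ? $1 : undef;
}

for my $uri ('http://example.com/downloads/pkg-5.6.tar.gz',
             'http://example.com/downloads/') {
    my $name = last_segment($uri);
    print defined $name ? $name : '(no match)', "\n";
}
```

Note that a query string or fragment at the end of the URL would end up in the match as well, which is one reason the modules mentioned above are usually the safer choice.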

Stripping the extension is a bit harder, because first you have to define what an extension is. If you just want to split off everything from the last period to the end, use a regex like this:

$filename =~ s/\.[^.]+\z//;

That will give you pkg-5.6.tar in the first example, which is technically correct, because you have a .tar file inside a .gz file. If you don't like that outcome, specify how the recognition of the extension should work.
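
To make that outcome concrete, here is a small sketch; the sub name strip_last_extension is hypothetical, and README is only there to show a name without any period:

```perl
use strict;
use warnings;

# Hypothetical helper: drop everything from the last period to the end.
# A name without a period is returned unchanged.
sub strip_last_extension {
    my ($filename) = @_;
    $filename =~ s/\.[^.]+\z//;
    return $filename;
}

# Prints pkg-5.6.tar, pkg-5.6-win32 and README, one per line.
print strip_last_extension($_), "\n"
    for 'pkg-5.6.tar.gz', 'pkg-5.6-win32.zip', 'README';
```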

Re^2: Splitting an url in its components
by baurel (Sexton) on Jul 17, 2008 at 13:34 UTC
    hi
    thanks a lot!

    Could you tell me how to get from

    filename input: pkg-5.6.tar.gz

    to

    filename output: pkg-5.6

    Regular expressions are really hard for me ... I manage to read your solutions (more or less) but I'm not able to write my own yet :-(
      Well, you can special-case it, but then it'll only work for .tar.gz:
      s{\.(?:tar\.gz|[^.]+)\z}{};

      The problem is that, in general, you can't know from a file name which part of it is "extension" and which part is not, unless you either have a clear-cut definition of what "extension" means (I don't know any that satisfies your requirement), or you have a list of all possible extensions.
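
      If you go the "list of all possible extensions" route, a minimal sketch could look like this. The list in @known and the sub name strip_known_extension are assumptions for illustration only, so adapt them to the files you actually have:

      ```perl
      use strict;
      use warnings;

      # Assumed list of known extensions; extend it to suit your files.
      my @known = qw(tar.gz tar.bz2 tgz zip);

      # Escape the dots and join the entries into one alternation.
      my $alternatives = join '|', map { quotemeta($_) } @known;

      # (?: ... ) only groups the alternatives; unlike ( ... ) it does
      # not capture anything into $1. \z anchors at the very end.
      my $ext_re = qr/\.(?:$alternatives)\z/;

      # Hypothetical helper: strip one known extension, if present.
      sub strip_known_extension {
          my ($filename) = @_;
          $filename =~ s/$ext_re//;
          return $filename;
      }

      # Prints pkg-5.6, pkg-5.6-win32 and notes.txt, one per line.
      print strip_known_extension($_), "\n"
          for 'pkg-5.6.tar.gz', 'pkg-5.6-win32.zip', 'notes.txt';
      ```

      Anything not on the list (notes.txt here) is left alone, which is the price of not guessing.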

        this is great and exactly what I was looking for :-)

        for my personal learning purposes:

        I don't fully understand the expression "(?:tar\.gz|[^.]+)"

        - what does the part '?:tar\.gz' mean?

        thanks so far for your help
        ben
      Both my solutions (below) treat every dot followed by word characters that aren't just digits as an extension, so they'll get both your cases, pkg-5.6.tar.gz and pkg-5.6-win32.zip, right.
      []s, HTH, Massa