in reply to Re: Splitting an url in its components
in thread Splitting an url in its components

hi
thanks a lot!

Could you tell me the solution for getting from

filename input: pkg-5.6.tar.gz

to

filename output: pkg-5.6

Regular expressions are really hard for me ... I manage to read your solutions (more or less) but I'm not able to write my own yet :-(

Re^3: Splitting an url in its components
by moritz (Cardinal) on Jul 17, 2008 at 15:15 UTC
    Well, you can special-case it, but then .tar.gz is the only multi-part extension it will handle:
    s{\.(?:tar\.gz|[^.]+)\z}{};

    The problem is that, in general, you can't know from a file name which part of it is "extension" and which part is not, unless you either have a clear-cut definition of what "extension" means (I don't know any that satisfies your requirement), or you have a list of all possible extensions.
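    A minimal sketch of the "list of all possible extensions" approach mentioned above. The list @known_ext and the helper strip_known_ext are made up for illustration; which extensions belong in the list is an assumption you would have to adapt to your own files.

```perl
use strict;
use warnings;

# Hypothetical list of extensions to strip; extend it to suit your files.
my @known_ext = qw(tar.gz tar.bz2 tgz zip gz);

# Build one alternation from the list, escaping the dots.
my $alt    = join '|', map { quotemeta($_) } @known_ext;
my $ext_re = qr/\.(?:$alt)\z/;

# Return the name with a known extension removed from the end.
sub strip_known_ext {
    my ($name) = @_;
    (my $base = $name) =~ s/$ext_re//;
    return $base;
}

print strip_known_ext('pkg-5.6.tar.gz'),    "\n";   # pkg-5.6
print strip_known_ext('pkg-5.6-win32.zip'), "\n";   # pkg-5.6-win32
print strip_known_ext('pkg-5.6'),           "\n";   # pkg-5.6 (".6" is not in the list)
```

    Anything not in the list is left alone, which is the price of not guessing.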

      this is great and exactly what I was looking for :-)

      for my personal learning purposes:

      I don't fully understand the expression "(?:tar\.gz|^.+)"

      - what does the part '?:tar\.gz' mean?

      thanks so far for your help
      ben
        There is no ?:tar\.gz part, because (?: ... ) has a distinct meaning (it's grouping without capturing), and tar\.gz just matches the string tar.gz.
        Use <c> code here </c> tags (Markup in the Monastery).
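        To see the difference between grouping with and without capturing, here is a small sketch using the same pattern both ways. The variable names are only for illustration.

```perl
use strict;
use warnings;

my $name = 'pkg-5.6.tar.gz';

# Plain parentheses group AND capture: the matched text lands in $1.
my $with_capture = $name =~ /\.(tar\.gz|[^.]+)\z/ ? $1 : undef;

# (?: ... ) groups the alternation the same way but stores nothing,
# so $1 is undefined after this successful match.
my $without_capture = $name =~ /\.(?:tar\.gz|[^.]+)\z/ ? $1 : undef;

print "capturing:     ", (defined $with_capture    ? $with_capture    : '(undef)'), "\n";
print "non-capturing: ", (defined $without_capture ? $without_capture : '(undef)'), "\n";
```

        Both patterns match the same text; only the first one remembers what the group matched. For a substitution that just deletes the extension, nothing needs to be remembered, hence (?: ... ).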

        Install YAPE::Regex::Explain, because

        use YAPE::Regex::Explain;
        print YAPE::Regex::Explain->new(qr~(?:tar\.gz|[^.]+)~)->explain;
        __END__
        The regular expression:

        (?-imsx:(?:tar\.gz|[^.]+))

        matches as follows:

        NODE                     EXPLANATION
        ----------------------------------------------------------------------
        (?-imsx:                 group, but do not capture (case-sensitive)
                                 (with ^ and $ matching normally) (with . not
                                 matching \n) (matching whitespace and #
                                 normally):
        ----------------------------------------------------------------------
          (?:                      group, but do not capture:
        ----------------------------------------------------------------------
            tar                      'tar'
        ----------------------------------------------------------------------
            \.                       '.'
        ----------------------------------------------------------------------
            gz                       'gz'
        ----------------------------------------------------------------------
           |                        OR
        ----------------------------------------------------------------------
            [^.]+                    any character except: '.' (1 or more
                                     times (matching the most amount
                                     possible))
        ----------------------------------------------------------------------
          )                        end of grouping
        ----------------------------------------------------------------------
        )                        end of grouping
        ----------------------------------------------------------------------
Re^3: Splitting an url in its components
by massa (Hermit) on Jul 17, 2008 at 22:08 UTC
    Both my solutions (below) treat every dot-followed-by-things-that-are-word-chars-but-aren't-just-numbers as an extension, so they'll get both your cases, pkg-5.6.tar.gz and pkg-5.6-win32.zip, right.
    []s, HTH, Massa
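    For illustration only, here is one way the rule described above (a dot followed by word characters that are not all digits) could be written. This is a guess at the idea, not massa's actual code; the pattern and the helper strip_ext are assumptions.

```perl
use strict;
use warnings;

# One or more trailing ".word" chunks. The negative lookahead rejects a
# chunk made only of digits (like the ".6" in "5.6"), so version numbers
# are not mistaken for extensions.
my $ext_re = qr/(?:\.(?!\d+(?:\.|\z))\w+)+\z/;

# Return the name with all trailing extension chunks removed.
sub strip_ext {
    my ($name) = @_;
    (my $base = $name) =~ s/$ext_re//;
    return $base;
}

print strip_ext('pkg-5.6.tar.gz'),    "\n";   # pkg-5.6
print strip_ext('pkg-5.6-win32.zip'), "\n";   # pkg-5.6-win32
```

    A name like archive.001 would be left untouched by this rule, since its last chunk is all digits.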