perl_geoff has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks, I am pressed for time and I need help splitting a URL string... Here's what I have so far, but my regex isn't working:
    use strict;

    my $line = "http://127.0.0.1/home_dir";
    if ($line =~ /(\/.)(\/.)(\/.)/) {
        print "$1\n";
        print "$2\n";
        print "$3\n";
    }
    else {
        print "Bad RegEx\n";
    }
    exit;
What I need to do is get the directory home_dir in its own scalar. Any help is much appreciated!

Replies are listed 'Best First'.
Re: need to split URL string
by sauoq (Abbot) on Feb 08, 2007 at 20:46 UTC
    I am pressed for time and I need help splitting a URL string...

    Then use someone else's work... URI

    -sauoq
    "My two cents aren't worth a dime.";
      nice...thanks :)
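For anyone following sauoq's pointer, here is a minimal sketch of what using the URI module might look like. URI comes from CPAN rather than core Perl, so this assumes it is installed; the variable names are illustrative only and are not from the original post.

```perl
use strict;
use warnings;
use URI;    # CPAN module; assumed to be installed

my $line = "http://127.0.0.1/home_dir";
my $uri  = URI->new($line);

my $scheme = $uri->scheme;    # protocol, without the trailing colon
my $host   = $uri->host;      # host name or IP address

# path_segments returns an empty first element for the leading "/",
# so drop empty segments before taking the last one.
my @segments = grep { length } $uri->path_segments;
my $dir      = $segments[-1];

print "$scheme\n$host\n$dir\n";
```

A URL with no path at all would leave $dir undefined, so real code would want to check for that.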
Re: need to split URL string
by ww (Archbishop) on Feb 08, 2007 at 23:46 UTC

    Trizor is correct about the dot.

    For some perhaps unneeded expansion, here's one possible regex, commented, in extended notation so you can see what you're really doing (and based on what I've inferred from your use of three captures, designed to collect the protocol, IP, and server_directory):

        use strict;

        my $line = "http://127.0.0.1/home_dir";
        if ($line =~ /             # begin regex
                (                  # begin Capture #1
                  http:            # protocol
                )                  # end Capture #1
                \/+                # (discard) one (or more) forward slashes; imprecise
                (127\.0\.0\.1)     # Capture #2, the IP, including the dots
                \/                 # (discard) one forward slash
                (home_dir)         # Capture #3 (literal and simplistic)
            /x)                    # end regex, use extended syntax, end the if predicate
        {
            print "$1\n";
            print "$2\n";
            print "$3\n";
        }
        else {
            print "Bad RegEx\n";
        }
        exit;

    prints

        http:
        127.0.0.1
        home_dir

    Note that this is NOT a general solution: I've used literals liberally from your example and, in the interest of simplicity, have not addressed such constructs as {1,2}, a quantifier specifying either one or two of whatever it modifies, <Post prandial update begins> or the use of alternate delimiters. Both are illustrated here:

        use strict;

        my $line = "http://127.0.0.1/home_dir";
        if ($line =~ m%            # begin regex - alternate delimiter
                (                  # begin Capture #1
                  http:            # protocol
                )                  # end Capture #1
                /{2,2}             # (discard) exactly two forward slashes
                                   # note: no need to escape the "/" now,
                                   # can also be written as {2}
                ([\d.]{1,14})      # Capture #2, the IP: 1 to 14 digits or literal dots
                /                  # (discard) one forward slash
                (\w{1,9})          # Capture #3, 1 to 9 word chars (letters, digits, OR underline)
            %x)                    # end regex, use extended syntax, end the if predicate
        {
            print "$1\n";
            print "$2\n";
            print "$3\n";
        }
        else {
            print "Bad RegEx\n";
        }
        exit;

    So why did I even bother to mention that? Well, as you learn about regexen, you can make your tools more precise and more useful... and have a lot of fun doing so.

    Jeffrey E. F. Friedl's "Mastering Regular Expressions" (O'Reilly) is the canon on the topic. It's a big mouthful, but one worth chewing through a bite at a time.

Re: need to split URL string
by Trizor (Pilgrim) on Feb 08, 2007 at 22:27 UTC
    As a general note on your regex style: . matches any one character (except a newline, by default), so you need quantifiers to match more than that. In this case you would probably want the lazy (non-greedy) versions, which stop as soon as they can match. See perlre for more about regexes.
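To make the point about quantifiers concrete: in the original pattern each (\/.) matches exactly one slash plus one character, and the sample string never has three such pairs in a row, so the match fails. The following is only a sketch of one way quantifiers (one of them lazy) could be applied to the same three captures. The pattern and variable names are my own assumptions, not Trizor's code.

```perl
use strict;
use warnings;

my $line = "http://127.0.0.1/home_dir";

# (\w+?)  : lazy, takes as few word characters as possible before "://"
# ([^/]+) : one or more characters that are not a slash (the host)
# (.+)    : greedy, takes everything after the next slash
my ($proto, $host, $dir);
if ($line =~ m{^(\w+?)://([^/]+)/(.+)$}) {
    ($proto, $host, $dir) = ($1, $2, $3);
    print "$proto\n$host\n$dir\n";
}
else {
    print "No match\n";
}
```

Using m{...} as the delimiter avoids having to escape the slashes inside the pattern.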
Re: need to split URL string
by dsheroh (Monsignor) on Feb 09, 2007 at 00:31 UTC
    If your only interest is in the home_dir, why bother capturing any of the text preceding it? You can use /([^/]*)$/ to capture all text following the last / character if that's all you're looking for. (Note: Regex not actually tested and may contain typos.)
      Close, but as you appear to have suspected, there is a typo, which can be dealt with this way:
      m%([^/]*)$%

      In your original, the forward slash inside the character class is taken as the closing delimiter, so the regex ends at "([^", which has an unmatched "[".
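As a quick check of the two ways around that delimiter problem, here is a small sketch that captures the last path segment twice: once with alternate delimiters and once with the stock delimiters and an escaped slash. The variable names are hypothetical.

```perl
use strict;
use warnings;

my $line = "http://127.0.0.1/home_dir";

# Alternate delimiters: "/" needs no escaping inside the pattern
my ($dir_a) = $line =~ m%([^/]*)$%;

# Stock delimiters: the slash inside the character class must be escaped
my ($dir_b) = $line =~ /([^\/]*)$/;

print "$dir_a\n$dir_b\n";
```

Both captures grab everything after the last slash, so a URL ending in "/" would yield an empty string rather than a directory name.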