slaclnx has asked for the wisdom of the Perl Monks concerning the following question:

PerlMonkers,

I apologize; I am a novice at regexps and am struggling to figure this out. I'm trying to parse a string that is in one of the two following forms:
1) hostname.company.com:80//directory/...
or
2) hostname.company.com:80//
I am looking to get the hostname and, if it exists, the directory with a single regexp if possible (to minimize code). I currently have the following for the first scenario:

/\:\/\/(.*?)\:\d{2,4}\/\/(.*?)\//
And this works for the second:
/\:\/\/(.*?)\:\d{2,4}\/\//
I was wondering if there is a way to construct a single regexp that would work for both. Any suggestions?

Thanks
slaclnx
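A quick way to see why two patterns were needed is to run both patterns from the question over both forms. This is a small sketch; the `http://` prefix on the sample strings is an assumption (both patterns begin by matching `://`), and `try_pattern` is a hypothetical helper used only to print results.

```perl
use strict;
use warnings;

# The two patterns from the question, unchanged.
my $with_dir    = qr/\:\/\/(.*?)\:\d{2,4}\/\/(.*?)\//;
my $without_dir = qr/\:\/\/(.*?)\:\d{2,4}\/\//;

# Sample inputs; the "http://" prefix is assumed, since both patterns
# require a leading "://".
my @samples = (
    'http://hostname.company.com:80//directory/...',
    'http://hostname.company.com:80//',
);

# Hypothetical helper: join the captures of a match, or report a failure.
sub try_pattern {
    my ( $re, $s ) = @_;
    my @caps = $s =~ $re;
    return @caps ? join( ' | ', @caps ) : 'no match';
}

for my $s (@samples) {
    print "$s\n";
    print "  with directory:    ", try_pattern( $with_dir,    $s ), "\n";
    print "  without directory: ", try_pattern( $without_dir, $s ), "\n";
}
```

The first pattern fails on the second form because its trailing `(.*?)\/` insists on a slash after the directory; that is the piece that has to become optional.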

Re: Can you do an OR with a REGEXP
by ikegami (Patriarch) on May 21, 2009 at 19:45 UTC

    Use the URI module:

    use URI qw( );

    for (
        'http://hostname.company.com:80//directory/...',
        'http://hostname.company.com:80//',
    ) {
        my $url = URI->new($_);
        print $url->host, ":", $url->port, "\n";
        print $url->path, "\n";
        print "\n";
    }
    Output:

    hostname.company.com:80
    //directory/...

    hostname.company.com:80
    //
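The URI output still leaves the directory inside the path. A minimal core-Perl sketch of pulling out the first directory from such a path; `first_dir` is a hypothetical helper, and the inputs are the two path values printed by the URI example.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first non-empty segment of a URL path,
# or undef if the path has no segments (e.g. "//").
sub first_dir {
    my ($path) = @_;
    my ($dir) = grep { length } split m{/}, $path;
    return $dir;
}

# The two paths printed by the URI example.
for my $path ( '//directory/...', '//' ) {
    my $dir = first_dir($path);
    print "$path -> ", ( defined $dir ? $dir : 'undef' ), "\n";
}
```

Inside the URI loop this would be called as `first_dir($url->path)`. Splitting on `/` and skipping empty fields also copes with the doubled slash without any special handling.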
Re: Can you do an OR with a REGEXP
by zwon (Abbot) on May 21, 2009 at 19:05 UTC

    URL parsing is not that simple. For your case, you could write a regexp like this:

    use strict;
    use warnings;

    my $str1 = 'http://hostname.company.com:80//directory/';
    my $str2 = 'http://hostname.company.com:80//';

    for ( $str1, $str2 ) {
        # (?:...) groups without capturing, so only host and dir are returned
        my ( $host, $dir ) = m{https?://([^:]+):\d+//(?:([^/]+)/)?};
        print "Host: $host, Dir: ", ( defined($dir) ? $dir : 'undefined' ), "\n";
    }
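The same idea can be written with the `/x` modifier and named captures so that each piece of the pattern is labeled. This is a sketch assuming Perl 5.10 or later (needed for named captures); `parse_log_url` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: return (host, dir) from a log URL shaped like
# http://host:port//dir/...; dir is undef when no directory is present.
sub parse_log_url {
    my ($url) = @_;
    return unless $url =~ m{
        ^ https?://
        (?<host> [^:/]+ )          # hostname, up to the port colon
        : \d+                      # port number
        //                         # the doubled slash from the log
        (?: (?<dir> [^/]+ ) / )?   # optional first directory
    }x;
    return ( $+{host}, $+{dir} );
}

for my $url ( 'http://hostname.company.com:80//directory/',
              'http://hostname.company.com:80//' ) {
    my ( $host, $dir ) = parse_log_url($url);
    print "Host: $host, Dir: ", ( defined $dir ? $dir : 'undefined' ), "\n";
}
```

Like the one-line version, a directory without a trailing slash is not captured, because the optional group requires the `/` after the name.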
    But it's better to use the URI module.

    P.S.: By the way, why are there two slashes?

      I'm trying to parse a Juniper access log to gauge application usage. The two forward slashes come from the Juniper logging. It's kind of weird, since that is the only part of the URL where it doubles the slashes. Thanks, slaclnx
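Since the doubled slash is an artifact of the logging rather than part of the URL, another option is to collapse it before parsing, so a standard parser such as URI sees an ordinary path. A sketch, assuming (as described above) that the doubling only ever happens right after host:port; `normalize_log_url` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: collapse the doubled slash right after scheme://host:port,
# assuming that is the only place the log doubles it. Other URLs pass through.
sub normalize_log_url {
    my ($url) = @_;
    ( my $clean = $url ) =~ s{^(\w+://[^/]+)//}{$1/};
    return $clean;
}

for my $url ( 'http://hostname.company.com:80//directory/...',
              'http://hostname.company.com:80//' ) {
    my $clean = normalize_log_url($url);
    print "$clean\n";
}
```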
Re: Can you do an OR with a REGEXP
by kennethk (Abbot) on May 21, 2009 at 19:16 UTC
    To answer the specific question asked, there are two ways to express "or" in Perl regular expressions: the alternation operator "|" and, more applicable in your case, the zero-or-one quantifier "?". Both are discussed in Regular Expressions. You may also want to read through perlretut to familiarize yourself with the details of regular expressions. Of course, for common regular expressions like the one you are looking for, there's usually an implementation you can borrow from CPAN, such as zwon's suggestion above or Regexp::Common.
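Both approaches can be seen side by side in a minimal sketch. The patterns are adapted from the ones in the question (with `[^:/]+` in place of `.*?` for the host), and `host_and_dir` is a hypothetical helper.

```perl
use strict;
use warnings;

# 1. The "?" quantifier: the directory group may appear once or not at all.
my $optional = qr{://([^:/]+):\d{2,4}//(?:([^/]+)/)?};

# 2. Alternation with "|": after the "//", either a directory and its
#    slash follow, or the string ends (\z).
my $alternation = qr{://([^:/]+):\d{2,4}//(?:([^/]+)/|\z)};

# Hypothetical helper: return (host, dir); dir is undef if absent,
# and the list is empty if the pattern does not match at all.
sub host_and_dir {
    my ( $re, $s ) = @_;
    my @caps = $s =~ $re;
    return @caps;
}

for my $s ( 'http://hostname.company.com:80//directory/...',
            'http://hostname.company.com:80//' ) {
    for my $re ( $optional, $alternation ) {
        my ( $host, $dir ) = host_and_dir( $re, $s );
        print "$host ", ( defined $dir ? $dir : '(no directory)' ), "\n";
    }
}
```

One difference: because the second branch of the alternation is anchored with `\z`, a string like `...:80//directory` (no trailing slash) does not match at all, whereas the `?` version still matches and returns the host with an undefined directory.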