in reply to Perl RegEx (url explode)

Hi,
thanks. That's one of the approaches I tried before. It accepts both forms, with and without a port, but produces the following output:

http
www
example.de:9944   //this one should be "example.de"
example
de:9944   //this one should be only "de"
## PORT is empty ##

Re^2: Perl RegEx (url explode)
by U_nix$_@ (Initiate) on Nov 01, 2012 at 23:20 UTC

    (.*) seems to ignore what comes after it when the ":" is optional,
    so the port becomes part of this group:

    ((.*)(?:\.)(.*))
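A minimal sketch of why this happens, assuming a simplified, anchored pattern rather than the exact one from the original post: `.*` is greedy, and because the port group is optional, the regex engine accepts the first overall match, in which the second `.*` has already swallowed ":9944".

```perl
use strict;
use warnings;

my $host = 'example.de:9944';

# The greedy (.*) after the dot runs to the end of the string, so the
# optional port group matches nothing and its capture stays undefined.
my ($name, $tld, $port) = $host =~ m/^(.*)\.(.*)(?::(\d+))?$/;

print "name=$name tld=$tld port=", defined $port ? $port : '(empty)', "\n";
# prints: name=example tld=de:9944 port=(empty)
```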

    But how to fix it? A fixed set of commonly used top-level domains is not flexible enough.

      Try to match a character class that does not contain ':' (i.e. [^:]):

      use strict;
      use warnings;

      for my $uri ( qw(
          https://www.example.de
          http://www.example.de
          https://example.de
          http://example.de
          www.example.de
          example.de:123
          http://www.example.de:445/can?this=happen&too=1#lalala
          http://www.example.de/can?this=happen&too=1#foo
          http://www.example.de:445
      ) ) {
          print "in ($uri):\n";
          my (@spl) = $uri =~ m|(http(?:s?))?
                                (?:(?:://)?
                                (w{0,3})\.{0,1})?
                                ((.*)(?:\.)([^:/]*)) # match if it is not a ":"
                                (?::(\d{0,10}))?
                               |x;
          print 'out: ', join(', ', map { defined $_ ? $_ : '-' } @spl), "\n\n";
      }

      __DATA__
      in (https://www.example.de):
      out: https, www, example.de, example, de, -

      in (http://www.example.de):
      out: http, www, example.de, example, de, -

      in (https://example.de):
      out: https, , example.de, example, de, -

      in (http://example.de):
      out: http, , example.de, example, de, -

      in (www.example.de):
      out: -, www, example.de, example, de, -

      in (example.de:123):
      out: -, , example.de, example, de, 123

      in (http://www.example.de:445/can?this=happen&too=1#lalala):
      out: http, www, example.de, example, de, 445

      in (http://www.example.de/can?this=happen&too=1#foo):
      out: http, www, example.de, example, de, -

      in (http://www.example.de:445):
      out: http, www, example.de, example, de, 445

      Update: Added '/' to character class and example '#foo'
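The same "match anything but the separator" idea can be pushed a little further. The following is only a sketch: `split_host` is a hypothetical helper, not something from this thread, and it assumes the input looks like `[scheme://]host.tld[:port][/...]`. It uses negated character classes for every part, so no capture can run past a ':' or '/'.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split a URL-ish string into
# scheme, name, TLD and port using negated character classes.
sub split_host {
    my ($s) = @_;
    my ($scheme, $name, $tld, $port) = $s =~ m{
        ^ (?: (https?) :// )?    # optional scheme
        ([^:/]+)                 # host up to its last dot (may include "www")
        \.
        ([^:/.]+)                # TLD: stops at ':', '/' or another '.'
        (?: : (\d+) )?           # optional port
        (?: / | \z )             # the host part has to end here
    }x or return;
    return { scheme => $scheme, name => $name, tld => $tld, port => $port };
}

my $parts = split_host('http://www.example.de:445/can?this=happen&too=1#lalala');
print "$parts->{name} | $parts->{tld} | $parts->{port}\n";
```

One difference from the `.*`-based pattern: because no part can cross a '/', a dot later in the path (for example `/index.html`) should not affect where the host is split.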

      OK, it finally clicked. This fixed it:

      ((.*)(?:\.)([a-zA-Z]*))(?::(\d{0,10}))?
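As a quick check (a sketch that runs the pattern fragment above on its own, not inside the full URL regex), the port now ends up in its own group:

```perl
use strict;
use warnings;

my $input = 'example.de:9944';

# [a-zA-Z]* cannot consume ':' or digits, so the TLD capture stops at "de"
# and the optional port group gets the chance to match.
my @parts = $input =~ m/((.*)(?:\.)([a-zA-Z]*))(?::(\d{0,10}))?/;

print join(', ', map { defined $_ ? $_ : '-' } @parts), "\n";
# prints: example.de, example, de, 9944
```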

      Edit:
      @Perlbotics,
      Your "match if not" version is the cleaner one. Merci.