in reply to how to do match www.jkghfdjbh.org from $li?

Have you read perlretut yet? You've got to get through some of that stuff if you want to move forward.

. is a special metacharacter in Perl regular expressions. It matches any single character except a newline (unless the /s modifier is in effect). To match a literal dot, escape it as \.
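A minimal sketch of the difference between an unescaped and an escaped dot. The two test strings are made up for illustration and are not from the original question:

```perl
use strict;
use warnings;

# Hypothetical inputs, chosen only to illustrate the point.
my $with_dot    = 'www.example.org';
my $without_dot = 'wwwXexample.org';

# Unescaped dot: matches any character except newline, so both strings match.
my $loose_1 = $with_dot    =~ /www.example/ ? 1 : 0;
my $loose_2 = $without_dot =~ /www.example/ ? 1 : 0;

# Escaped dot: matches only a literal '.'.
my $strict_1 = $with_dot    =~ /www\.example/ ? 1 : 0;
my $strict_2 = $without_dot =~ /www\.example/ ? 1 : 0;

print "unescaped: $loose_1 $loose_2\n";     # unescaped: 1 1
print "escaped:   $strict_1 $strict_2\n";   # escaped:   1 0
```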

Character classes match only a single character unless you add a quantifier.
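To see the effect of a quantifier, here is a small sketch using the host name from the thread title. The capture groups are added only so the matched text can be printed:

```perl
use strict;
use warnings;

my $host = 'www.jkghfdjbh.org';   # the host name from the thread title

# Without a quantifier, [a-z] consumes exactly one character.
my ($one) = $host =~ /www\.([a-z])/;

# With +, the class consumes one or more characters.
my ($many) = $host =~ /www\.([a-z]+)/;

print "single: $one\n";    # single: j
print "run:    $many\n";   # run:    jkghfdjbh
```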

Alternation extends across the entire regular expression, or across the innermost enclosing ( ... ) or (?: ... ) construct.
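A short sketch of how grouping limits the reach of | in practice. The input string and both patterns are hypothetical, picked so that the ungrouped version matches something it probably should not:

```perl
use strict;
use warnings;

my $text = 'xyz';   # hypothetical input containing no 'www' at all

# Ungrouped: reads as (www\.a) OR (z), so a lone 'z' is enough to match.
my $ungrouped = $text =~ /www\.a|z/ ? 1 : 0;

# Grouped: 'www.' is required, followed by either 'a' or 'z'.
my $grouped = $text =~ /www\.(?:a|z)/ ? 1 : 0;

print "ungrouped: $ungrouped\n";   # ungrouped: 1
print "grouped:   $grouped\n";     # grouped:   0
```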

Case insensitivity applies to character classes too.

Combine those issues, and what you have is:

    m/
        www      # match literal 'www'
        .        # match any single character except \n.
        [a-z]    # match any single character between a and z.
        |        # OR
        [A-Z]    # match any single character between A and Z.
        .        # match any single character except \n.
        [a-z]    # match any single character between a and z.
    /ix    # /i makes everything case-insensitive, so there's
           # no difference between [A-Z] and [a-z].
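One way the pattern might be repaired, assuming the goal is to pull a www.name.tld host out of $li. The contents of $li below are invented, since the real data is not shown in the thread, and the pattern is only a sketch: it accepts letters only, so host names containing digits or hyphens would need a wider character class.

```perl
use strict;
use warnings;

# Hypothetical contents for $li; the real data is not shown in the thread.
my $li = '<li><a href="http://www.jkghfdjbh.org">a link</a></li>';

my $host;
if ( $li =~ m{
        ( www          # literal 'www'
          \. [a-z]+    # literal dot, then one or more letters
          \. [a-z]+    # literal dot, then one or more letters
        )
    }ix ) {
    $host = $1;
}

print defined $host ? "found $host\n" : "no match\n";   # found www.jkghfdjbh.org
```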

If you want to accomplish this without learning regular expressions, install the URI::Find distribution, and use its URI::Find::Schemeless module.


Dave

Re^2: how to do match www.jkghfdjbh.org from $li?
by AnomalousMonk (Archbishop) on Jun 11, 2013 at 11:18 UTC

    Further to davido's point about alternation:
    Because it's a point that often escapes people (it's escaped me often enough), I want to emphasize that the effectively low precedence of the | (alternation) regex operator means that the [A-Z].[a-z] portion of the OP's regex matches independently of the rest of that regex. E.g., after fixing the // delimiter confusion, but leaving the . (dot) matching as it was:

    >perl -wMstrict -le
    "my $li = 'foo';
     ;;
     print qq{matched '$&' in '$li'} if $li =~ m{http://www.[a-z]|[A-Z].[a-z]}i;
    "
    matched 'foo' in 'foo'

    NB: Don't get into the habit of using the $& $` $' special match variables in your code. See the paragraph in perlre that begins "WARNING: Once Perl sees that you need one of $&, $`, or $' anywhere in the program ..." for a discussion of the cost of using them, and also the following paragraph for a workaround available in Perl version 5.10+. Also see the discussion of these variables in perlretut for workarounds using substr that can be used pre-5.10.
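A rough sketch of the alternatives mentioned above, reusing the 'foo' example with only the second alternative of the pattern. It shows the /p modifier with ${^MATCH} (Perl 5.10+), a plain capture group, and the substr approach built on the @- and @+ offset arrays:

```perl
use strict;
use warnings;
use 5.010;   # /p and ${^MATCH} need Perl 5.10 or later

my $li = 'foo';

# Option 1: /p makes ${^MATCH} available for this match.
my $via_p;
if ( $li =~ m{[A-Z].[a-z]}ip ) {
    $via_p = ${^MATCH};
}

# Option 2: an explicit capture group around the whole pattern.
my ($via_capture) = $li =~ m{([A-Z].[a-z])}i;

# Option 3 (works before 5.10): slice the match out with substr,
# using the start and end offsets of the last successful match.
my $via_substr;
if ( $li =~ m{[A-Z].[a-z]}i ) {
    $via_substr = substr( $li, $-[0], $+[0] - $-[0] );
}

print "matched '$via_p' in '$li'\n";   # matched 'foo' in 'foo'
```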

      Lol. I totally missed the /http//:...../ delimiter bug. It starts looking like an intentional attempt to get it wrong just so we will jump around like our hair is on fire trying to fix it. ...because there really is nothing that is right within it. An honest attempt to get it right would contain at least one portion of the RE that isn't a bug. ;)


      Dave

        You missed it because it was not in the question initially. I really hate it when the question is altered after receiving the first answers. It usually makes the answers look silly, as they point to issues that are no longer present in the question.