in reply to regex to match words and numbers

use warnings;
use strict;

my @chunks = <DATA>;
for (@chunks) {
    print unless /^(?:[*]|[a-zA-Z]|\d+\.)/;
}

__DATA__
1. For further assistance email us at 121@azing.com',
* You can also highlight any matter related to our Officer',
Page 2,
12345
prints:
12345

Courtesy of YAPE::Regex::Explain:

The regular expression:

(?-imsx:^(?:[*]|[a-zA-Z]|\d+\.))

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  (?:                      group, but do not capture:
----------------------------------------------------------------------
    [*]                      any character of: '*'
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    [a-zA-Z]                 any character of: 'a' to 'z', 'A' to 'Z'
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times (matching
                             the most amount possible))
----------------------------------------------------------------------
    \.                       '.'
----------------------------------------------------------------------
  )                        end of grouping
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

Also, here is an explanation of your regex, which might help shed some light on why it doesn't work as you expect:

The regular expression:

(?-imsx:[^\d+\.?\*?])

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  [^\d+\.?\*?]             any character except: digits (0-9), '+',
                           '\.', '?', '\*', '?'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
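
To see that reading in action, here is a minimal sketch. The sample strings and variable names are made up for illustration and are not from your post. Inside [...] the '+', '?' and '*' are ordinary characters rather than quantifiers, so the whole pattern matches exactly one character that is not a digit, '+', '.', '?' or '*'.

```perl
use strict;
use warnings;

# The pattern from the original question: a single negated character class.
my $re = qr/[^\d+\.?\*?]/;

# Hypothetical sample strings, chosen only to illustrate the behaviour.
my @samples = ('12345', '1. foo', '* bar', '+?*.');

my %matched;
for my $s (@samples) {
    # Capture the one character the class matched; undef if there is none.
    my ($char) = $s =~ /($re)/;
    $matched{$s} = $char;
    printf "%-8s => %s\n", "'$s'",
        defined $char ? "matched '$char'" : 'no match';
}
```

'12345' and '+?*.' give no match because every character in them is excluded by the class. '1. foo' and '* bar' both match at the first space, which is why the pattern says nothing about how a line starts.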

Update due to Animator's observation.

Re^2: regex to match words and numbers
by Animator (Hermit) on Jun 08, 2011 at 14:06 UTC

    Note that the regex (/^[*]|[a-zA-Z]|\d+\./) does not match what the OP wants.

    The regex should contain a group or the '^' needs to be repeated.

    That is: /^(?:[*]|[a-zA-Z]|\d+\.)/ OR /^[*]|^[a-zA-Z]|^\d+\./

    (The regex as posted matches whenever the string contains a letter anywhere, or a number followed by a '.' anywhere, because the '^' applies only to the first alternative.)

    Update: an example was requested but that request was later removed.

    Anyway: an example as requested:

    #!/usr/bin/perl
    use warnings;
    use strict;

    my @chunks = <DATA>;
    for (@chunks) {
        print unless /^[*]|[a-zA-Z]|\d+\./;
    }

    __DATA__
    a
    @
    b
    @ d
    e

    Output:

    @
    

    The lines 'a', 'b', 'e' are rejected. (ok)
    The line '@' is not rejected. (ok)
    The line '@ d' is rejected. (not ok)

    As far as I can tell '@ d' does not start with a '*'. It also does not start with a word and it also does not start with a number followed by a '.'
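
    For comparison, a minimal sketch that runs the same five lines through both corrected forms. The array and variable names are illustrative only, and the lines are held in an array here rather than read from DATA.

    ```perl
    use strict;
    use warnings;

    # The same five lines as in the example above.
    my @lines = ('a', '@', 'b', '@ d', 'e');

    # The two corrected forms: a non-capturing group, or '^' on every alternative.
    my $grouped  = qr/^(?:[*]|[a-zA-Z]|\d+\.)/;
    my $repeated = qr/^[*]|^[a-zA-Z]|^\d+\./;

    # Keep the lines that do NOT match, as 'print unless' does in the loop.
    my @kept_grouped  = grep { !/$grouped/ }  @lines;
    my @kept_repeated = grep { !/$repeated/ } @lines;

    print "grouped:  $_\n" for @kept_grouped;
    print "repeated: $_\n" for @kept_repeated;
    ```

    Both forms keep '@' and '@ d' and reject 'a', 'b' and 'e', so the two fixes behave the same on this data.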

      I agree. I originally had the non-capturing group, then foolishly removed it.