in reply to Re: Parsing Regex
in thread Parsing Regex

($date, $address) = $array[$line+2] =~ /.*?(\d{2}\/\d{2}\/\d{4})(.*)/;
Removing the last '?', this almost works, but when $1 is empty so is $2. However, if I include the '?' the date is included in $2.
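One possible reading of the behavior described above, as a hedged sketch: a lazy `.*?` followed by an optional group is satisfied by matching nothing at all, so the date is never forced into `$1`. The sample lines and their leading spaces are assumptions (the real input file is not shown), and `split_line` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical input lines; the leading spaces are an assumption.
my @samples = (
    "  09/09/2009 301 S. MAPLE STREET",
    "  301 S. MAPLE STREET",
);

# Variant 1: the date group is required.
my $required = qr/.*?(\d{2}\/\d{2}\/\d{4})(.*)/;
# Variant 2: the date group is optional.
my $optional = qr/.*?(\d{2}\/\d{2}\/\d{4})?(.*)/;

# Returns (date, address); both are undef if the pattern does not match.
sub split_line {
    my ($re, $line) = @_;
    my ($date, $address) = $line =~ $re;
    return ($date, $address);
}

for my $line (@samples) {
    for my $pair (["required", $required], ["optional", $optional]) {
        my ($name, $re) = @$pair;
        my ($date, $address) = split_line($re, $line);
        printf "%-8s date:<%s> address:<%s>\n",
            $name, $date // '', $address // '';
    }
}
```

With the required group, a line without a date fails to match, so both variables come back empty. With the optional group, `.*?` matches the empty string, the date group cannot match at the leading space and is skipped, and `(.*)` takes the whole line, date included.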

Re^3: Parsing Regex
by muba (Priest) on Sep 23, 2009 at 01:12 UTC

    Seems such a trivial thing, don't you agree?

    Let's give it another shot.

        while (<DATA>) {
            # lol, comments in __DATA__ :)
            next if m/^#/;

            #m!! to allow for better-readable slashes inside the regex
            #/x modifier to make the regex even better readable
            ($date, $address) = $_ =~ m!
                .*?
                (\d+ / \d+ / \d+)?
                \s*
                (.+)
            !x;

            print "date:<$date>\naddress:<$address>\n\n";
        }

        __DATA__
        # a line without a date
        A very good looking address
        # a line with a date
        15/5/85 That's my actual birth day!

    Output:

        date:<>
        address:<A very good looking address>

        date:<15/5/85>
        address:<That's my actual birth day!>
      So it didn't work. Here's what's interesting:
      Code:
          $line =~ m/(.*?)(\d+\/\d+\/\d+)?\s*(.*)/;
          print "Input : $line";
          print "Output \$1: -$1-\n";
          print "Output \$2: -$2-\n";
          print "Output \$3: -$3-\n\n";
      Results:
          Input : 301 S. MAPLE STREET
          Output $1: --
          Output $2: --
          -utput $3: -301 S. MAPLE STREET

          Input : 09/09/2009 301 S. MAPLE STREET
          Output $1: --
          Output $2: --
          -utput $3: -09/09/2009 301 S. MAPLE STREET
      I included the "neat" flag (x), but decided to take it away. For all practical purposes, the regex above should behave the same. Note: I put the first "ungreedy capture anything" (.*?) in $1 for debugging purposes. The results are interesting.

      One interesting thing is the terminating '-' that is being printed at the front of the line instead of the end, which makes it look like more is going on here than meets the eye.
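A '-' landing at the front of the line is what a trailing carriage return in the captured text would produce: `.` matches "\r" (only "\n" is excluded), the terminal jumps back to column 0, and the closing '-' overwrites the 'O' in "Output". This is an inference, since the thread does not say where the file came from. A minimal sketch, assuming the input still carries Windows-style "\r\n" line endings:

```perl
use strict;
use warnings;

# Assumed input: a line that still carries a Windows-style "\r\n" ending.
my $line = "301 S. MAPLE STREET\r\n";

# Same pattern as in the post above.
my ($lead, $date, $rest) = $line =~ m/(.*?)(\d+\/\d+\/\d+)?\s*(.*)/;

# Make the invisible character visible before printing.
(my $visible = $rest) =~ s/\r/\\r/g;
print "raw capture : <$visible>\n";

# Strip any trailing CR/LF characters first, then match again.
(my $clean = $line) =~ s/[\r\n]+\z//;
my (undef, undef, $clean_rest) = $clean =~ m/(.*?)(\d+\/\d+\/\d+)?\s*(.*)/;
print "after strip : <$clean_rest>\n";
```

If the raw capture shows a `\r`, stripping line endings with `s/[\r\n]+\z//` instead of relying on `chomp` alone should make the stray '-' go away.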

      What I really need is a lookbehind. Note: even if I get rid of the first .*, there are still problems.

      The same thing is throwing it off (the date capture). If I remove the '?' or change it to a '+', then it works for the second input, but not the first. For the first input, $1 and $2 still hold values from a previous regex (not shown) and $3 is empty; hence, it doesn't match at all.

      Example:
          Input : 301 S. MAPLE STREET
          Output $1: -000001-
          -utput $2: - JOHN SMITH, III
          Output $3: --

          Input : 09/09/2009 301 S. MAPLE STREET
          Output $1: - -
          Output $2: -09/09/2009-
          -utput $3: -301 S. MAPLE STREET
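The stale values in the first case are standard Perl behavior: a failed match leaves `$1`, `$2`, ... as they were after the last successful match, so they should only be read when the match returned true. A minimal sketch of one way around both problems, assuming the date (when present) is the first non-blank thing on the line; `parse_line` is a hypothetical helper, not something from the thread:

```perl
use strict;
use warnings;

# Hypothetical helper: returns (date, address); date is undef if absent.
sub parse_line {
    my ($line) = @_;
    $line =~ s/[\r\n]+\z//;    # drop LF or CRLF line endings

    # Anchor at the start and consume leading whitespace explicitly,
    # so the date is tried before anything else can swallow it.
    if ($line =~ m{^\s*(\d+/\d+/\d+)\s*(.*)}) {
        return ($1, $2);       # safe: the match succeeded
    }

    # No date: do not read $1/$2 here, they may hold an earlier match.
    $line =~ s/^\s+//;
    return (undef, $line);
}

for my $line ("  09/09/2009 301 S. MAPLE STREET\r\n", "301 S. MAPLE STREET\n") {
    my ($date, $address) = parse_line($line);
    printf "date:<%s> address:<%s>\n", $date // '', $address // '';
}
```

Keeping the date pattern required and handling the no-date case in the fallback branch avoids the "optional group matches nothing" trap, and no lookbehind is needed.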
        Please delete this reply -- I was wrong -- it put me back to square 1
      I'll take a look at this, but I'm skeptical since the regex doesn't seem to have changed that much. The \s* might make a difference though.