in reply to Parsing Regex

Bear with me, it's 2:47 AM local time, so I may be misreading your question or the case, but how about:

($date, $address) = $array[$line+2] =~ /.*?(\d{2}\/\d{2}\/\d{4})?(.*)/;

Re^2: Parsing Regex
by deMize (Monk) on Sep 23, 2009 at 01:02 UTC
    ($date, $address) = $array[$line+2] =~ /.*?(\d{2}\/\d{2}\/\d{4})(.*)/;
    With the last '?' removed, this almost works, but when there is no date the whole match fails, so $2 comes back empty along with $1. However, if I include the '?', the date ends up in $2.
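A possible explanation for the behaviour described above: the lazy `.*?` first tries to match nothing, and the optional date group is then allowed to match nothing as well, so the regex succeeds at the start of the string without ever looking further for a date. The date is only captured if it sits at the very first character. A minimal sketch of one way around this, assuming the date may be preceded by whitespace or other text, is to move the lazy prefix inside the optional group so the two succeed or fail together. The sub name `split_date` and the sample lines are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (date, rest); date is undef when the line has none.
sub split_date {
    my ($line) = @_;
    # The optional group is greedy, so the engine first tries to find a date
    # (letting .*? grow as needed) and only skips the group if that fails.
    my ($date, $rest) = $line =~ m{^(?:.*?(\d{2}/\d{2}/\d{4}))?\s*(.*)};
    return ($date, $rest);
}

for my $line ("  09/09/2009 301 S. MAPLE STREET", "  301 S. MAPLE STREET") {
    my ($date, $rest) = split_date($line);
    $date = '' unless defined $date;
    print "date:<$date> address:<$rest>\n";
}
```

With these two sample lines, the first should give the date in `$date` and the street in `$rest`, and the second an empty date with the same street.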

      Seems such a trivial thing, don't you agree?

      Let's give it another shot.

      while (<DATA>) {
          # lol, comments in __DATA__ :)
          next if m/^#/;
          # m!! to allow for better-readable slashes inside the regex
          # /x modifier to make the regex even better readable
          ($date, $address) = $_ =~ m! .*? (\d+ / \d+ / \d+)? \s* (.+) !x;
          print "date:<$date>\naddress:<$address>\n\n";
      }
      __DATA__
      # a line without a date
      A very good looking address
      # a line with a date
      15/5/85 That's my actual birth day!

      Output:

      date:<>
      address:<A very good looking address>

      date:<15/5/85>
      address:<That's my actual birth day!>
        So it didn't work, here's what's interesting:
        Code:
        $line =~ m/(.*?)(\d+\/\d+\/\d+)?\s*(.*)/;
        print "Input : $line";
        print "Output \$1: -$1-\n";
        print "Output \$2: -$2-\n";
        print "Output \$3: -$3-\n\n";
        Results:
        Input : 301 S. MAPLE STREET
        Output $1: --
        Output $2: --
        -utput $3: -301 S. MAPLE STREET

        Input : 09/09/2009 301 S. MAPLE STREET
        Output $1: --
        Output $2: --
        -utput $3: -09/09/2009 301 S. MAPLE STREET
        I included the "neat" flag (x), but decided to take it away. For all purposes, the above should be the same. Note: I put the first "ungreedy capture anything" (.*?) in $1 for debugging purposes. The results are interesting.

        One interesting thing is the closing '-', which is printed at the front of the line instead of the end. That makes it look like more is going on here than meets the eye.
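A stray '-' at the start of the line would be consistent with a carriage return at the end of the captured text. If the input has Windows (CRLF) line endings, `.` matches the "\r", and the terminal moves the cursor back to column 0 before printing the closing '-'. That is a guess about the input, not something this thread confirms. A small sketch that strips the line ending before matching, with an invented sample string:

```perl
use strict;
use warnings;

# Assumed input: a line read from a CRLF file, so it ends in "\r\n".
my $line = "09/09/2009 301 S. MAPLE STREET\r\n";

# Remove either "\n" or "\r\n" from the end; with the default $/ of "\n",
# chomp alone would leave the "\r" behind.
(my $clean = $line) =~ s/\r?\n\z//;

my ($rest) = $clean =~ /(.*)/;
print "Output: -$rest-\n";   # the closing '-' now stays at the end of the line
```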

        What I really need is a lookbehind. Note: even if I get rid of the first .*, there are still problems.

        The same thing is throwing it off (the date capture). If I remove the '?' or change it to a '+', then it works for the second input, but not the first. For the first input, $1 and $2 still hold values from a previous regex (not shown) and $3 is empty; in other words, it doesn't match at all.
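On the stale values: Perl leaves $1, $2 and so on untouched when a match fails, so they still hold whatever the last successful match captured. A minimal sketch of guarding against that by testing the match result before using the captures. The helper name `parse_line` is hypothetical, and the pattern is a simplified version with a mandatory date.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (date, address), or an empty list when no date is present.
sub parse_line {
    my ($line) = @_;
    # In list context a failed match returns (), so the condition is false
    # and no leftover $1/$2 from an earlier regex can leak through.
    if (my ($date, $address) = $line =~ m{(\d+/\d+/\d+)\s*(.*)}) {
        return ($date, $address);
    }
    return;
}

for my $line ("09/09/2009 301 S. MAPLE STREET", "301 S. MAPLE STREET") {
    my @fields = parse_line($line);
    print @fields ? "date:<$fields[0]> address:<$fields[1]>\n"
                  : "no date in: $line\n";
}
```

The caller can then decide what to do with a dateless line, for example treating the whole line as the address.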

        Example:
        Input : 301 S. MAPLE STREET
        Output $1: -000001-
        -utput $2: - JOHN SMITH, III
        Output $3: --

        Input : 09/09/2009 301 S. MAPLE STREET
        Output $1: - -
        Output $2: -09/09/2009-
        -utput $3: -301 S. MAPLE STREET
        I'll take a look at this, but I'm skeptical since the regex doesn't seem to have changed that much. The \s* might make a difference though.