
Hi, and thanks for the reply, FunkyMonk. I do get a little lost, though, as regards where I am opening my file and feeding it into @data. Can you expound on that for me, please? (Sorry for being the noob.)

FWIW, based on your well-explained pattern-matching sequences above, I have also created a possible revised unless statement:

open(DATAFILE, "$input") || die("Can't open $input:!\n");
while (<DATAFILE>) {
    unless (m{^(\d\.d{5})\s,(\d\d:\d\d:\d\d)\s*}) {
        next;
    }
    chomp $_;
    ($quote,$time) = split(",", $_);
    chop($quote); #remove a white space
    ($hour,$minute,$second) = split(":",$time);)
    # more processing
How does that look? Although I do like the way you have done things, an internet outage here over the last 4 hours meant I was unable to test any new code, and I had to put my old, buggy code live at 5pm EST. I'd really like to be able to just drop a new unless statement into the existing code, if that's at all possible.

The script eventually updates a MySQL database and creates a web page, so it's not straightforward to test the code outside the live setup, and I want to keep revisions to a minimum.

P.S. I realise that I may be dismissing some of the conventions of working with floating-point numbers, which may be a little unsettling to some, but for this project I am sure that the 'shortcuts' I am taking are safe. My code has been working fine in a live environment for 2 weeks now. The only issue I have is this bug when dealing with unexpected data formats in my input files.

P.P.S. Sorry for being so verbose here.

conal.

Re^3: handling erronous input
by FunkyMonk (Bishop) on Apr 06, 2008 at 22:57 UTC
    You've left part of the regexp out (the bit that captures comments) and missed a backslash (in \d{5}: without it, d{5} matches five literal "d" characters). It looks like you don't know that, in a regexp, parentheses capture their matches into $1, $2, $3, etc. Again, see perlretut and perlre for the details.
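To make both points concrete, here is a small self-contained sketch. The sample line is invented, assuming the input format discussed in this thread (a quote, a space, a comma, a time, then an optional comment).

```perl
use strict;
use warnings;

# Hypothetical sample line, assuming the format discussed in the thread:
# a quote, a space, a comma, a time, then an optional comment.
my $line = "1.23456 ,12:34:56 some comment";

my ( $quote, $time, $comment );
# Each pair of parentheses captures into $1, $2, $3, numbered left to right.
if ( $line =~ m{^(\d\.\d{5})\s*,\s*(\d\d:\d\d:\d\d)\s*(.*)$} ) {
    ( $quote, $time, $comment ) = ( $1, $2, $3 );
    print "quote=$quote time=$time comment=$comment\n";
}

# Without the backslash, d{5} means five literal "d" characters,
# so the pattern from the original post cannot match real data.
my $buggy_matches = ( $line =~ m{^(\d\.d{5})} ) ? 1 : 0;
print "buggy pattern matches: $buggy_matches\n";
```

Copying the captures into named variables straight away is a good habit, because $1, $2 and $3 are overwritten by the next successful match.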

    Your code is similar to mine. You use

    while ( ... ) { unless ( some-condition ) { next } some-code }

    while I prefer the equivalent

    while ( ... ) { if ( some-condition ) { some-code } }

    It's just that (IMHO) yours is harder to read (and longer, too).

    That said, you can use my code with a filehandle like so (I've rearranged it a bit to use unless, and made the regexp a little more lenient towards spaces):

    while ( <DATAFILE> ) {
        chomp;
        unless ( m{^ (\d\.\d{5}) \s*,\s* (\d\d:\d\d:\d\d) \s* (.*) $ }x ) { next }
        my ( $quote, $time, $comment ) = ( $1, $2, $3 );    # captures
        my ( $hours, $minutes, $seconds ) = split /:/, $time;
        # do something with $quote, $hours, $minutes, $seconds & $comment
    }
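On the question of where the file is opened: a loop like the one above reads directly from the filehandle, one line per iteration, so no @data array is needed. A minimal runnable sketch, assuming the thread's input format; the sample lines are invented, and to stay self-contained it opens an in-memory string rather than a real file. It also uses a lexical filehandle with three-argument open instead of the bareword DATAFILE.

```perl
use strict;
use warnings;

# Invented sample input; the second line is malformed and should be skipped.
my $text = <<'END';
1.23456 , 12:34:56 first comment
garbage line
1.65432,23:59:01
END

# Three-argument open with a lexical filehandle. For a real file, use
#   open my $fh, '<', $input or die "Can't open $input: $!\n";
open my $fh, '<', \$text or die "Can't open in-memory input: $!\n";

my @records;
while ( my $line = <$fh> ) {
    chomp $line;
    # Skip anything that doesn't look like "quote , hh:mm:ss [comment]".
    next unless $line =~ m{^ (\d\.\d{5}) \s*,\s* (\d\d:\d\d:\d\d) \s* (.*) $ }x;
    my ( $quote, $time, $comment ) = ( $1, $2, $3 );
    my ( $hours, $minutes, $seconds ) = split /:/, $time;
    push @records, [ $quote, $hours, $minutes, $seconds, $comment ];
}
close $fh;

printf "%s at %s:%s:%s (%s)\n", @$_ for @records;
```

Because malformed lines are skipped by the `next`, unexpected input never reaches the code that processes a record.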

      I prefer:

      while ( ... ) { next if some-condition; some-code }

      which not only saves a couple of (admittedly trivial) lines of code but also reduces clutter and avoids an extra level of nesting for some-code.
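The three loop shapes discussed here (unless/next block, if block, and a `next` statement modifier) filter exactly the same lines. A small sketch with invented data shows that they collect identical results; it loops over an array rather than a file to stay self-contained, and `looks_valid` is a hypothetical helper standing in for some-condition.

```perl
use strict;
use warnings;

# Hypothetical test for a well-formed line; stands in for "some-condition".
sub looks_valid { return $_[0] =~ m{^\d\.\d{5}\s*,\s*\d\d:\d\d:\d\d} }

my @lines = ( "1.23456 ,12:34:56", "bad input", "1.11111,01:02:03 note" );

# Style 1: unless ( condition ) { next } followed by the work.
my @style1;
for my $line (@lines) {
    unless ( looks_valid($line) ) { next }
    push @style1, $line;
}

# Style 2: wrap the work in an if block (one extra level of nesting).
my @style2;
for my $line (@lines) {
    if ( looks_valid($line) ) { push @style2, $line }
}

# Style 3: statement modifier, keeping the work at the top level of the loop.
my @style3;
for my $line (@lines) {
    next unless looks_valid($line);
    push @style3, $line;
}

print join( "\n", @style3 ), "\n";
```

The choice between them is purely about readability; the statement-modifier form scales best when the loop body grows.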


      Perl is environmentally friendly - it saves trees
      Gotcha, I understand completely now.

      Thanks again; it works fine. I just wanted to be certain before I messed anything up. FWIW, here is a link to my project: http://fxr.freehostia.com/pam/pam_alpha_5.php

      And sorry for my wanton abuse of floating-point numbers. ;p

      conal.
Re^3: handling erronous input
by ww (Archbishop) on Apr 07, 2008 at 02:55 UTC
    In addition to the problems with your first regex,

    ($hour,$minute,$second) = split(":",$time);)

    should be

    ($hour,$minute,$second) = split /:/,$time;

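    The pattern in split is a regex, so write it with slashes (or other unambiguous matched punctuation) rather than quotes; a quoted string is compiled as a regex anyway, which is easy to forget when it contains a metacharacter such as "." or "|". Note also that the last closing paren in your split is "one too many" (and thus "wrong"), and all the parens on the RHS are unnecessary.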
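Strictly speaking, split compiles a quoted first argument as a pattern too (the one special case is a single space, ' '), so slashes mainly make the regex nature explicit. A small sketch of where that bites:

```perl
use strict;
use warnings;

my $time = "12:34:56";

# A quoted string is compiled as a pattern, so these two calls behave the same.
my @with_slashes = split /:/, $time;
my @with_quotes  = split ":", $time;

# Because the string is a pattern, "." matches any character: every
# character becomes a separator, all fields are empty, and trailing
# empty fields are dropped, so split returns an empty list.
my @dot_pattern = split ".", "1.23";
my @dot_literal = split /\./, "1.23";

print scalar(@with_slashes), " ", scalar(@dot_pattern), " ", "@dot_literal", "\n";
```

Writing `/\./` makes it obvious that the dot must be escaped to be taken literally.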

    Subject to your taste, note that your extraction to $quote and $time could be written

    next unless ( $data =~ /^(\d\.\d{5})\s,(\d\d:\d\d:\d\d).*/ );
    $quote = $1;
    $time  = $2;
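Another option, again a matter of taste: a match in list context returns the captures directly, so the test and the assignment can be combined. A minimal sketch with invented sample lines, where `$data` stands in for the current input line as in the snippet above; the trailing `.*` is dropped because an unanchored pattern end already allows extra text.

```perl
use strict;
use warnings;

my @parsed;
for my $data ( "1.23456 ,12:34:56 extra", "1.23456,12:34:56", "junk" ) {
    # A successful match in list context returns ($1, $2); a failed match
    # returns an empty list, so the assignment is false and we skip the line.
    next unless my ( $quote, $time ) = $data =~ /^(\d\.\d{5})\s,(\d\d:\d\d:\d\d)/;
    push @parsed, "$quote|$time";
}
print "$_\n" for @parsed;
```

Note that the second sample is skipped: `\s,` requires exactly one whitespace character before the comma, as in the original regex. Use `\s*,\s*` if the spacing in the input may vary.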

    Update: s/not/note/ in the last narrative paragraph.

      Or

      next unless ( $data =~ /^(\d\.\d{5})\s,(\d{2}:\d{2}:\d{2}).*/ );
      $quote = $1;
      $time  = $2;