handling erronous input

Conal has asked for the wisdom of the Perl Monks concerning the following question:

Please can someone help me flush out some bugs in my script? I have an input file which can look like this:

//input.txt

1.57163 ,17:29:57 Simple Dealin 
1.57163 ,17:29:57 
1.57163 ,17:29:57 
1.57163 ,17:29:57 
1.57163 ,17:29:57 
1.57163 ,17:29:57 
1.57163 ,17:29:57 
1.571 ,17: 
1.57172 ,17:30:08 
1.57176 ,17:30:10

I only want to use data like '1.57172 ,17:30:08 ' for my computation, i.e. I want to disregard data that looks like ' 1.571 ,17: '. I would also like to use ' 1.57163 ,17:29:57 Simple Dealin ', if my code were capable of ignoring the data after the time.

The code I have come up with, shown below, isn't good enough:


while (<DATAFILE>) {
    unless (m{^(.*?)\s*,([\d:]+)}) {
        next;
    }

    chomp $_;

    ($quote, $time) = split(",", $_);
    # do my computations
    chop($quote); chop($quote);

What I need is for my code to accept only input with 1 digit before the decimal point and 5 after, then a space, a comma, and an 8-character time using : as a separator, ignoring any data on the same line after the seconds.

Can anyone help me flush out this bug in my script, please?

conal.


Replies are listed 'Best First'.
Re: handling erronous input by FunkyMonk (Bishop) on Apr 06, 2008 at 15:39 UTC
"What I need is for my code to accept only input with 1 digit before the decimal point and 5 after, then a space, a comma, and an 8-character time using : as a separator, ignoring any data on the same line after the seconds."

You seem to know what you want, so it's just a case of following your spec...

my @data = split /\n/, <<EOS;
1.57163 ,17:29:57 Simple Dealin
1.57163 ,17:29:57
1.57163 ,17:29:57
1.57163 ,17:29:57
1.57163 ,17:29:57
1.57163 ,17:29:57
1.57163 ,17:29:57
1.571 ,17:
1.57172 ,17:30:08
1.57176 ,17:30:10
EOS

for ( @data ) {
    if ( my ( $quote, $time, $comment ) = m{
            ^                   # start of string
            (\d\.\d{5})         # a digit, dot and 5 more digits
            \s,                 # a space and a comma
            (\d\d:\d\d:\d\d)    # an 8 character time
            \s*                 # some spaces
            (.*)                # everything else's a comment
            $                   # end of string
        }x ) {
        print "$quote / $time", $comment ne '' ? " / $comment" : '', "\n";
    }
}

Output:

1.57163 / 17:29:57 / Simple Dealin
1.57163 / 17:29:57
1.57163 / 17:29:57
1.57163 / 17:29:57
1.57163 / 17:29:57
1.57163 / 17:29:57
1.57163 / 17:29:57
1.57172 / 17:30:08
1.57176 / 17:30:10

Update: See perlre and perlretut for the details.
Re^2: handling erronous input by Conal (Beadle) on Apr 06, 2008 at 22:23 UTC
Hi and thanks FunkyMonk for the reply. I do get a little lost here, though, as regards where I am opening my file and feeding it into @data. Can you expound on that for me, please? (Sorry for being the noob.)

FWIW, based on your well-explained pattern-matching sequences above, I have also created a possible revised unless statement:

open(DATAFILE, "$input") || die("Can't open $input:!\n");
while (<DATAFILE>) {
    unless (m{^(\d\.d{5})\s,(\d\d:\d\d:\d\d)\s*}) {
        next;
    }
    chomp $_;
    ($quote,$time) = split(",", $_);
    chop($quote); #remove a white space
    ($hour,$minute,$second) = split(":",$time);)
    # more processing

How does that look? Although I do like the way you have done things, because of an internet outage here for the last 4 hours I was unable to do any testing of new code and had to bring up my old buggy code live at 5pm E.S.T. I'd really like to be able to just drop a new unless statement into the existing code, if that's at all possible. The script eventually updates a MySQL database and creates a webpage, so it's not straightforward to test the code outside of a live situation, and I want to keep revisions to a minimum.

P.S. I realise that I may be dismissing some of the conventions of working with floating point numbers, which may be a little unsettling to some, but for this project I am sure that the 'shortcuts' I am taking are safe. I have had my code working fine in a live environment for 2 weeks now. The only issue I have is this bug when dealing with unexpected input data formats in my input files.

P.P.S. Sorry for being so verbose here.

conal.
Re^3: handling erronous input by FunkyMonk (Bishop) on Apr 06, 2008 at 22:57 UTC
You've missed part of the regexp out (the bit that captures comments) and missed a backslash out (from \d{5}). It looks like you don't know that, in a regexp, parentheses capture their matches into $1, $2, $3 etc. Again, see perlretut and perlre for the details.

Your code is similar to mine. You use

while ( ... ) {
    unless ( some-condition ) { next }
    some-code
}

while I prefer the equivalent

while ( ... ) {
    if ( some-condition ) {
        some-code
    }
}

It's just that (IMHO) yours is harder to read (and longer, too).

That said, you can use my code with a filehandle like so (I've rearranged it a bit to use unless and made the regexp a bit more lenient towards spaces)...

while ( <DATAFILE> ) {
    chomp;
    unless ( m{^ (\d\.\d{5}) \s*,\s* (\d\d:\d\d:\d\d) \s* (.*) $ }x ) { next }
    my ( $quote, $time, $comment ) = ( $1, $2, $3 ); # captures
    my ( $hours, $minutes, $seconds ) = split /:/, $time;
    #do something with $quote, $hours, $minutes, $seconds & $comment
}
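Since testing against the live feed is awkward, one way to try a loop like this offline is to read from an in-memory filehandle instead of the real file. This is only a minimal sketch under some assumptions: `parse_quotes` and `$sample` are made-up names, the sample lines are taken from the original post, and the pattern allows optional spaces around the comma.

```perl
use strict;
use warnings;

# Hypothetical helper: reads lines from a filehandle and returns the
# records that match the "d.ddddd ,hh:mm:ss" layout.
sub parse_quotes {
    my ($fh) = @_;
    my @records;
    while ( my $line = <$fh> ) {
        chomp $line;
        next unless $line =~ m{
            ^ (\d\.\d{5})          # one digit, a dot, five digits
            \s* , \s*              # comma, optional spaces around it
            (\d\d:\d\d:\d\d)       # hh:mm:ss
            \s* (.*) $             # optional trailing comment
        }x;
        push @records, { quote => $1, time => $2, comment => $3 };
    }
    return @records;
}

# Sample data taken from the original post, held in a string.
my $sample = <<'EOS';
1.57163 ,17:29:57 Simple Dealin
1.571 ,17:
1.57172 ,17:30:08
EOS

# Opening a reference to a scalar gives a read-only in-memory filehandle.
open my $fh, '<', \$sample or die "Can't open in-memory file: $!";
my @records = parse_quotes($fh);
close $fh;

print "$_->{quote} / $_->{time}\n" for @records;
```

The same sub could be handed the real filehandle later, so the pattern can be checked against awkward lines before anything touches the database.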
Re^4: handling erronous input by GrandFather (Saint) on Apr 06, 2008 at 23:35 UTC
Re^4: handling erronous input by Conal (Beadle) on Apr 06, 2008 at 23:50 UTC
Re^3: handling erronous input by ww (Archbishop) on Apr 07, 2008 at 02:55 UTC
In addition to the problems with your first regex,

`($hour,$minute,$second) = split(":",$time);)`

is better written as

`($hour,$minute,$second) = split /:/,$time;`

The first argument to `split` is treated as a regex even when it is given as a quoted string, so writing it with slashes (or other unambiguous matched punctuation) rather than quotes makes that explicit. Note also that the last closing paren in your split is "one too many" (and thus "wrong"), and all the parens on the RHS are unnecessary.

Subject to your taste, note that your extraction to `$quote` and `$time` could be written

`next unless ( $data =~ /^(\d\.\d{5})\s,(\d\d:\d\d:\d\d)./ ); $quote = $1; $time=$2;`

Update: s/not/note/ in the last narrative paragraph.
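For what it's worth, `split` compiles a quoted first argument as a pattern too (the one special case being a single-space string), so both spellings should give the same fields for a time like this. A small sketch to check that, with made-up variable names:

```perl
use strict;
use warnings;

my $time = '17:30:08';

# A quoted string as the first argument is still compiled as a pattern...
my @from_string = split ':', $time;
# ...so it behaves like the explicit regex form.
my @from_regex  = split /:/, $time;

print "string: @from_string\n";
print "regex:  @from_regex\n";
```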
Re^4: handling erronous input by apl (Monsignor) on Apr 07, 2008 at 12:06 UTC
Re: handling erronous input by swampyankee (Parson) on Apr 06, 2008 at 15:37 UTC
What I would suggest is breaking your logic (and code) up into chunks:

First, split the record, more or less as you're doing (I'd split on `/\s,\s/`, but that's a minor quibble).
Second, process any extraneous text ("Simple Dealin"), trailing whitespace, newlines, etc.
Third, make sure that the input values (why do all the entries in the first column look like π/2?) are in the expected range and form. (It looks like the value of the input must be greater than or equal to zero and less than ten.)
Finally, validate the time in whatever way you require.

I'm sure there's a regex that would do everything in one swell foop, but my regex mojo hasn't fully wakened yet.

emc
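The chunked approach above could be sketched roughly as follows. This is an illustration rather than a definitive implementation: `check_record` is a made-up name, the split allows optional spaces around the comma (an assumption, since the sample data has no space after it), and the range checks on the time are one guess at what "validate" might mean here.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (quote, time) for a usable line,
# or an empty list if any check fails.
sub check_record {
    my ($line) = @_;
    chomp $line;

    # 1. Split the record on the comma, allowing spaces around it.
    my ( $quote, $rest ) = split /\s*,\s*/, $line, 2;
    return unless defined $quote && defined $rest;

    # 2. Keep the first whitespace-separated token as the time and
    #    drop any trailing text such as "Simple Dealin".
    my ($time) = split ' ', $rest;
    return unless defined $time;

    # 3. The quote must look like d.ddddd, i.e. at least 0 and below 10.
    return unless $quote =~ /^\d\.\d{5}$/;

    # 4. The time must be hh:mm:ss with each field in range.
    my ( $h, $m, $s ) = $time =~ /^(\d\d):(\d\d):(\d\d)$/ or return;
    return if $h > 23 || $m > 59 || $s > 59;

    return ( $quote, $time );
}

for my $line ( '1.57163 ,17:29:57 Simple Dealin', '1.571 ,17: ', '1.57172 ,17:30:08 ' ) {
    my @fields = check_record($line);
    print @fields ? "ok: @fields\n" : "skipped: $line\n";
}
```

Keeping each check on its own line makes it easier to log which step rejected a record, at the cost of being longer than a single regex.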