Conal has asked for the wisdom of the Perl Monks concerning the following question:

Can someone please help me flush out some bugs in my script? I have an input file that can look like this:

    //input.txt
    1.57163 ,17:29:57 Simple Dealin
    1.57163 ,17:29:57
    1.57163 ,17:29:57
    1.57163 ,17:29:57
    1.57163 ,17:29:57
    1.57163 ,17:29:57
    1.57163 ,17:29:57
    1.571 ,17:
    1.57172 ,17:30:08
    1.57176 ,17:30:10
I only want to use data like '1.57172 ,17:30:08' for my computation; i.e. I want to disregard data that looks like '1.571 ,17:'. I would also like to use '1.57163 ,17:29:57 Simple Dealin', if my code could ignore the data after the time.

The code I have come up with isn't good enough. Here it is:

    while (<DATAFILE>) {
        unless (m{^(.*?)\s*,([\d:]+)}) {
            next;
        }
        chomp $_;
        ($quote,$time) = split(",", $_);
        # do my computations
        chop($quote);chop($quote);
    }
What I need is for my code to accept only input with 1 digit before the decimal point and 5 after, then a space and a comma, then an 8-character time using : as a separator, ignoring any data on the same line after the seconds.
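For illustration, here is a minimal sketch that runs the pattern from the loop above and a pattern written straight from that spec over three of the sample lines. The `@lines` array and the names `$loose` and `$strict` are hypothetical and only stand in for the file. The truncated '1.571 ,17:' line passes the original pattern because `[\d:]+` is happy to match "17:".

```perl
use strict;
use warnings;

# A few lines taken from the sample input (hypothetical stand-in for DATAFILE).
my @lines = (
    '1.57163 ,17:29:57 Simple Dealin',
    '1.571 ,17:',
    '1.57172 ,17:30:08',
);

# The pattern from the question: anything, optional spaces, a comma,
# then one or more digits/colons. "17:" satisfies [\d:]+, so the
# truncated line is accepted.
my $loose = qr{^(.*?)\s*,([\d:]+)};

# A pattern written from the spec: one digit, a dot, five digits,
# a space, a comma, then hh:mm:ss. Nothing is anchored after the
# seconds, so any trailing text is simply ignored.
my $strict = qr{^(\d\.\d{5}) ,(\d\d:\d\d:\d\d)};

for my $line (@lines) {
    my $loose_ok  = $line =~ $loose  ? 'yes' : 'no';
    my $strict_ok = $line =~ $strict ? 'yes' : 'no';
    print "$line => loose: $loose_ok, strict: $strict_ok\n";
}
```

Only the second line differs: the loose pattern accepts it, the strict one rejects it.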

Can anyone please help me flush out this bug in my script?

conal.

Re: handling erroneous input
by FunkyMonk (Bishop) on Apr 06, 2008 at 15:39 UTC
    What I need is for my code to accept only input with 1 digit before the decimal point and 5 after, then a space and a comma, then an 8-character time using : as a separator, ignoring any data on the same line after the seconds.
    You seem to know what you want, so it's just a case of following your spec...
    my @data = split /\n/, <<EOS;
    1.57163 ,17:29:57 Simple Dealin
    1.57163 ,17:29:57
    1.57163 ,17:29:57
    1.57163 ,17:29:57
    1.57163 ,17:29:57
    1.57163 ,17:29:57
    1.57163 ,17:29:57
    1.571 ,17:
    1.57172 ,17:30:08
    1.57176 ,17:30:10
    EOS

    for ( @data ) {
        if ( my ( $quote, $time, $comment ) = m{
                ^                  # start of string
                (\d\.\d{5})        # a digit, dot and 5 more digits
                \s,                # a space and a comma
                (\d\d:\d\d:\d\d)   # an 8 character time
                \s*                # some spaces
                (.*)               # everything else's a comment
                $                  # end of string
            }x ) {
            print "$quote / $time", $comment ne '' ? " / $comment" : '', "\n";
        }
    }

    Output:

    1.57163 / 17:29:57 / Simple Dealin
    1.57163 / 17:29:57
    1.57163 / 17:29:57
    1.57163 / 17:29:57
    1.57163 / 17:29:57
    1.57163 / 17:29:57
    1.57163 / 17:29:57
    1.57172 / 17:30:08
    1.57176 / 17:30:10

    Update:

    See perlre and perlretut for the details.

        You've missed part of the regexp out (the bit that captures comments) and missed a backslash out (from \d{5}). It looks like you don't know that, in a regexp, parentheses capture their matches into $1, $2, $3 etc. Again, see perlretut and perlre for the details.
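As a small illustration of the two ways to get at the captures (a sketch; `$line` is just one of the sample lines, and `$re` is a hypothetical name for the pattern): a successful match sets `$1`, `$2`, `$3`, and the same match in list context returns those captures directly, or an empty list if it fails.

```perl
use strict;
use warnings;

my $line = '1.57163 ,17:29:57 Simple Dealin';
my $re   = qr{^(\d\.\d{5})\s,(\d\d:\d\d:\d\d)\s*(.*)$};

# Style 1: test the match, then copy the numbered capture variables.
# Only read $1.. after a successful match; a failed match does not reset them.
my ( $q1, $t1, $c1 );
if ( $line =~ $re ) {
    ( $q1, $t1, $c1 ) = ( $1, $2, $3 );
}

# Style 2: a match in list context returns the captures themselves.
# The list assignment yields the number of captures, so it is false on failure.
my ( $q2, $t2, $c2 );
if ( ( $q2, $t2, $c2 ) = $line =~ $re ) {
    print "quote=$q2 time=$t2 comment=$c2\n";
}

# A failed list-context match returns nothing, so the count is 0.
my $count = () = '1.571 ,17:' =~ $re;
print "captures from bad line: $count\n";
```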

        Your code is similar to mine. You use

        while ( ... ) {
            unless ( some-condition ) { next }
            some-code
        }

        while I prefer the equivalent

        while ( ... ) {
            if ( some-condition ) {
                some-code
            }
        }

        It's just that (IMHO) yours is harder to read (and longer, too).

        That said, you can use my code with a filehandle like so (I've rearranged it a bit to use unless and made the regexp a bit more lenient towards spaces):

        while ( <DATAFILE> ) {
            chomp;
            unless ( m{^ (\d\.\d{5}) \s*,\s* (\d\d:\d\d:\d\d) \s* (.*) $ }x ) { next }
            my ( $quote, $time, $comment ) = ( $1, $2, $3 );   # captures
            my ( $hours, $minutes, $seconds ) = split /:/, $time;
            # do something with $quote, $hours, $minutes, $seconds & $comment
        }
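The loop above assumes `DATAFILE` has already been opened. Here is a minimal sketch of the same loop with a lexical filehandle. To keep it self-contained it reads from an in-memory string (an assumption, standing in for input.txt), and `@kept` is a hypothetical array that just collects the parsed fields.

```perl
use strict;
use warnings;

# Stand-in for input.txt: an in-memory filehandle over a string, so the
# sketch runs without a real file. With a real file you would write
#   open my $fh, '<', 'input.txt' or die "input.txt: $!";
my $sample = <<'EOS';
1.57163 ,17:29:57 Simple Dealin
1.571 ,17:
1.57172 ,17:30:08
EOS
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";

my @kept;
while ( my $line = <$fh> ) {
    chomp $line;
    # Skip anything that does not fit d.ddddd , hh:mm:ss (trailing text allowed).
    next unless $line =~ m{^ (\d\.\d{5}) \s*,\s* (\d\d:\d\d:\d\d) \s* (.*) $ }x;
    my ( $quote, $time, $comment ) = ( $1, $2, $3 );
    my ( $hours, $minutes, $seconds ) = split /:/, $time;
    push @kept, "$quote $hours $minutes $seconds";
}
close $fh;

print "$_\n" for @kept;
```

The truncated line is skipped, so only two records end up in `@kept`.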

        In addition to the problems with your first regex,

            ($hour,$minute,$second) = split(":",$time);)

        should be

            ($hour,$minute,$second) = split /:/,$time;

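        The first argument to split is treated as a regex (the single-space string ' ' being a special case), so write it with slashes (or other unambiguous matched punctuation) rather than quotes. A quoted ":" happens to work, but quotes hide the fact that it is a pattern, and something like "." or "|" in quotes would be read as a regex metacharacter. Note also that the last closing paren in your split line is "one too many" (and thus "wrong"), and all the parens on the RHS are unnecessary.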
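A quick sketch of the difference, using values from the sample data (the array names are just for illustration):

```perl
use strict;
use warnings;

my $time = '17:29:57';

# A quoted ":" works, because it is compiled as the regex /:/ ...
my @a = split ":", $time;    # ('17', '29', '57')
# ... and slashes say the same thing more plainly.
my @b = split /:/, $time;    # ('17', '29', '57')

# But a quoted "." is also compiled as a regex, where . matches any
# character, so every field is empty and trailing empty fields are dropped.
my @c = split ".", '1.57163';     # () -- nothing left
my @d = split /\./, '1.57163';    # ('1', '57163')

print scalar(@a), ' ', scalar(@b), ' ', scalar(@c), ' ', scalar(@d), "\n";
```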

        Subject to your taste, note that your extraction to $quote and $time could be written

            next unless ( $data =~ /^(\d\.\d{5})\s,(\d\d:\d\d:\d\d).*/ );
            $quote = $1;
            $time  = $2;

        Update: s/not/note/ in the last narrative paragraph.

Re: handling erroneous input
by swampyankee (Parson) on Apr 06, 2008 at 15:37 UTC

    What I would suggest is breaking your logic (and code) up into chunks:

    • First, split the record more or less as you're doing (I'd split on /\s*,\s*/, but that's a minor quibble).
    • Second, process any extraneous text ("Simple Dealin"), trailing whitespace, newlines, etc.
    • Third, make sure that the input values (why do all the entries in the first column look like π/2?) are in the expected range and form (it looks like the input value must be greater than or equal to zero and less than ten).
    • Finally, validate the time in whatever way you require.
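The four steps above could be sketched roughly like this. `parse_record` is a hypothetical helper name, the form check assumes the "one digit, five decimals" spec from the question (which also implies the 0 to 10 range suggested above), and the hour/minute/second limits are an assumption about what "valid" means; it is one way to structure it, not the only one.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (quote, hours, minutes, seconds) for a
# usable record, or an empty list if any step rejects it.
sub parse_record {
    my ($record) = @_;

    # 1. Split the record on the comma, allowing spaces around it.
    my ( $quote, $rest ) = split /\s*,\s*/, $record, 2;
    return unless defined $rest;

    # 2. Drop extraneous text ("Simple Dealin"), whitespace and the
    #    newline: split ' ' splits on whitespace, so keep only the first word.
    my ($time) = split ' ', $rest;
    return unless defined $time;

    # 3. Check form and range: one digit, a dot, five digits
    #    (which also guarantees 0 <= value < 10).
    return unless $quote =~ /^\d\.\d{5}\z/;

    # 4. Validate the time: hh:mm:ss with sane ranges (assumed limits).
    my ( $h, $m, $s ) = $time =~ /^(\d\d):(\d\d):(\d\d)\z/ or return;
    return unless $h < 24 && $m < 60 && $s < 60;

    return ( $quote, $h, $m, $s );
}

for my $line ( '1.57163 ,17:29:57 Simple Dealin', '1.571 ,17:', '1.57176 ,17:30:10' ) {
    if ( my @f = parse_record($line) ) {
        print join( ' / ', @f ), "\n";
    }
    else {
        print "skipped: $line\n";
    }
}
```

Keeping each step separate makes it easy to report why a line was rejected, at the cost of being longer than a single regex.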

    I'm sure there's a regex that would do everything in one swell foop, but my regex mojo hasn't fully wakened yet.

