Conal has asked for the wisdom of the Perl Monks concerning the following question:

hullo

i have files that contain data like this below

1.4567 ,11:00:00
1.4571 ,11:00:01
,
1.4567 ,11:58:00rftft
1.4566 ,
1.5555 ,11:43:00

I want to disregard any whitespace and any extraneous characters that show up after the time in column 2. I also want to disregard the last character after the decimal place (no rounding) in the first column ... so if i were to write the above file back .. it would look like this ..

1.456,11:00:00
1.457,11:00:01
1.456,11:58:00
1.555,11:43:00

a snippet of my readfile code looks like this at the minute

while (<DATAFILE>) {
    chomp $_;
    ($quote,$time) = split(",", $_);
    chop($quote); chop($quote);
    ($hour,$minute,$second) = split(":",$time);
}

can anyone pls help me clean up my code, to catch these anomalies in my input file? i need to run calculations of the final data

thanks.

conal.

Replies are listed 'Best First'.
Re: dealing with whitespace and using chop when reading delimited files
by FunkyMonk (Bishop) on Mar 12, 2008 at 20:02 UTC
    A capturing regexp is probably the easiest Way To Do It:
    while ( <DATA> ) {
        chomp;
        my ( $number, $time ) = m{
            (\d\.\d{3})         # a number with 3 decimal places
            .*? , \s*           # some characters, a comma and some spaces
            (\d\d:\d\d:\d\d)    # HH:MM:SS
        }x;
        print "$number,$time\n" if defined $number && defined $time;
    }
    __DATA__
    1.4567 ,11:00:00
    1.4571 ,11:00:01
    ,
    1.4567 ,11:58:00rftft
    1.4566 ,
    1.5555 ,11:43:00

    Output:

    1.456,11:00:00
    1.457,11:00:01
    1.456,11:58:00
    1.555,11:43:00

Re: dealing with whitespace and using chop when reading delimited files
by pc88mxer (Vicar) on Mar 12, 2008 at 20:04 UTC
    I would use a regular expression:
    while (<DATAFILE>) {
        unless (m{^(.*?)\s*,([\d:]+)}) {
            warn "$.: unrecognizable line: $_";
            next;
        }
        my $data = $1;
        my $time = $2;
        # ... continue processing ...
    }

    # output follows:
    __END__
    1.4567,11:00:00
    1.4571,11:00:01
    3: unrecognizable line: ,
    1.4567,11:58:00
    5: unrecognizable line: 1.4566 ,
    1.5555,11:43:00
    7: unrecognizable line:

    Note, the regular expression is very liberal in what it will accept which is the way I like to do things. You can make it more exacting if you find it necessary.
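
As a rough illustration of what "more exacting" could mean here, the sketch below tightens the pattern so that the quote must be digits with at least three decimals and the time must be exactly HH:MM:SS. The sub name parse_line and the sample lines are made up for this example and are not part of the code posted in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (quote, time) for a usable line, or an empty list.
sub parse_line {
    my ($line) = @_;
    return unless $line =~ m{
        ^ \s*
        ( \d+ \. \d{3} ) \d*     # keep three decimals, ignore any further digits
        \s* , \s*                # comma with optional whitespace around it
        ( \d\d : \d\d : \d\d )   # HH:MM:SS, anything after it is ignored
    }x;
    return ( $1, $2 );
}

for my $line ( '1.4567 ,11:00:00', ',', '1.4567 ,11:58:00rftft', '1.4566 ,' ) {
    my ( $quote, $time ) = parse_line($line);
    if ( defined $quote ) {
        print "$quote,$time\n";
    }
    else {
        warn "unrecognizable line: $line\n";
    }
}
```

Lines with a missing quote or a missing time are reported rather than silently passed on, which may or may not be what you want before running calculations.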

Re: dealing with whitespace and using chop when reading delimited files
by runrig (Abbot) on Mar 12, 2008 at 20:05 UTC
    I might just use a regex:
    my ( $quote, $time ) = /^\s*(\d+\.\d+)\s*,\s*(\d\d:\d\d:\d\d)/ or next;
    or something like that.
Re: dealing with whitespace and using chop when reading delimited files
by kyle (Abbot) on Mar 12, 2008 at 20:11 UTC

    Adapting slightly my answer in another thread:

    my ( $quote, $time ) = m{
        \A              # beginning of line
        (               # begin capture
            \d+         # digits before decimal
            \.          # decimal point
            \d{3}       # three digits after decimal
        )               # end capture
        [^,]*           # optional non-comma stuff
        \s* , \s*       # comma with optional spaces
        (               # open capture
            \d\d?       # hours
            : \d\d      # minutes
            : \d\d      # seconds
        )
    }xms;

    Note that this does not require or force $time to have a two-digit hour format. It also won't play well with decimal values that start with a sign.
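
If signed quotes ever do show up, one possible adjustment is an optional sign inside the first capture. The sketch below is only a guess at how such values would look (a single + or - directly before the digits, possibly after leading whitespace). The name parse_signed and the sample line are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical variant of the pattern above that tolerates a leading sign.
sub parse_signed {
    my ($line) = @_;
    my ( $quote, $time ) = $line =~ m{
        \A \s*              # beginning of line, optional whitespace (an addition)
        (                   # begin capture
            [+-]?           # optional sign (assumed format)
            \d+ \. \d{3}    # digits, decimal point, three decimals
        )                   # end capture
        [^,]*               # optional non-comma stuff
        \s* , \s*           # comma with optional spaces
        ( \d\d? : \d\d : \d\d )   # one- or two-digit hour, minutes, seconds
    }xms;
    return ( $quote, $time );    # both undef when the line does not match
}

my ( $q, $t ) = parse_signed('-1.4567 ,9:05:00xyz');
print "$q,$t\n" if defined $q;
```

As with the original pattern, the hour may be one or two digits, so a later calculation should not assume a fixed width.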

Re: dealing with whitespace and using chop when reading delimited files
by Tux (Canon) on Mar 13, 2008 at 13:15 UTC

    Why has no one (yet) noticed the more serious bug in this code:

     ($quote, $time) = split (",", $_);

    The first argument to split is a regex.

    chomp;
    my ($quote, $time) = split m/\s*,\s*/, $_, 2;
    $quote = sprintf "%.3f", $quote;

    Enjoy, Have FUN! H.Merijn
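
One thing to watch with the sprintf approach: "%.3f" rounds, so 1.4567 comes out as 1.457, whereas the question asked for the last digit to be dropped without rounding. A possible way to keep the split but truncate instead is sketched below. clean_line is a hypothetical helper name, and the sketch assumes the quote always has at least three decimals.

```perl
use strict;
use warnings;

# Hypothetical helper: split on the comma, then truncate rather than round.
sub clean_line {
    my ($line) = @_;
    chomp $line;
    my ( $quote, $time ) = split m/\s*,\s*/, $line, 2;
    return unless defined $quote && defined $time;

    # Keep exactly three decimals by cutting the string, not by rounding.
    return unless $quote =~ m/^\s*(\d+\.\d{3})/;
    my $truncated = $1;

    # Keep only the leading HH:MM:SS, dropping trailing junk such as "rftft".
    return unless $time =~ m/^(\d\d:\d\d:\d\d)/;
    return ( $truncated, $1 );
}

my ( $quote, $time ) = clean_line("1.4567 ,11:58:00rftft\n");
print "$quote,$time\n" if defined $quote;
```

Lines without a usable quote or time come back as an empty list, so the caller can skip them with a simple defined check.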