dealing with whitespace and using chop when reading delimited files

Conal has asked for the wisdom of the Perl Monks concerning the following question:

hullo

I have files that contain data like this:

1.4567 ,11:00:00
1.4571 ,11:00:01
            ,             
1.4567 ,11:58:00rftft
1.4566 , 
1.5555 ,11:43:00

I want to disregard any whitespace, and any extraneous characters that show up after the time in column 2. I also want to disregard the last digit after the decimal point (no rounding) in the first column. So if I were to write the above file back, it would look like this:

1.456,11:00:00
1.457,11:00:01
1.456,11:58:00
1.555,11:43:00

A snippet of my file-reading code currently looks like this:

    while (<DATAFILE>) {
        chomp $_;
        ($quote,$time) = split(",", $_);
        chop($quote); chop($quote);
        ($hour,$minute,$second) = split(":",$time);
    }

Can anyone please help me clean up my code to catch these anomalies in my input file? I need to run calculations on the final data.

Thanks.

Conal


Re: dealing with whitespace and using chop when reading delimited files by FunkyMonk (Bishop) on Mar 12, 2008 at 20:02 UTC
A capturing regexp is probably the easiest Way To Do It:

    while ( <DATA> ) {
        chomp;
        my ( $number, $time ) = m{
            (\d\.\d{3})       # a number with 3 decimal places
            .*? , \s*         # some characters, a comma and some spaces
            (\d\d:\d\d:\d\d)  # HH:MM:SS
        }x;
        print "$number,$time\n" if defined $number && defined $time;
    }
    __DATA__
    1.4567 ,11:00:00
    1.4571 ,11:00:01
                ,
    1.4567 ,11:58:00rftft
    1.4566 ,
    1.5555 ,11:43:00

Output:

    1.456,11:00:00
    1.457,11:00:01
    1.456,11:58:00
    1.555,11:43:00
Re: dealing with whitespace and using chop when reading delimited files by pc88mxer (Vicar) on Mar 12, 2008 at 20:04 UTC
I would use a regular expression:

    while (<DATAFILE>) {
        unless (m{^(.*?)\s*,([\d:]+)}) {
            warn "$.: unrecognizable line: $_";
            next;
        }
        my $data = $1;
        my $time = $2;
        # ... continue processing ...
    }
    # output follows:
    __END__
    1.4567,11:00:00
    1.4571,11:00:01
    3: unrecognizable line: ,
    1.4567,11:58:00
    5: unrecognizable line: 1.4566 ,
    1.5555,11:43:00
    7: unrecognizable line:

Note that the regular expression is very liberal in what it accepts, which is the way I like to do things. You can make it more exacting if you find it necessary.
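The liberal pattern above keeps the full quote (`1.4567`), while the question asks for the last decimal digit to be dropped without rounding. A minimal sketch of one way to add that, assuming a hypothetical helper `parse_quote_line` and an in-memory `@lines` array in place of the `DATAFILE` handle: the same match is followed by a string substitution that keeps only three decimals.

```perl
use strict;
use warnings;

# Hypothetical helper: pc88mxer's liberal pattern, followed by a
# substitution that truncates (does not round) the quote to 3 decimals.
sub parse_quote_line {
    my ($line) = @_;

    # Quote = everything before the comma (minus trailing spaces);
    # time = the run of digits and colons right after the comma.
    return unless $line =~ m{^(.*?)\s*,([\d:]+)};
    my ( $quote, $time ) = ( $1, $2 );    # copy before s/// resets $1

    # Drop any digits beyond the third decimal place.
    $quote =~ s/^(\d+\.\d{3})\d*$/$1/;

    return ( $quote, $time );
}

# Sample lines from the question, in place of reading DATAFILE.
my @lines = (
    "1.4567 ,11:00:00",
    "            ,             ",
    "1.4567 ,11:58:00rftft",
    "1.4566 , ",
);

for my $i ( 0 .. $#lines ) {
    my ( $quote, $time ) = parse_quote_line( $lines[$i] );
    if ( defined $quote ) {
        print "$quote,$time\n";
    }
    else {
        warn "line ", $i + 1, ": unrecognizable: '$lines[$i]'\n";
    }
}
```

Doing the truncation on the string, rather than with arithmetic, leaves the three kept digits exactly as they appear in the file.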
Re: dealing with whitespace and using chop when reading delimited files by runrig (Abbot) on Mar 12, 2008 at 20:05 UTC
I might just use a regex:

    my ( $quote, $time ) = /^\s*(\d+\.\d+)\s*,\s*(\d\d:\d\d:\d\d)/ or next;

or something like that.
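The `or next` works because a list assignment evaluated in scalar context (here, the boolean context imposed by `or`) returns the number of elements on its right-hand side: 2 when the match succeeds and 0 when it fails. A small self-contained sketch, assuming the sample lines are held in an array instead of being read from a file:

```perl
use strict;
use warnings;

# Sample lines from the question (an array instead of a file handle).
my @lines = (
    "1.4567 ,11:00:00",
    "1.4571 ,11:00:01",
    "            ,             ",
    "1.4567 ,11:58:00rftft",
    "1.4566 , ",
    "1.5555 ,11:43:00",
);

my @kept;
for (@lines) {
    # The list assignment, in the scalar context imposed by "or",
    # evaluates to the number of values the match returned:
    # 2 on success, 0 on failure, which triggers "next".
    my ( $quote, $time ) = /^\s*(\d+\.\d+)\s*,\s*(\d\d:\d\d:\d\d)/ or next;
    push @kept, "$quote,$time";
}

print "$_\n" for @kept;
```

This pattern keeps all four decimals, so a separate truncation step is still needed to meet the "drop the last digit" requirement.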
Re: dealing with whitespace and using chop when reading delimited files by kyle (Abbot) on Mar 12, 2008 at 20:11 UTC
Slightly adapting my answer from another thread:

    my ( $quote, $time ) = m{
        \A             # beginning of line
        (              # begin capture
            \d+        # digits before decimal
            \.         # decimal point
            \d{3}      # three digits after decimal
        )              # end capture
        [^,]*          # optional non-comma stuff
        \s* , \s*      # comma with optional spaces
        (              # open capture
            \d\d?      # hours
            : \d\d     # minutes
            : \d\d     # seconds
        )
    }xms;

Note that this does not require or force `$time` to have a two-digit hour format. It also won't play well with decimal values that start with a sign.
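If signed values can occur, a minimal variant only needs an optional `[-+]?` in front of the digits. This is a sketch, not kyle's code; the wrapper name `parse_signed` is hypothetical, and the rest of the pattern is unchanged.

```perl
use strict;
use warnings;

# Hypothetical variant of the pattern above: identical except for the
# optional leading sign [-+]? inside the first capture.
sub parse_signed {
    my ($line) = @_;
    my ( $quote, $time ) = $line =~ m{
        \A               # beginning of line
        (                # begin capture
            [-+]?        # optional sign
            \d+ \. \d{3} # digits, decimal point, three decimals
        )                # end capture
        [^,]*            # optional non-comma stuff
        \s* , \s*        # comma with optional spaces
        ( \d\d? : \d\d : \d\d )   # H:MM:SS or HH:MM:SS
    }xms;
    return ( $quote, $time );
}

for my $line ( "-1.4567 ,11:00:00", "+2.0001 ,9:05:00xyz" ) {
    my ( $quote, $time ) = parse_signed($line);
    print "$quote,$time\n";
}
```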
Re: dealing with whitespace and using chop when reading delimited files by Tux (Canon) on Mar 13, 2008 at 13:15 UTC
Why has no one (yet) noticed the more serious bug in this code:

    ($quote, $time) = split (",", $_);

The first argument to split is a regex.

    chomp;
    my ($quote, $time) = split m/\s*,\s*/, $_, 2;
    $quote = sprintf "%.3f", $quote;

Enjoy, Have FUN! H.Merijn
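`sprintf "%.3f"` rounds, while the question asks for the last digit to be dropped without rounding, so `1.4567` becomes `1.457` rather than `1.456`. A short sketch comparing the two, assuming the quote has already been isolated in `$quote` (for example by the `split` above):

```perl
use strict;
use warnings;

my $quote = "1.4567";    # first column after splitting

# sprintf rounds to three decimal places.
my $rounded = sprintf "%.3f", $quote;

# A string-level truncation keeps the first three decimals unchanged
# (an optional sign is allowed), which matches "no rounding".
my ($truncated) = $quote =~ /^([-+]?\d+\.\d{3})/;

print "rounded:   $rounded\n";
print "truncated: $truncated\n";
```

Truncating the string also avoids arithmetic such as `int($quote * 1000) / 1000`, which can come out one unit low in the last digit for some inputs because of binary floating point.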