in reply to reading a delimited file and selecting values from it

Here's a start. This is actually pretty close to the algorithm that nefigah described (though I wrote it before reading that).

use strict;
use warnings;
use Data::Dumper;

my $last_hour = 23;
my %best_of;

while (<DATA>) {
    my ( $quote, $time ) = m{
        \A              # beginning of line
        ( [^,]+ )       # non-commas
        \s* , \s*       # comma with optional spaces
        (               # open capture
            \d\d?       # hours
            : \d\d      # minutes
            : \d\d      # seconds
        )
    }xms;

    my ( $hour, $min, $sec ) = split /:/, $time;

    if ( '00' eq $sec && '00' eq $min && -1 == --$hour ) {
        $hour = $last_hour;
    }

    my $seconds_past = $min * 60 + $sec;

    if ( ! $seconds_past || $best_of{ $hour }{second} < $seconds_past ) {
        $best_of{ $hour } = {
            second => $seconds_past,
            time   => $time,
            quote  => $quote,
        };
    }
}

print Dumper \%best_of;

__DATA__
1.53311 ,1:59:52
1.53311 ,1:59:5220
1.53311 ,1:59:52
1.53311 ,1:59:52hi
1.53311 ,2:00:00
1.53306 ,2:00:03
1.53307 ,2:00:06

Here's the output:

$VAR1 = {
          '1' => {
                   'quote' => '1.53311 ',
                   'time' => '2:00:00',
                   'second' => 0
                 },
          '2' => {
                   'quote' => '1.53307 ',
                   'time' => '2:00:06',
                   'second' => 6
                 }
        };

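This pops out a couple of warnings ("Use of uninitialized value in numeric lt (<)") in the last condition. The first time an hour is seen, $best_of{ $hour }{second} doesn't exist yet, so $seconds_past is compared to undef (and the lookup autovivifies $best_of{ $hour } as an empty hash ref along the way).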
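One way to avoid those warnings, offered as a sketch and not as part of the script above, is to check whether the hour already has an entry before doing the numeric comparison. The helper name is_better is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the new reading should replace the one
# stored for this hour. Checking exists() first means the numeric
# comparison never sees undef, and the lookup does not autovivify.
sub is_better {
    my ( $best_of, $hour, $seconds_past ) = @_;

    return 1 if !$seconds_past;               # reading exactly on the hour
    return 1 if !exists $best_of->{$hour};    # first reading for this hour
    return $best_of->{$hour}{second} < $seconds_past ? 1 : 0;
}

my %best_of;
print is_better( \%best_of, 1, 3592 ), "\n";    # no entry for hour 1 yet
$best_of{1} = { second => 3592 };
print is_better( \%best_of, 1, 3592 ), "\n";    # not later than what is stored
```

In the loop, the condition would then read if ( is_better( \%best_of, $hour, $seconds_past ) ), with the assignment left unchanged.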

Anyway, what you end up with is a hash with each hour seen as a key. The values are hash refs that contain the data you're interested in.
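If a list is more convenient than a hash for later processing, the structure can be flattened in hour order. This is only a sketch: the sample data is copied from the output shown earlier, and the helper name as_rows and the added hour field are hypothetical.

```perl
use strict;
use warnings;

# Sample data copied from the Dumper output shown earlier.
my %best_of = (
    1 => { quote => '1.53311 ', time => '2:00:00', second => 0 },
    2 => { quote => '1.53307 ', time => '2:00:06', second => 6 },
);

# Hypothetical helper: turn the hash into a list of hash refs sorted
# numerically by hour, copying each record and adding an 'hour' field.
sub as_rows {
    my ($best_of) = @_;
    return map { +{ hour => $_, %{ $best_of->{$_} } } }
           sort { $a <=> $b } keys %{$best_of};
}

for my $row ( as_rows( \%best_of ) ) {
    print "hour $row->{hour}: quote $row->{quote} at $row->{time}\n";
}
```

The leading + in +{ ... } tells Perl that the inner braces are an anonymous hash and not a block.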

Re^2: reading a delimited file and selecting values from it
by Conal (Beadle) on Mar 12, 2008 at 02:55 UTC
    Great, that's fantastic, Kyle.

    The data will be a lot more useful to me as some kind of array, because I plan to manipulate it further later. Your code will be invaluable to me as a base structure.

    Can I just ask how I get the time variable to deal with phantom extra digits, e.g. 1:59:52hi?

    How do I get it to disregard the extraneous data at the end? Thanks again.

      The pattern I used already does that, as written. The pattern matches everything you want, up to the extraneous data, and that's where it stops.
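A quick way to see this is to run the pattern against the two awkward sample lines on their own. The sketch below wraps the same pattern in a helper whose name, parse_line, is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the same pattern as the script.
# Returns ( quote, time ), or an empty list if the line does not match.
sub parse_line {
    my ($line) = @_;
    my ( $quote, $time ) = $line =~ m{
        \A              # beginning of line
        ( [^,]+ )       # non-commas
        \s* , \s*       # comma with optional spaces
        (               # open capture
            \d\d?       # hours
            : \d\d      # minutes
            : \d\d      # seconds
        )
    }xms;
    return ( $quote, $time );
}

for my $line ( '1.53311 ,1:59:52hi', '1.53311 ,1:59:5220' ) {
    my ( $quote, $time ) = parse_line($line);
    print "$line => $time\n";
}
```

Because the pattern has no anchor after the seconds, whatever follows them (hi, or the extra 20) is simply left unmatched and never reaches $time.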

      The problem with it (if you consider this a problem) is that the loop doesn't notice if there's a non-match. If you have some bogus line in the file, it's going to try to use it anyway. This will probably manifest as an undef quote at midnight. That's part of why I said it's a start.
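A possible way to make the loop notice a non-match is to test the result of the match and skip the line when it fails. This is a sketch under that assumption, not the original script; the helper name good_lines and the warning text are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the lines the pattern matches and
# warn about the rest instead of passing undef values downstream.
sub good_lines {
    my @lines = @_;
    my @good;
    for my $line (@lines) {
        if ( $line =~ m{ \A ( [^,]+ ) \s* , \s* ( \d\d? : \d\d : \d\d ) }xms ) {
            push @good, { quote => $1, time => $2 };
        }
        else {
            warn "Skipping unparseable line: $line\n";
        }
    }
    return @good;
}

my @good = good_lines( '1.53311 ,2:00:00', 'bogus line', '1.53306 ,2:00:03' );
print scalar(@good), " usable lines\n";
```

Inside the original while loop, the equivalent would be a next (perhaps with a warn) whenever $time is not defined after the match.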