Using a Range to Parse a file and pull out Data?

batcater98 has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to use a range expression to parse a file and extract data for each value within the range, but only for the first time each value is found in the file: record that info, then start back at the top of the file for the next number in the range. I can run through the process in my mind, but cannot figure out how to accomplish this in code.

Example input Data:

spokanebase,Wed 24Sep03 17:59:19,50bn4575,88383 bytes
spokanebase,Wed 24Sep03 17:59:17,49bn4196,88383 bytes
spokanebase,Wed 24Sep03 17:59:10,48bn4550,88383 bytes
memphisbase,Wed 24Sep03 17:59:08,27bn700,88383 bytes
memphisbase,Wed 24Sep03 17:59:03,26bn5444,88383 bytes
havrebase,Wed 24Sep03 17:58:46,20bn5285,88383 bytes
havrebase,Wed 24Sep03 17:58:41,19bn4594,88383 bytes
alliancebase,Wed 24Sep03 17:58:56,45bn5640,88383 bytes
havrebase,Wed 24Sep03 17:58:46,20bn5285,88383 bytes
havrebase,Wed 15Sep04 17:58:41,19bn1100,88383 bytes

The range would be tied to the number after the "bn" on each line. So if I set my range to (700 .. 1200), I am trying to parse through the file from the top (I have it sorted from newest to oldest) and, once I find the first 700, extract the "Date time portion of the line" and write it to an output file:

Theory - Output File From Example Data Above:

700 - 24Sep03
1100 - 15Sep04

Then start over at the top of the file and find the first 701, and so on and so forth, until you reach the end of the range.

Thanks, Batcater98


Replies are listed 'Best First'.
Re: Using a Range to Parse a file and pull out Data? by BrowserUk (Patriarch) on Jan 18, 2007 at 16:23 UTC
Better to avoid having to make multiple passes if you can. This seems to fit the spec as presented.

```perl
#! perl -slw
use strict;

my( $lo, $hi ) = ( 700, 1200 );

my %seen;
while( <DATA> ) {
    if( m[,...\s(\S+)\s\S+,..bn(\d+)]
        and $2 >= $lo and $2 <= $hi
        and not exists $seen{ $2 }
    ) {
        print "$2 - $1";
        $seen{ $2 }++;
    }
}

__DATA__
spokanebase,Wed 24Sep03 17:59:19,50bn4575,88383 bytes
spokanebase,Wed 24Sep03 17:59:17,49bn4196,88383 bytes
spokanebase,Wed 24Sep03 17:59:10,48bn4550,88383 bytes
memphisbase,Wed 24Sep03 17:59:08,27bn700,88383 bytes
memphisbase,Wed 24Sep03 17:59:03,26bn5444,88383 bytes
havrebase,Wed 24Sep03 17:58:46,20bn5285,88383 bytes
havrebase,Wed 24Sep03 17:58:41,19bn4594,88383 bytes
alliancebase,Wed 24Sep03 17:58:56,45bn5640,88383 bytes
havrebase,Wed 24Sep03 17:58:46,20bn5285,88383 bytes
havrebase,Wed 15Sep04 17:58:41,19bn1100,88383 bytes
```

Output:

```
c:\test>junk8
700 - 24Sep03
1100 - 15Sep04
```

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."
Re: Using a Range to Parse a file and pull out Data? by kyle (Abbot) on Jan 18, 2007 at 16:20 UTC
If your "bn" numbers are unique...

```perl
my $range_start = 700;
my $range_end   = 1200;

my %out = ();

while (<>) {
    if ( /bn(\d+),/ && $1 >= $range_start && $1 <= $range_end ) {
        my $bn = $1;
        my ($when) = /^[^,]+,\w+\s+(\S+)\s/;
        $out{$bn} = $when;
    }
}

foreach my $ranger ( sort { $a <=> $b } keys %out ) {
    print $ranger, ' - ', $out{$ranger}, "\n";
}
```

If they are not unique...

```perl
my $range_start = 700;
my $range_end   = 1200;

my %out = ();

while (<>) {
    if ( /bn(\d+),/ && $1 >= $range_start && $1 <= $range_end ) {
        my $bn = $1;
        my ($when) = /^[^,]+,\w+\s+(\S+)\s/;
        push @{$out{$bn}}, $when;
    }
}

foreach my $ranger ( sort { $a <=> $b } keys %out ) {
    print "$ranger - ";
    print join ', ', @{$out{$ranger}};
    print "\n";
}
```

Totally untested and off the top of my head, of course.
Re: Using a Range to Parse a file and pull out Data? by NiJo (Friar) on Jan 18, 2007 at 20:45 UTC
Basically you are reinventing an inefficient sorting algorithm. Be lazy. What is stopping you from re-sorting by date and the magic number? Finding the first magic number after a date change is simple and fast.
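One way the re-sorting idea might look, as a rough sketch rather than NiJo's actual code: sort the in-range records by the "bn" number and keep the first record of each group. It assumes the file is already newest-first, as the original post says, so the original line position stands in for the date as the tie-breaker. The name `first_per_number` and the line pattern are made up for illustration and only checked against the sample lines above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns "number - date" strings
# for the first record of each bn number within [$lo, $hi].
sub first_per_number {
    my ( $lo, $hi, @lines ) = @_;

    # Parse each line into [ bn number, date, original position ].
    my @recs;
    my $pos = 0;
    for my $line (@lines) {
        $pos++;
        next unless $line =~ /^[^,]+,\w+\s+(\S+)\s+[^,]+,\d+bn(\d+),/;
        next if $2 < $lo or $2 > $hi;
        push @recs, [ $2, $1, $pos ];
    }

    # Sort by number, then by original position, so the record that came
    # first in the (assumed newest-first) file leads each group.
    @recs = sort { $a->[0] <=> $b->[0] or $a->[2] <=> $b->[2] } @recs;

    # Keep only the first record after each change of number.
    my ( @out, $prev );
    for my $r (@recs) {
        next if defined $prev and $prev == $r->[0];
        push @out, "$r->[0] - $r->[1]";
        $prev = $r->[0];
    }
    return @out;
}

# A few lines taken from the example input in the question.
my @sample = (
    'spokanebase,Wed 24Sep03 17:59:19,50bn4575,88383 bytes',
    'memphisbase,Wed 24Sep03 17:59:08,27bn700,88383 bytes',
    'havrebase,Wed 24Sep03 17:58:46,20bn5285,88383 bytes',
    'havrebase,Wed 15Sep04 17:58:41,19bn1100,88383 bytes',
);

print "$_\n" for first_per_number( 700, 1200, @sample );
```

For the sample lines this prints `700 - 24Sep03` and `1100 - 15Sep04`. Unlike the single-pass `%seen` approach, it holds every in-range record in memory before sorting, so it is probably only worth it if the sorted list is wanted for something else as well.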