strict is useful if you declare each variable in the tightest scope possible. Declaring all the variables at the beginning of the script just gives you global variables, with all their pitfalls.
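To illustrate the scoping point, here is a minimal sketch. The sample records and station codes are made up for the example and are not taken from the original data.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical sample records, for illustration only.
my @lines = ('2014-01-01,KXYZ', '2014-01-02,KABC');

my @seen_stations;
for my $line (@lines) {
    # $date and $station exist only inside this loop body, so a value
    # from one iteration cannot leak into the next one.
    my ($date, $station) = split /,/, $line;
    push @seen_stations, $station;
}

# Using $station out here would fail to compile under strict,
# which is exactly the kind of mistake you want caught early.
print "@seen_stations\n";
```

Had both variables been declared at the top of the script, the same typo or stale value would go unnoticed.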

I created a hash of the stations so you can easily check whether a report for a given station was requested. On the command line, just place all the stations where you originally had one.
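A small sketch of the idea, with a hard-coded list standing in for @ARGV. The file name, station codes, and date format are assumptions made for the example.

```perl
use warnings;
use strict;

# Hypothetical argument list: file name first, date last, stations between.
my @args = ('data.csv', 'KABC', 'KXYZ', '01 Jan 2014');

my $filename = shift @args;   # first argument
my $date     = pop @args;     # last argument

my %stations;
@stations{@args} = ();        # hash slice: the keys exist, the values are undef

for my $candidate (qw(KABC KDEF)) {
    printf "%s: %s\n", $candidate,
        exists $stations{$candidate} ? 'requested' : 'skipped';
}
```

Because only the existence of the key matters, the lookup costs the same whether you ask for two stations or two hundred.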

As you have not posted a sample of the input data and your specification is not clear on this point, I do not know what to do with the last on line 30. You definitely do not want to end the loop there, because it still has to run for the other stations. If a station can be reported multiple times but you only want the first report, you can delete its entry from the hash. If the last is only there to speed up processing and you know each station is mentioned just once for the given date, you can still delete the entries and use last unless %stations to quit the loop once all the stations have been processed.
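The delete-and-quit behaviour can be tried in isolation on an in-memory list. This is only a sketch: the station codes are hypothetical and the list stands in for the lines of the input file.

```perl
use warnings;
use strict;

# Hypothetical station codes in the order they appear in the input.
my @records = qw(KABC KDEF KABC KXYZ KGHI);

my %stations;
@stations{qw(KABC KXYZ)} = ();

my (@written, $examined);
for my $station (@records) {
    $examined++;
    next unless exists $stations{$station};
    push @written, $station;
    delete $stations{$station};   # a repeated station no longer matches
    last unless %stations;        # nothing left to find, stop reading
}

print "Wrote @written after examining $examined of ",
    scalar @records, " records\n";
```

The second KABC is ignored because its key is already gone, and the loop stops as soon as the hash is empty instead of reading the rest of the input.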

#!/usr/bin/perl
use warnings;
use strict;

my $filename = shift;
my $date     = pop;

print qq(Processing "$filename"...\n);
(my $outfile = "$date.txt") =~ s/ /-/;

open my $IN, '<', $filename or die "$filename: $!\n";

# Count the lines.
1 while <$IN>;
my $line_count = $.;
seek $IN, $. = 0, 0;

open my $OUT, '>', $outfile or die "$outfile: $!\n";

my %stations;
@stations{ @ARGV } = ();

while (<$IN>) {
   chomp;
   my $pcntg = int 100 * ($. / $line_count );
   print STDERR "$pcntg %\r";
   my ($file_date, $file_station) = (split /,/)[0, 5];
   $file_station =~ s/ //g;
   if ($file_date eq $date and exists $stations{$file_station}) {
       print $OUT "$_\n";
       print "Record written to $outfile...\n";

       delete $stations{$file_station};
       last unless %stations;

   }
}

print "\nFinished\n";
close $OUT;

Update: Added missing indices at line 27 and missing angle brackets at line 14.

Update 2: Added the last handling. Also, the percentage is now reported on STDERR, as it is not buffered.

لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

In reply to Re: Bulk Reading and Writing of Large Text Files by choroba
in thread Bulk Reading and Writing of Large Text Files by Sterling_Malory
