in reply to Bulk Reading and Writing of Large Text Files

strict is most useful when you declare each variable in the tightest scope possible. Declaring all the variables at the beginning of the script just gives you global variables, with all their pitfalls.
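A minimal sketch of what tight scoping looks like. The data and the names @lines, $total, $first and $second are made up for illustration and do not come from the original script:

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical sample data, for illustration only.
my @lines = ('1,2', '3,4');

my $total = 0;
for my $line (@lines) {
    # $first and $second exist only inside this loop body, so a stale
    # value from a previous iteration cannot leak into the next one.
    my ($first, $second) = split /,/, $line;
    $total += $first + $second;
}

# Referring to $first here would be a compile-time error under strict,
# which is exactly the kind of mistake strict is meant to catch.
print "Total: $total\n";
```

Had $first and $second been declared at the top of the script, a typo or a forgotten reset would go unnoticed.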

I created a hash of the stations so you can easily check whether a report for a given station was requested. On the command line, just place all the stations where you originally had one.
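Here is a small sketch of the hash-as-set idea in isolation. The station IDs and the @requested array are hypothetical stand-ins for whatever you pass on the command line:

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical station IDs standing in for the command-line arguments.
my @requested = qw(KJFK EGLL LKPR);

# A hash slice creates one key per station; the values stay undefined
# because only the existence of the key matters.
my %stations;
@stations{@requested} = ();

for my $candidate (qw(EGLL KLAX)) {
    my $status = exists $stations{$candidate} ? 'requested' : 'not requested';
    print "$candidate: $status\n";
}
```

The lookup with exists takes the same time no matter how many stations you request, which is why a hash fits better here than searching an array for every input line.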

As you have not posted a sample of the input data and your specification is not clear on this point, I do not know what to do with the last on line 30. You definitely do not want to end the loop there, because it still has to run for the other stations. If a station can be reported multiple times but you only want the first report, you can delete its entry from the hash. If you only use last to speed the processing up and you know each station is mentioned just once for the given date, you can still delete the entries and use last unless %stations to quit the loop once all the stations have been processed.
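To show both effects at once, here is a sketch on in-memory data. The two-column records and the names @records, @written and $seen are assumptions made for the demo; the real file has more columns:

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical records in "date,station" form.
my @records = (
    'd1,AAA',
    'd1,AAA',   # repeated report for AAA, skipped after the first one
    'd1,BBB',
    'd1,CCC',   # never read: both requested stations are done by then
);

my %stations;
@stations{qw(AAA BBB)} = ();

my @written;
my $seen = 0;
for my $record (@records) {
    $seen++;
    my ($date, $station) = split /,/, $record;
    next unless exists $stations{$station};

    push @written, $record;
    delete $stations{$station};   # later reports for this station are ignored
    last unless %stations;        # nothing left to look for, stop reading
}

print "Wrote ", scalar @written, " of $seen records read\n";
```

A hash in boolean context is false once it is empty, so last unless %stations fires right after the final requested station has been handled.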

#!/usr/bin/perl
use warnings;
use strict;

my $filename = shift;
my $date     = pop;

print qq(Processing "$filename"...\n);
(my $outfile = "$date.txt") =~ s/ /-/;

open my $IN, '<', $filename or die "$filename: $!\n";

# Count the lines.
1 while <$IN>;
my $line_count = $.;
seek $IN, $. = 0, 0;

open my $OUT, '>', $outfile or die "$outfile: $!\n";

my %stations;
@stations{ @ARGV } = ();

while (<$IN>) {
    chomp;
    my $pcntg = int 100 * ($. / $line_count );
    print STDERR "$pcntg %\r";
    my ($file_date, $file_station) = (split /,/)[0, 5];
    $file_station =~ s/ //g;

    if ($file_date eq $date and exists $stations{$file_station}) {
        print $OUT "$_\n";
        print "Record written to $outfile...\n";
        delete $stations{$file_station};
        last unless %stations;
    }
}

print "\nFinished\n";
close $OUT;

Update: Added missing indices at line 27 and missing angle brackets at line 14.

Update 2: Added the last handling. Also, STDERR is now used to report the percentage, as it is not buffered.

لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

Re^2: Bulk Reading and Writing of Large Text Files
by Sterling_Malory (Initiate) on May 21, 2013 at 11:13 UTC

    Thank you for the response.

    What is the best way to provide you with a sample of the data?

    Each value should occur only once per date/time entry, so I think what you have provided should work.

      A good practice is to include a small sample of the data in the question (3-4 lines) enclosed in the <code> tags for easy download.
      لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

        OK, I have updated my original post with a sample.