john.tm has asked for the wisdom of the Perl Monks concerning the following question:

I have a file which I run a foreach loop on and then save. I then reopen it and remove some duplicate lines based on certain fields. How can I do this without saving and reopening the file in the middle each time?

#!/usr/bin/perl
use strict;
use warnings;

#use diagnostics;

my $file = "c:\\tmp.txt";
open( my $fh, "<", $file ) or die $!;
my $OUTNET = "c:\\NETtmp.txt";
open( OUTPUT, ">", "$OUTNET" ) or die $!;
my @array;

foreach (<$fh>) {
    chomp;
    if ( $_ =~ m/^\s+\d/ ) {
        $_ =~ s/^\s+//g;
        $_ =~ s/\s+$//g;
        $_ =~ s/\s+/,/g;
        push @array, "s_";
        print " $_ \n";
        printf OUTPUT "$_ \n";
    }
}
close OUTPUT;

my $file2   = "c:\\NETtmp.txt";
my $OUTNET2 = "c:\\final.txt";
open my $in,  '<', $file2   or die $!;
open my $out, '>', $OUTNET2 or die $!;
seek $in, 0, 0;

my %hash;
while (<$in>) {
    my $key = join ',', ( split /,/ )[ 1, 2, 3, 4 ];
    printf $out $_ unless $hash{$key}++;
}
close $out;
close $in;

Replies are listed 'Best First'.
Re: Perl script run foreach loop and sort without having to save and reopen the filehandle each time. (SMoP)
by tye (Sage) on Dec 15, 2014 at 21:44 UTC

    No need for two loops at all, much less a temporary file:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $in = "c:/tmp.txt";
    open( my $ifh, "<", $in ) or die $!;
    my $out = "c:/final.txt";
    open( my $ofh, ">", $out ) or die $!;

    my %hash;
    while( <$ifh> ) {
        chomp;
        next if ! m/^\s+\d/;
        s/^\s+//g;
        s/\s+$//g;
        s/\s+/,/g;
        my $key = join ',', ( split /,/ )[ 1, 2, 3, 4 ];
        print $ofh $_, "\n" if ! $hash{$key}++;
    }

    Use while(<>) not foreach(<>) as the latter loads the whole file into RAM.
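To try the single-pass approach without touching real files, here is a minimal sketch that runs the same filter and de-duplication over in-memory filehandles. The `dedupe_lines` helper name and the sample data are made up for illustration; the key is assumed to be fields 1 through 4, as in the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: read $in one line at a time, normalize each data
# line, and print only the first line seen for each key (fields 1..4).
sub dedupe_lines {
    my ( $in, $out ) = @_;
    my %seen;
    while ( my $line = <$in> ) {    # scalar context: one line per iteration
        chomp $line;
        next unless $line =~ /^\s+\d/;    # keep only lines like "  123 ..."
        $line =~ s/^\s+//;
        $line =~ s/\s+$//;
        $line =~ s/\s+/,/g;

        # Replace missing fields with '' so join() does not warn on undef.
        my $key = join ',', map { $_ // '' } ( split /,/, $line )[ 1 .. 4 ];
        print {$out} "$line\n" unless $seen{$key}++;
    }
    return;
}

# Made-up sample input; in-memory filehandles stand in for the real files.
my $input  = "  1 a b c d\nheader line\n  2 a b c d\n  3 x y z w\n";
my $output = '';
open my $in,  '<', \$input  or die "Cannot open input: $!";
open my $out, '>', \$output or die "Cannot open output: $!";
dedupe_lines( $in, $out );
close $out or die "Cannot close output: $!";
close $in;

print $output;
```

With this sample input the script prints `1,a,b,c,d` and `3,x,y,z,w`: the header line is skipped, and the second line is dropped because its fields 1 through 4 repeat the first line's.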

    - tye        

Re: Perl script run foreach loop and sort without having to save and reopen the filehandle each time.
by toolic (Bishop) on Dec 15, 2014 at 21:04 UTC
    If your file is not huge, you can store the lines in an array instead of a temp file (UNTESTED):
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $file = "c:\\tmp.txt";
    open( my $fh, "<", $file ) or die $!;
    my @array;

    foreach (<$fh>) {
        chomp;
        if ( $_ =~ m/^\s+\d/ ) {
            $_ =~ s/^\s+//g;
            $_ =~ s/\s+$//g;
            $_ =~ s/\s+/,/g;
            push @array, "$_ \n";
        }
    }

    my $OUTNET2 = "c:\\final.txt";
    open my $out, '>', $OUTNET2 or die $!;

    my %hash;
    for (@array) {
        my $key = join ',', ( split /,/ )[ 1, 2, 3, 4 ];
        print $out $_ unless $hash{$key}++;
    }
    close $out;

    Note that I used your @array, which you didn't seem to be using.
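Once the lines are in an array, the de-duplication loop can also be written as a single `grep`. This is only a sketch: the `line_key` helper and the sample lines are made up for illustration, and the lines are assumed to be already normalized to comma-separated form.

```perl
use strict;
use warnings;

# Hypothetical helper: build the de-duplication key from fields 1..4.
sub line_key {
    my ($line) = @_;
    return join ',', map { $_ // '' } ( split /,/, $line )[ 1 .. 4 ];
}

# Made-up, already normalized lines.
my @array = ( "1,a,b,c,d", "2,a,b,c,d", "3,x,y,z,w" );

my %seen;

# Keep a line only the first time its key is seen; order is preserved.
my @unique = grep { !$seen{ line_key($_) }++ } @array;

print "$_\n" for @unique;
```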

Re: Perl script run foreach loop and sort without having to save and reopen the filehandle each time.
by GotToBTru (Prior) on Dec 15, 2014 at 21:01 UTC

    Look into Tie::File as a way to manipulate the contents of your output file as an array - easy to add or change records in the file. Also, is there a reason you can't apply your final criteria the first time you write to the output file?
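For reference, a minimal sketch of the Tie::File idea: the file is tied to an array, and assigning a filtered list back to that array rewrites the file in place. The temporary file and its contents are made up so the example runs on its own; the key is again assumed to be fields 1 through 4.

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Made-up sample file so the sketch is runnable on its own.
my ( $tmp_fh, $path ) = tempfile( UNLINK => 1 );
print {$tmp_fh} "1,a,b,c,d\n2,a,b,c,d\n3,x,y,z,w\n";
close $tmp_fh or die "Cannot close $path: $!";

# Each element of @lines is one record of the file, without its line ending.
tie my @lines, 'Tie::File', $path or die "Cannot tie $path: $!";

my %seen;

# Copy the surviving records into a plain array first ...
my @unique = grep {
    my $key = join ',', map { $_ // '' } ( split /,/ )[ 1 .. 4 ];
    !$seen{$key}++;
} @lines;

# ... then assign them back; this rewrites the file through the tie.
@lines = @unique;

untie @lines;
```

This is convenient for editing a file in place, but for a one-off filter a single read loop is simpler and avoids rewriting the file.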

    As a side note, your seek $in, 0, 0 after reopening the file is superfluous. The file pointer will already be at the beginning of the file on open.
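A quick way to check this is to ask `tell` for the position right after opening a file for reading. The temporary file below is made up for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Made-up sample file.
my ( $tmp_fh, $path ) = tempfile( UNLINK => 1 );
print {$tmp_fh} "line one\nline two\n";
close $tmp_fh or die "Cannot close $path: $!";

open my $in, '<', $path or die "Cannot open $path: $!";
my $pos = tell $in;    # byte offset of the read position
print "Position after open: $pos\n";    # prints 0
close $in;
```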

    1 Peter 4:10