sagar_qwerty has asked for the wisdom of the Perl Monks concerning the following question:

I have a script like this:

open(INFILE, "<", "words.txt");
my @X;
while (<INFILE>) {
    push @X, (split /\s+/, $_)[0];   # choosing a particular column and pushing all its values into a single array
}

$a = 0;
$main = 'temp.txt';
$mod  = 'temp_mod.txt';
for (0 .. $#X) {
    $b = $main;
    open B, $b;
    open NEWFILE1, ">$mod";
    while (<B>) {
        / $X[$a] / or print NEWFILE1 $_;
    }
    close NEWFILE1;
    $main = $mod;
    $mod  = $b;
    $a++;
}

What it does: it removes from temp.txt the lines containing any of the words stored in the first column of words.txt (collected in array @X). It opens temp.txt, removes the lines matching the first word, and saves the result to temp_mod.txt. That file is then reread to remove the second word from @X, and the loop continues, alternately opening and closing temp.txt and temp_mod.txt.

I have many words and many lines in the file (60 MB), so this open/filter/close/reopen cycle consumes a lot of time. Can I open the file just once and remove, in a single pass, all the lines containing the words stored in words.txt?

Replies are listed 'Best First'.
Re: how to avoid opening and closing files
by davido (Cardinal) on Jun 18, 2012 at 05:29 UTC

    It is inefficient to rewrite your entire target file once for each "drop word". Luckily, there is a better algorithm: read your drop-words into a hash, use the hash as a lookup table, and then run through the words in your temp.txt file one time. Whenever a word in the temp.txt file exists in your hash, drop the line and move on to the next. Any line where you don't come across a drop-word, print to a new file.

    use strict;
    use warnings;
    use autodie;
    use List::MoreUtils qw( any );

    my %drop_words;
    open my $words_ifh, '<', 'words.txt';
    while ( <$words_ifh> ) {
        $drop_words{ ( split /\s+/, $_, 2 )[0] } = 1;
    }
    close $words_ifh;

    open my $temp_ifh,   '<', 'temp.txt';
    open my $result_ofh, '>', 'temp_mod.txt';
    while ( <$temp_ifh> ) {
        chomp;
        next if any { exists $drop_words{$_} } split /\s+/;
        print {$result_ofh} $_, "\n";
    }
    close $temp_ifh;
    close $result_ofh;

    If you're not interested in using the non-core module List::MoreUtils, you could achieve about the same goal by changing the "next if any { ... }" line to look like this:

    next if defined first { exists $drop_words{$_} } split /\s+/;

    ...and replacing the "use List::MoreUtils qw( any );" line with use List::Util qw(first); (a core module).


    Dave

Re: how to avoid opening and closing files
by zentara (Cardinal) on Jun 18, 2012 at 10:55 UTC
    Can I open the file just once and remove, in a single pass, all the lines containing the words stored in words.txt?

    Also see Re: Search Replace String Not Working on text file. You can open a file just once, then truncate and rewrite it. Of course, this requires saving your output temporarily in an array. For a 60 MB file, you might instead want to use @ARGV's special line-by-line in-place editing capability, as also shown in the link. That would spare you building a big array.
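    The in-place @ARGV approach described above can be sketched as follows. This is a minimal, untested-against-your-data sketch: the sample lines and the drop words foo and bar are placeholders standing in for your real temp.txt and words.txt contents.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the real temp.txt
open my $fh, '>', 'temp.txt' or die $!;
print $fh "keep this line\nfoo should go\nanother keeper\n";
close $fh;

# Stand-in for the lookup hash built from words.txt
my %drop_words = map { $_ => 1 } qw( foo bar );

{
    local $^I   = '.bak';        # turn on in-place editing, keep a .bak backup
    local @ARGV = ('temp.txt');  # file(s) to edit in place
    while (<>) {
        # Printing a line keeps it; skip lines whose fields hit the hash
        print unless grep { exists $drop_words{$_} } split /\s+/;
    }
}
```

    The file is rewritten once, line by line, with no intermediate array; drop the '.bak' value only if you don't need a backup copy.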


    I'm not really a human, but I play one on earth.
    Old Perl Programmer Haiku ................... flash japh
Re: how to avoid opening and closing files
by cheekuperl (Monk) on Jun 18, 2012 at 04:01 UTC
    1. Read words from words.txt file into @X.
    2. Go through temp.txt and remove any lines that contain any of the words present in @X.
    This is all you are doing, right? The following code is untested.
    open (TEMP, "<temp.txt");
    open (TEMP_MOD, ">temp_mod.txt");
    while ($line = <TEMP>) {
        $flag = 0;
        foreach $word (@X) {
            if ($line =~ /$word/) {   # Does $line contain this $word?
                $flag++;
                last;
            }
        }
        if ($flag == 0) {             # None of the words from @X is present in $line
            print TEMP_MOD $line;
        }
    }
    close TEMP;
    close TEMP_MOD;
    You can also split the line read from TEMP into an array (say @arr) and then apply the array-intersection logic given in Programming Perl to @X and @arr.
    If the intersection set is non-empty, don't write the line to TEMP_MOD.
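    The intersection test above can be sketched with the usual lookup-hash idiom. The words in @X and the sample line here are placeholders, not data from the original post.

```perl
use strict;
use warnings;

# Stand-ins for the real data
my @X   = qw( apple banana );                   # drop words from words.txt
my @arr = split /\s+/, "cherry banana grape";   # fields of one line from temp.txt

# Intersection via a lookup hash: an element of @arr is in the
# intersection exactly when it is a key of %seen
my %seen = map { $_ => 1 } @X;
my @intersection = grep { $seen{$_} } @arr;

# Skip the line when the intersection is non-empty
my $drop_line = @intersection ? 1 : 0;
```

    Building %seen once outside the per-line loop keeps the whole filter linear in the input size, instead of scanning @X for every line.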
Re: how to avoid opening and closing files
by pvaldes (Chaplain) on Jun 18, 2012 at 18:19 UTC
    Can I just open once a file and remove all lines together having words stored in words.txt?

    See map and grep

    (and especially grep(!/regex/, ...), where regex should match the words stored in words.txt)
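    One way to realize that grep(!/regex/) suggestion is to join the drop words into a single alternation pattern and filter the lines in one pass. This is a sketch under assumed data; the words and lines are invented for illustration.

```perl
use strict;
use warnings;

my @X = qw( foo bar );   # stand-in drop words from words.txt

# One compiled regex matching any drop word as a whole word;
# quotemeta guards against metacharacters in the words
my $re = do {
    my $alt = join '|', map quotemeta, @X;
    qr/\b(?:$alt)\b/;
};

my @lines = ("keep me\n", "drop foo here\n", "also keep\n");
my @kept  = grep { !/$re/ } @lines;   # keep only lines with no drop word
```

    Compiling the alternation once with qr// means the per-line work is a single match, rather than one match per word.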

    my @X;
    while (<INFILE>) {
        push @X, (split /\s+/, $_)[0];   # choosing a particular column and pushing all its values into a single array
    }

    I guess that a hash (%X) could be better here, but if you want an array, consider a unique, sorted list or something similar. The idea is to avoid duplicates.
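    The deduplication suggested above can be done with the standard seen-hash idiom; the sample word list is hypothetical.

```perl
use strict;
use warnings;

my @X = qw( foo bar foo baz bar );   # stand-in list with duplicates

# Keep each word only the first time it is seen
my %uniq;
my @X_dedup = grep { !$uniq{$_}++ } @X;   # preserves first-seen order
```

    With duplicates gone, the filtering loop never tests (or rewrites for) the same word twice; using %uniq itself as the lookup table afterwards avoids the array entirely.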