in reply to how to avoid opening and closing files
It is inefficient to rewrite your entire target file once for each "drop word". Luckily, there is a better algorithm: read your drop words into a hash, use the hash as a lookup table, and then run through the words in your 'temp.txt' file just once. Whenever a word in 'temp.txt' exists in the hash, drop that line and move on to the next one. Print every line that contains no drop word to a new file. Because each hash lookup takes roughly constant time, the work grows with the size of 'temp.txt' instead of with the number of drop words multiplied by the size of the file.
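```perl
use strict;
use warnings;
use autodie;    # open and close die with a descriptive message on failure
use List::MoreUtils qw( any );

my %drop_words;
# Build the lookup table from the first whitespace-separated word of each line.
open my $words_ifh, '<', 'words.txt';

while( <$words_ifh> ) {
    $drop_words{$1} = 1 if /^\s*(\S+)/;    # blank lines add no (empty) key
}

close $words_ifh;

open my $temp_ifh,   '<', 'temp.txt';
open my $result_ofh, '>', 'temp_mod.txt';
# Single pass: copy a line only if none of its words is a drop word.
while( <$temp_ifh> ) {
    chomp;
    next if any { exists $drop_words{$_} } split /\s+/;
    print {$result_ofh} $_, "\n";
}
close $temp_ifh;
close $result_ofh;
```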
If you'd rather not depend on the non-core module List::MoreUtils, you can get essentially the same result by changing line 21 (the `next if any ...` line) to:
next if defined first { exists $drop_words{$_} } split /\s+/;
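...and replacing line 4 with `use List::Util qw( first );` (List::Util is a core module). The `defined` check matters because `first` returns the matching word itself, which could be a false value such as `0`, and returns `undef` only when no word matches.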
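Newer versions of the core List::Util module (1.33 and later) also export `any`, so the `first`/`defined` workaround is not strictly needed there. The self-contained sketch below tries the same one-pass, hash-lookup filter on in-memory data instead of files; the drop words `foo` and `baz` and the sample lines are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;
use List::Util 1.33 qw( any );    # any() is exported by List::Util 1.33 and later

# Hypothetical drop words and input lines, kept in memory for the demo.
my %drop_words = map { $_ => 1 } qw( foo baz );
my @lines = (
    'keep this line',
    'drop foo here',
    'also keep',
    'baz at the start',
);

# One pass over the input: a line survives only if none of its
# whitespace-separated words is a key in %drop_words.
my @kept = grep {
    my $line = $_;
    !any { exists $drop_words{$_} } split ' ', $line;
} @lines;

print "$_\n" for @kept;
```

Running it prints `keep this line` and `also keep`; the two lines containing `foo` and `baz` are filtered out. The `grep` over an in-memory list is only for the demo; for real files, the line-by-line `while` loop in the script above avoids holding the whole file in memory.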
Dave