in reply to reading/writing to a file

See if this works for you:

#!/usr/bin/env perl -w

if (scalar(@ARGV) != 3) {
    die "Usage: dc inputfile.txt excludefile.txt outputfile.txt \n";
}

my $exclude = read_hash( $ARGV[ 1 ] );
my $dict    = read_hash( $ARGV[ 2 ] );

open(OUT, ">>$ARGV[2]") or die "Error opening output file: $!\n";
open(INPUT, "$ARGV[0]") or die "Error opening input: $!\n";

while (<INPUT>) {
    @sentence = split(/\s+/);
    foreach $word (@sentence) {
        @count = split(//, $word);  ### why split before removing \W ???
        $word=~s/\W//g;
        next if @count < 4;  ### short circuit (keep nesting down); don't
                             ### need scalar
        next if $exclude->{ $word } || $dict->{ $word };
        $dict->{ $word } = 1;
        print OUT "$word\n";
    }
}

close OUT or die $!;
close INPUT or die $!;

sub read_hash {
    my $file = shift;
    open my $fh, $file or die "Error reading $file: $!\n";
    chomp ( my @words = <$fh> );
    close $fh or die "Error closing $file: $!\n";
    return +{ map +( $_ => 1 ), @words };
}

Update: BTW, I neglected to mention that some aspects of your original code made no sense to me (though I didn't change them). Specifically, I think that instead of

@count = split(//, $word);  ### why split before removing \W ???
$word=~s/\W//g;
next if @count < 4;  ### short circuit (keep nesting down); don't
                     ### need scalar

what you want is something more like

$word =~ s/\W+//g;
next if length $word < 4;
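
To make the difference concrete, here is a minimal sketch that runs both versions on a few sample tokens. The helper names `keep_original` and `keep_suggested` and the sample tokens are made up for illustration; they are not part of the OP's code.

```perl
use strict;
use warnings;

# Original logic: count the characters BEFORE stripping non-word characters.
sub keep_original {
    my ($word) = @_;
    my @count = split //, $word;
    $word =~ s/\W//g;
    return @count < 4 ? undef : $word;
}

# Suggested logic: strip first, then test the length of what is left.
sub keep_suggested {
    my ($word) = @_;
    $word =~ s/\W+//g;
    return length($word) < 4 ? undef : $word;
}

for my $token ( 'it...', 'word,', 'a-b' ) {
    my ($old, $new) = map { defined $_ ? $_ : '(skipped)' }
        keep_original($token), keep_suggested($token);
    print "$token: original=$old suggested=$new\n";
}
```

With the original ordering, a token such as `it...` passes the length test on its punctuation alone and ends up in the output as `it`; with the suggested ordering it is skipped.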

the lowliest monk

Re^2: reading/writing to a file
by polettix (Vicar) on Jun 18, 2005 at 20:37 UTC
    Just to make it clear that I know this:
    A premature optimisation is a Bad Thing
    Now, I'd like to observe that you don't really need two separate hashes, only the dictionary one (or the exclude one, if you prefer). Basically, words that were already seen (either because they are already present in the output file, or because you saw them earlier while iterating over INPUT) are to be excluded, so you can merge the two sets. The consequence is two-fold:
    • you have a single hash access instead of two, which is hopefully better;
    • you don't waste resources autovivifying the %$exclude hash.
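
The single-hash idea could be sketched roughly as follows. This is only an illustration under assumptions: the sub name `new_words` and the sample word lists are hypothetical, and in-memory arrays stand in for the exclude file and the existing output file.

```perl
use strict;
use warnings;

# One "seen" hash seeded from both word lists (hypothetical sample data
# standing in for the exclude file and the existing output file).
my @exclude  = qw(this that);
my @existing = qw(monk);
my %seen     = map { $_ => 1 } @exclude, @existing;

# Return the words of at least four characters not seen before,
# recording every word in %$seen as it is looked up.
sub new_words {
    my ($seen, @lines) = @_;
    my @out;
    for my $line (@lines) {
        for my $word (split ' ', $line) {
            $word =~ s/\W+//g;
            next if length $word < 4;
            next if $seen->{$word}++;   # single lookup covers both sets
            push @out, $word;
        }
    }
    return @out;
}

my @fresh = new_words(\%seen, 'this lowly monk, that lowly vicar');
print "$_\n" for @fresh;
```

The post-increment does the test and the bookkeeping in one hash access, which is the point of merging the two sets.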

    Update: corrected the auto-vivification beastie comment. I wonder how many wrong assumptions I've got inside my head, or at least their order of magnitude.

    Flavio
    perl -ple'$_=reverse' <<<ti.xittelop@oivalf

    Don't fool yourself.

      I don't follow you on the autovivification point. No keys in %$exclude are autovivified in the code I posted.
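
A small check of that claim, as a sketch: the hash contents and the key names `absent`, `outer` and `inner` are made up for illustration. It contrasts a plain single-level lookup, as in `next if $exclude->{ $word }`, with a nested lookup, which is where autovivification does occur.

```perl
use strict;
use warnings;

my $exclude = { this => 1 };

# A plain rvalue lookup on one level does not create the key.
my $hit = $exclude->{absent} ? 1 : 0;
print( exists $exclude->{absent} ? "created\n" : "not created\n" );

# Autovivification shows up with nested access: reading a second
# level forces the intermediate hash into existence.
my %nested;
my $deep = $nested{outer}{inner};
print( exists $nested{outer} ? "outer created\n" : "outer not created\n" );
```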

      The decision to use two hashes was one of several that I made for the sake of clarity alone, since I thought that in that way it would be easier for the OP to adapt it to his/her needs. (I.e., I agree with the quote at the beginning of your post :) )

      the lowliest monk