I wanted to remove every email address listed in an "exclude" file from the main (mailing list) file, so I tried:
grep -vf exclude.list mail.list > new.list
It took HEAPS of memory and ran for about half an hour on my dual-processor PIII 866.
I thought I'd chance rewriting it in Perl, and it took 25 seconds to run and produced the same result!
The same thing written in a bash shell script using a for loop with a grep and checking the exit code took 4.5 minutes to run.
Long live perl !!!
#!/usr/bin/perl -w
#
# only-in
# find lines which are in the first file, but not in the second.
#
use strict;

die "Usage: $0 INPUT EXCLUDE\n" unless($#ARGV == 1);

my $input_file   = shift;
my $exclude_file = shift;

open (INPUT, $input_file ) || die("Can't open input file '$input_file': $!\n");
my @input = (<INPUT>);
close(INPUT);

open (EXCLUDE, $exclude_file ) || die("Can't open exclude file '$exclude_file': $!\n");
my @exclude = (<EXCLUDE>);
close(EXCLUDE);

my @good;
for my $data (@input) {
    push (@good, $data) unless(grep /^$data$/i, @exclude);
}
print join("", @good);

Re: grep -vf exclude_file to_thin_file in perl
by serf (Chaplain) on Mar 02, 2009 at 19:21 UTC

    The way we do things changes over time...

    Today if I wrote this it would look more like:

    #!/usr/bin/perl
    #
    # only-in
    # find lines which are in the first file, but not in the second.
    #
    use warnings;
    use strict;

    my ($input_file, $exclude_file) = (shift, shift);

    die "Usage: $0 INPUT EXCLUDE\n" if ! $exclude_file;

    my %exclude;

    open (my $exclude_fh, $exclude_file ) || die "Can't read exclude file '$exclude_file': $!\n";
    while (defined(my $exclude = <$exclude_fh>)) {
        $exclude{$exclude} = 1;
    }
    close $exclude_fh;

    open (my $input_fh, $input_file) || die "Can't read input file '$input_file': $!\n";
    while (defined(my $input = <$input_fh>)) {
        print $input if ! $exclude{$input};
    }
    close $input_fh;
    This would run even faster, since each input line costs one hash lookup instead of a regex scan over the whole exclude list, and it wouldn't do nasty things like:
    $ cat file1
    $i++;
    $ only-in file1 file2
    Nested quantifiers in regex; marked by <-- HERE in m/^$i++ <-- HERE ;
    $/ at /home/qmechix/only-in line 27.
    :o)
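
For anyone who wants to see the two fixes side by side, here is a minimal sketch rather than a drop-in replacement. It assumes in-memory sample data that I made up, and the helper names only_in_regex and only_in_hash are hypothetical. The regex variant wraps the interpolated line in \Q...\E so metacharacters such as "+" are matched literally. The hash variant lowercases its keys with lc, because the original script matched with /i while a plain hash lookup is case-sensitive.

```perl
use strict;
use warnings;

# Sample data made up for illustration; each line keeps its trailing
# newline, as it would when read from a file.
my @input   = ("alice\@example.com\n", "\$i++;\n", "Bob\@Example.com\n");
my @exclude = ("bob\@example.com\n");

# Hypothetical helper: regex variant of the original loop.
# \Q...\E quotes regex metacharacters in $data, so a line like '$i++;'
# no longer dies with "Nested quantifiers in regex".
sub only_in_regex {
    my ($input, $exclude) = @_;
    my @good;
    for my $data (@$input) {
        push @good, $data unless grep { /^\Q$data\E$/i } @$exclude;
    }
    return @good;
}

# Hypothetical helper: hash variant. Keys are lowercased so the match
# stays case-insensitive, like the /i in the original script.
sub only_in_hash {
    my ($input, $exclude) = @_;
    my %exclude;
    $exclude{ lc($_) } = 1 for @$exclude;
    return grep { !$exclude{ lc($_) } } @$input;
}

print "regex variant:\n", only_in_regex(\@input, \@exclude);
print "hash variant:\n",  only_in_hash(\@input, \@exclude);
```

Both variants should keep the same lines here. The hash variant is the one to prefer on large lists, since it looks each line up once instead of scanning the whole exclude list.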