in reply to Re^2: Dictionary filter regex
in thread Dictionary filter regex

Hi Linicks,

A couple of comments to improve your code.

open (DICT, "< final.txt");
Good practice nowadays is to use lexical filehandles and the three-argument form of the open built-in (and also to check that open succeeded):
open my $DICT, "<", "final.txt" or die "cannot open final.txt: $!";
Second, if your file is large, it is a waste of resources (memory, CPU cycles and time) to store its contents in an array and then process the array, when you could simply process the lines directly as they are read from the file (unless you want to run several other searches on the same data):
open my $DICT, "<", "final.txt" or die "cannot open final.txt: $!";
while (my $word = <$DICT>) {
    next unless $word =~ /s.*h/i;
    next if $word =~ /s.*s/i or $word =~ /h.*h/i;
    print $word;
}
You could also use a series of greps to filter your data:
open my $DICT, "<", "final.txt" or die "cannot open final.txt: $!";
print for grep { not /h.*h/i }
          grep { not /s.*s/i }
          grep /s.*h/i, <$DICT>;
or possibly only one grep with a composite condition.
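A minimal sketch of that single-grep variant, in case it helps. The sub name filter_words and the sample words are made up for illustration; in the real script the list would come from <$DICT>.

```perl
use strict;
use warnings;

# Hypothetical helper: keep words that have an "s" followed later by an "h",
# but that contain neither two "s" nor two "h".
sub filter_words {
    return grep { /s.*h/i and not /s.*s/i and not /h.*h/i } @_;
}

# Illustrative sample data only; replace with <$DICT> in the real script.
my @sample = ("ship\n", "shush\n", "sash\n", "fish\n", "hash\n", "dog\n");
print for filter_words(@sample);
```

The composite condition tests each word in a single pass and skips the remaining regexes as soon as one of them fails, whereas the chained greps build an intermediate list at each stage.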

Update: fixed the typo mentioned by Linicks: s/~=/=~/;.

Re^4: Dictionary filter regex
by Linicks (Scribe) on Nov 26, 2016 at 21:01 UTC

    Thanks, interesting. I see you also have the issue I have - _differnet_ typos ~

    next if $word ~=

    Heh.

    My original code produces:

    time perl sh.pl > sh.txt

    real    0m3.770s
    user    0m3.692s
    sys     0m0.074s

    ...and using the great (sort of) 4-line while loop:

    time perl sh.pl > sh.txt

    real    0m4.192s
    user    0m4.170s
    sys     0m0.015s

    seems slower.

    Also, as for checking for errors when opening a file, I never bother when working in a terminal on a local machine, as I know the file exists; in other circumstances I would, of course.

    Thanks for your input!

    Nick

    P.S. The first word my dictionary file pulls up is abandon ship

      I have different results. With a words.txt file containing about 113,800 words, these are my timings. With an array:
      $ time perl -e 'open my $fh, "<", "words.txt" or die; my @w = <$fh>;
      > my $c;
      > foreach $line(@w) {
      >     if ($line =~ /s.*h/i) {
      >         if ( ($line =~ /s.*s/i) || ($line =~ /h.*h/i) ) {
      >             next;
      >         }
      >         $c++;
      >     }
      > }
      > print "$c \n";
      > '
      2834

      real    0m0.138s
      user    0m0.093s
      sys     0m0.016s
      Reading directly from the file:
      $ time perl -e 'open my $fh, "<", "words.txt" or die;
      > my $c;
      > foreach $line(<$fh>) {
      >     if ($line =~ /s.*h/i) {
      >         if ( ($line =~ /s.*s/i) || ($line =~ /h.*h/i) ) {
      >             next;
      >         }
      >         $c++;
      >     }
      > }
      > print "$c \n";
      > '
      2834

      real    0m0.120s
      user    0m0.093s
      sys     0m0.015s
      I used exactly the same code apart from the array, so the comparison should be reasonably meaningful.

      But the main point is that if the input file gets huge, then you sometimes simply can't load it into an array, because you'll run out of memory.
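One caveat on the memory point: foreach $line (<$fh>) evaluates <$fh> in list context, so Perl still reads every line into a temporary list before the loop starts. To keep memory use flat on a huge file, a while loop, which reads one line per iteration, is the safer choice. A minimal sketch, assuming the same words.txt as in the timings (the sub name count_matches is just for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: count matching words while holding only one line
# in memory at a time.
sub count_matches {
    my ($path) = @_;
    open my $fh, "<", $path or die "cannot open $path: $!";
    my $count = 0;
    while (my $line = <$fh>) {    # scalar context: one line per iteration
        next unless $line =~ /s.*h/i;
        next if $line =~ /s.*s/i or $line =~ /h.*h/i;
        $count++;
    }
    close $fh;
    return $count;
}

# Only run on the file if it is present, so the sketch is safe to try anywhere.
print count_matches("words.txt"), "\n" if -e "words.txt";
```

On a file of this size the difference will not show up in the timings; it only matters once the input no longer fits comfortably in memory.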