Re^3: Dictionary filter regex

A couple of comments to improve your code.

open (DICT, "< final.txt");

Good practice nowadays is to use lexical filehandles and the three-argument form of the open built-in function (and also to check that open succeeded):

open my $DICT, "<", "final.txt" or die "cannot open final.txt: $!";

Second, if your file is large, storing its contents in an array and then processing the array wastes resources (memory, CPU cycles and time). You can instead process the lines directly as they are read from the file (unless you want to run several other searches on the same data):

open my $DICT, "<", "final.txt" or die "cannot open final.txt: $!";
while (my $word = <$DICT>) {
    next unless $word =~ /s.*h/i;
    next if $word =~ /s.*s/i or $word =~ /h.*h/i;
    print $word;
}

You could also use a series of greps to filter your data:

open my $DICT, "<", "final.txt" or die "cannot open final.txt: $!";
print for grep { not /h.*h/i } grep { not /s.*s/i } grep /s.*h/i, <$DICT>;

or possibly only one grep with a composite condition.
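
For illustration, the single-grep variant might look something like the sketch below. The sample words and the in-memory filehandle are assumptions made so that the snippet runs on its own; with the real dictionary you would open "final.txt" as in the examples above.

```perl
use strict;
use warnings;

# Assumption: a few sample words standing in for final.txt,
# read through an in-memory filehandle.
my $sample = "ship\nshush\nsash\nfish\ntree\nhash\n";
open my $DICT, "<", \$sample or die "cannot open in-memory sample: $!";

# One grep, one composite condition: an "s" followed later by an "h",
# but no second "s" and no second "h".
my @kept = grep { /s.*h/i && !/s.*s/i && !/h.*h/i } <$DICT>;
close $DICT;

print @kept;
```

This makes a single pass over the list instead of three, and the `&&` chain stops testing a word as soon as one condition fails.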

Update: fixed the typo mentioned by Linicks: s/~=/=~/;.

Re^4: Dictionary filter regex by Linicks (Scribe) on Nov 26, 2016 at 21:01 UTC
Thanks, interesting. I see you also have the issue I have - _differnet_ typos ~ `next if $word ~=` Heh.

My original code produces:

time perl sh.pl > sh.txt
real 0m3.770s
user 0m3.692s
sys 0m0.074s

...and using the great sort of 4 liner while loop:

time perl sh.pl > sh.txt
real 0m4.192s
user 0m4.170s
sys 0m0.015s

seems slower.

Also, as to the error on open a file, I never bother when doing it in a terminal on a local machine as I know the file exists - in other circumstances I would, of course.

Thanks for your input!

Nick

P.S. The first word my dictionary file pulls up is abandon ship
Re^5: Dictionary filter regex by Laurent_R (Canon) on Nov 27, 2016 at 13:19 UTC
I have different results. With a words.txt file containing about 113,800 words, these are my timings.

With an array:

$ time perl -e 'open my $fh, "<", "words.txt" or die; my @w = <$fh>;
> my $c;
> foreach $line(@w) {
>     if ($line =~ /s.h/i) {
>         if ( ($line =~ /s.s/i) || ($line =~ /h.h/i) ) {
>             next;
>         }
>         $c++;
>     }
> }
> print "$c \n";
> '
2834

real 0m0.138s
user 0m0.093s
sys 0m0.016s

Reading directly from the file:

$ time perl -e 'open my $fh, "<", "words.txt" or die;
> my $c;
> foreach $line(<$fh>) {
>     if ($line =~ /s.h/i) {
>         if ( ($line =~ /s.s/i) || ($line =~ /h.h/i) ) {
>             next;
>         }
>         $c++;
>     }
> }
> print "$c \n";
> '
2834

real 0m0.120s
user 0m0.093s
sys 0m0.015s

I have used exactly the same code except for the use of the array, so that the comparison should be relatively significant. But the main point is that if the input file gets huge, then you sometimes simply can't load it into an array, because you'll run out of memory.
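
One caveat on the memory point: foreach evaluates <$fh> in list context, so it still reads every line before the loop starts. A while loop with a scalar assignment reads one line per iteration. A minimal sketch of that form, using the same patterns as the one-liners above and an in-memory filehandle with made-up sample lines as a stand-in for words.txt:

```perl
use strict;
use warnings;

# Assumption: made-up sample lines standing in for words.txt,
# read through an in-memory filehandle.
my $sample = "soho\nsahib\nsohoho\noak\n";
open my $fh, "<", \$sample or die "cannot open in-memory sample: $!";

my $c = 0;
# Scalar context: only the current line is held in memory.
while (my $line = <$fh>) {
    next unless $line =~ /s.h/i;
    next if $line =~ /s.s/i or $line =~ /h.h/i;
    $c++;
}
close $fh;

print "$c\n";
```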