Dictionary filter regex

Linicks has asked for the wisdom of the Perl Monks concerning the following question:

Re: Dictionary filter regex by Corion (Patriarch) on Nov 26, 2016 at 17:04 UTC
The easy way is to split this up into three checks: keep anything that matches `/s.*h/i`; reject anything that matches `/s.*s/i`; reject anything that matches `/h.*h/i`. Mushing this into a single regular expression is possible, by using `[^sh]` instead of dot, but I would stay with the three checks.
Re^2: Dictionary filter regex by LanX (Saint) on Nov 26, 2016 at 18:39 UTC
I second your approach of breaking the logic up into 3 regexes, but

> Mushing this into a single regular expression is possible, by using `[^sh]` instead of dot

Do you mean `/s[^sh]*h/i`? I doubt this; you would also need to check all characters before and after the match:

`/^ [^sh]* s [^sh]* h [^sh]* $/xi` *

Otherwise something like `"h--<s--h>--s"` would match in the middle. (Untested)

I think this demonstrates well why stuffing all logic into one regex is not always a good idea; inversion in particular isn't trivial.

Cheers Rolf
(addicted to the Perl Programming Language and ☆☆☆☆ :) Je suis Charlie!

*) I just saw that tybalt89++ already posted this regex in this thread.
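The difference between the two patterns can be seen by running both against the counter-example from the post above. This is only a sketch: it assumes the goal is "exactly one s, exactly one h, s before h", it spells the patterns as `/s[^sh]*h/i` and `/^ [^sh]* s [^sh]* h [^sh]* $/xi`, and the sample words and variable names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Unanchored: only constrains the stretch between one "s" and the next "h".
my $unanchored = qr/s[^sh]*h/i;

# Anchored: constrains the whole string, so a stray "s" or "h" elsewhere fails.
my $anchored = qr/^ [^sh]* s [^sh]* h [^sh]* $/xi;

# Illustrative inputs, including the counter-example (without the <> markers).
my @samples = ('h--s--h--s', 'cultish', 'schools', 'hosepipes');

for my $word (@samples) {
    printf "%-12s unanchored=%s anchored=%s\n",
        $word,
        ($word =~ $unanchored ? 'match' : 'no'),
        ($word =~ $anchored   ? 'match' : 'no');
}
```

Of these four inputs, only `cultish` gets past the anchored pattern, while the unanchored one also accepts `h--s--h--s` and `schools`.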
Re^3: Dictionary filter regex by Corion (Patriarch) on Nov 26, 2016 at 19:25 UTC
Yes - my post would have been much clearer had I also included the anchors and the (then explicit) `.*` before and after the matches. Thank you for pointing this out and making it explicit!
Re^2: Dictionary filter regex by Linicks (Scribe) on Nov 26, 2016 at 17:30 UTC
Thanks to all who replied. Corion, great answer; I guess I am tired after a long week at work.

    #!/usr/bin/perl -w
    my @words;
    my $line;
    open (DICT, "< final.txt");
    @words = <DICT>;
    close (DICT);
    foreach $line (@words) {
        if ($line =~ /s.*h/i) {
            if ( ($line =~ /s.*s/i) || ($line =~ /h.*h/i) ) {
                next;
            }
            print $line;
        }
    }
    exit;

Produces some great words ha ha!

    asthmatic
    asthmatical
    asthmatically
    asthmatoid
    asthmogenic
    asthore
    asthorin
    astrachan
    astrakhan
    astraphobia
    astraphobic
    astrapophobia
    astrochronological
    ...
    crystallographically
    crystallography
    crystallophyllian
    crystograph
    ctesiphon
    cubbish
    cubbishly
    cuemanship
    cuish
    cultish
    cultishly
    culture shock
    cumshaw
    cunctatorship
    cuneoscaphoid
    cuproscheelite
    curateship
    curatorship
    curiosity killed the cat

Thanks! Nick
Re^3: Dictionary filter regex by Laurent_R (Canon) on Nov 26, 2016 at 18:46 UTC
Hi Linicks, a couple of comments to improve your code.

`open (DICT, "< final.txt");`

Good practice nowadays is to use lexical file handles and the three-argument syntax for the `open` built-in function (and also to check that `open` succeeded):

`open my $DICT, "<", "final.txt" or die "cannot open final.txt: $!";`

Second, if your file is large, it is a waste of resources (memory, CPU cycles and time) to store its contents in an array and then process the array, when you could just process the lines directly as they are read from the file (unless you want to run several other searches on the same data):

    open my $DICT, "<", "final.txt" or die "cannot open final.txt: $!";
    while (my $word = <$DICT>) {
        next unless $word =~ /s.*h/i;
        next if $word =~ /s.*s/i or $word =~ /h.*h/i;
        print $word;
    }

You could also use a series of greps to filter your data:

    open my $DICT, "<", "final.txt" or die "cannot open final.txt: $!";
    print for grep { not /h.*h/i } grep { not /s.*s/i } grep /s.*h/i, <$DICT>;

or possibly only one `grep` with a composite condition.

Update: fixed the typo mentioned by Linicks: `s/~=/=~/;`.
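The last suggestion, a single `grep` with a composite condition, might look like the sketch below. It assumes the three patterns are `/s.*h/i`, `/s.*s/i` and `/h.*h/i`, and it filters an in-memory list instead of reading final.txt so that it runs on its own. The helper name `keep_word` and the sample words are invented for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true when a word has an "s" followed later by an "h",
# but no second "s" and no second "h".
sub keep_word {
    my ($word) = @_;
    return $word =~ /s.*h/i
        && $word !~ /s.*s/i
        && $word !~ /h.*h/i;
}

# Stand-in for the lines read from the dictionary file (newlines included).
my @words = ("asthmatic\n", "school\n", "schools\n", "hosepipes\n", "cultish\n");

# One grep, one composite condition.
print for grep { keep_word($_) } @words;
```

With a real file, `@words` would be replaced by `<$DICT>` as in the post above.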
Re^4: Dictionary filter regex by Linicks (Scribe) on Nov 26, 2016 at 21:01 UTC
Re^5: Dictionary filter regex by Laurent_R (Canon) on Nov 27, 2016 at 13:19 UTC
Re: Dictionary filter regex by tybalt89 (Monsignor) on Nov 26, 2016 at 17:07 UTC
    #!/usr/bin/perl
    # http://perlmonks.org/?node_id=1176603
    use strict;
    use warnings;

    my @extract = grep /^[^sh]*s[^sh]*h[^sh]*$/i, map tr/\n//dr, <DATA>;
    print "@extract\n";

    __DATA__
    school
    schools
    hosepipes
Re: Dictionary filter regex by Anonymous Monk on Nov 26, 2016 at 21:21 UTC
When all you have is a hammer...

    my @words = qw( clash school schools hosepipes crystallography );
    print $_, "\n" for grep { tr/sh//cdr eq "sh" } @words;
Re^2: Dictionary filter regex by Anonymous Monk on Nov 26, 2016 at 21:45 UTC
P.S. It looks like others decided that it should be case-insensitive; you can use `lc` or `fc` for that: `{ fc =~ tr/sh//cdr eq "sh" }`
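The `tr` trick can be spelled out in a small helper. This is a sketch rather than the poster's code: `one_s_then_h` is a made-up name, and it uses `lc` instead of `fc` so that it does not need `use feature 'fc'` (available from Perl 5.16). The `/r` flag on `tr` itself needs Perl 5.14 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper. tr/sh//cdr deletes (d) every character in the
# complement (c) of "sh" and returns (r) the result, so only the s and h
# characters survive, in their original order. A word qualifies when
# exactly "sh" is left.
sub one_s_then_h {
    my ($word) = @_;
    return lc($word) =~ tr/sh//cdr eq 'sh';
}

# Illustrative inputs; "School" exercises the case-insensitive path.
my @words = ('clash', 'School', 'schools', 'hosepipes', 'crystallography');
print "$_\n" for grep { one_s_then_h($_) } @words;
```

For these inputs the survivors of `tr` are `sh`, `sh`, `shs`, `hss` and `sh`, so `schools` and `hosepipes` are dropped.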
Re: Dictionary filter regex by pryrt (Abbot) on Nov 26, 2016 at 17:07 UTC
So you want zero or more non-sh characters at the start of the string, then s, followed by zero or more non-sh, followed by h, followed by zero or more non-sh to the end of the string. Phrased that way, is there a solution that comes to mind?

Update: removed brackets.
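That description translates almost word for word into a commented `/x` pattern. The sketch below also cross-checks it against the three separate checks on a handful of made-up inputs. The names `$single` and `three_checks` are invented here, and the three checks are assumed to be `/s.*h/i`, not `/s.*s/i`, and not `/h.*h/i`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The description above, written out piece by piece.
my $single = qr{
    ^           # start of the string
    [^sh]*      # zero or more characters that are neither s nor h
    s           # one s
    [^sh]*      # zero or more characters that are neither s nor h
    h           # one h
    [^sh]*      # zero or more characters that are neither s nor h
    $           # end of the string
}xi;

# The three separate checks, for comparison.
sub three_checks {
    my ($word) = @_;
    return $word =~ /s.*h/i && $word !~ /s.*s/i && $word !~ /h.*h/i;
}

# Illustrative inputs only; a real run would read the dictionary file.
my @samples = qw(clash school schools hosepipes cultish hush ash SHE);

for my $word (@samples) {
    my $by_single = $word =~ $single      ? 1 : 0;
    my $by_three  = three_checks($word)   ? 1 : 0;
    print "$word: single=$by_single three=$by_three\n";
}
```

On single-line words the two approaches should agree, since both come down to "exactly one s, exactly one h, and the s comes first".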
