Re: Removing all none 8 letter words from the dict/words file
by NetWallah (Canon) on Apr 09, 2006 at 01:50 UTC
|
Perhaps something like this one-liner:
perl -ne "m/^\w{8}$/ and print" InputfileName
You could redirect STDOUT if you need to save the result to a file.
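For anyone who prefers a script to the one-liner, here is a minimal sketch of the same filter. The sub name copy_eight_char_words and the sample words are made up for illustration, and it uses in-memory filehandles so it runs without touching real files; swap in filehandles opened on your actual input and output files.

```perl
use strict;
use warnings;

# Copy only lines made of exactly eight \w characters from one
# filehandle to another; returns the number of lines written.
# (Hypothetical helper name, not from the original post.)
sub copy_eight_char_words {
    my ($in, $out) = @_;
    my $count = 0;
    while (my $line = <$in>) {
        # Same test as the one-liner; $ also matches before a trailing newline.
        next unless $line =~ /^\w{8}$/;
        print {$out} $line;
        $count++;
    }
    return $count;
}

# Demo on in-memory filehandles (sample data is invented).
my $input  = "cat\nabsolute\nAbu-Bekr\nnotebook\n";
my $output = '';
open my $in,  '<', \$input  or die "cannot open input: $!";
open my $out, '>', \$output or die "cannot open output: $!";
my $kept = copy_eight_char_words($in, $out);
close $out;
print "kept $kept words:\n$output";
```

With real files you would open the input with '<' and the output with '>' on file names instead of scalar references.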
"For every complex problem, there is a simple answer ... and it is wrong." --H.L. Mencken
|
My initial reaction was that this would be less efficient than the length check, but on second thought it ensures that all 8 characters are Perl word characters and not hyphens, periods, apostrophes, etc., many of which appear in a typical dictionary file.
Checking length == 8 && ! /\A\w{8}\z/ found 2400 entries in my /usr/share/dict/words file.
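To make the difference concrete, here is a small sketch comparing the three tests on a handful of invented sample words. Note that \w also accepts digits and the underscore, so the letters-only character class at the end is an assumption about what "8 letter words" might mean, not something stated in the thread.

```perl
use strict;
use warnings;

# Sample words are invented for illustration; each is 8 characters long.
my @words = ('absolute', '10-point', 'Abu-Bekr', 'abc_1234', 'notebook');

# Length alone accepts anything with 8 characters.
my @length_only  = grep { length($_) == 8 } @words;

# \w{8} rejects hyphens and other punctuation, but still allows digits and "_".
my @word_chars   = grep { /\A\w{8}\z/ } @words;

# An explicit ASCII-letter class is stricter still (assumed requirement).
my @letters_only = grep { /\A[A-Za-z]{8}\z/ } @words;

printf "length only:  %s\n", join ', ', @length_only;
printf "\\w{8}:        %s\n", join ', ', @word_chars;
printf "letters only: %s\n", join ', ', @letters_only;
```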
-xdg
Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.
Re: Removing all none 8 letter words from the dict/words file
by jZed (Prior) on Apr 09, 2006 at 01:37 UTC
|
I'm guessing by "non 8 letter words" you mean all words that don't have exactly 8 characters? If so, open an output file and loop through the input file doing this inside the loop: print OUTFILE $_ if length $_ != 8;
Update: oh wait, you want to remove everything except 8-letter words (or rather, create a new file that doesn't contain the others). For that you'd do as above but change != to ==.
Re: Removing all none 8 letter words from the dict/words file
by Cody Pendant (Prior) on Apr 09, 2006 at 08:59 UTC
|
Aren't all the length() based solutions forgetting to chomp()? Isn't Silver Wolf going to end up with seven-letter words that way, or maybe six if it's CRLF?
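A quick sketch of the point, with a made-up helper name (visible_length): the line terminator counts toward length(), so an unstripped seven-letter line reports 8. The substitution below strips either LF or CRLF; plain chomp with the default $/ removes only the "\n" and leaves a "\r" behind.

```perl
use strict;
use warnings;

# Length of a line after removing a trailing LF or CRLF.
# (Hypothetical helper, not from the thread.)
sub visible_length {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;
    return length $line;
}

my $lf   = "letters\n";     # 7 letters + LF
my $crlf = "sixsix\r\n";    # 6 letters + CRLF

# Both raw lines would pass a naive "length == 8" test.
printf "raw: %d and %d\n", length($lf), length($crlf);
printf "stripped: %d and %d\n", visible_length($lf), visible_length($crlf);
```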
($_='kkvvttuu bbooppuuiiffss qqffssmm iibbddllffss')
=~y~b-v~a-z~s; print
|
$ perl -nle 'print if length == 8 && ! /\A\w{8}\z/' /usr/share/dict/words
10-point
11-point
12-point
16-point
18-point
20-point
48-point
-ability
Abu-Bekr
acantho-
[snip]
-xdg
Re: Removing all none 8 letter words from the dict/words file
by salva (Canon) on Apr 09, 2006 at 09:17 UTC
|
grep '^........$' /usr/share/dict/words
|
Actually, on my Linux box, grep is GNU grep, and egrep is a shell wrapper for grep:
#!/bin/sh
exec grep -E ${1+"$@"}
Re: Removing all none 8 letter words from the dict/words file
by /Silver_Wolf (Novice) on Apr 09, 2006 at 16:34 UTC
|
Re: Removing all non 8 letter words from the dict/words file
by /Silver_Wolf (Novice) on Apr 09, 2006 at 23:00 UTC
|
|
while (<>) {
    print if /^[^A-Z0-9.-]+$/;
}
However, it's usually much easier to list the things you can include.
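As a rough illustration of why listing what you allow tends to be safer, compare the exclusion class above with an inclusion class. The sample words are invented, and "lowercase ASCII letters only" is my assumption about what should pass.

```perl
use strict;
use warnings;

# Invented sample data.
my @words = ('absolute', 'Abu-Bekr', "o'clock", '10-point', 'notebook');

# Exclusion: reject uppercase, digits, period, hyphen. Anything you
# forgot to list (here the apostrophe) slips through.
my @by_exclusion = grep { /^[^A-Z0-9.-]+$/ } @words;

# Inclusion: accept only what is explicitly listed (assumed: a-z).
my @by_inclusion = grep { /^[a-z]+$/ } @words;

print "exclusion keeps: @by_exclusion\n";
print "inclusion keeps: @by_inclusion\n";
```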
-QM
--
Quantum Mechanics: The dreams stuff is made of