in reply to Words in Words
Try this. I project that it should complete your 410 billion comparisons in a little under 10 hours.
The main efficiency gain comes from running the regex only once per word, in global mode (/g), against a single large string containing all the words, so that each run returns every match at once. The matches are then filtered to drop your specific exclusions: the word itself, its plural (word + "s") and its possessive (word + "'s").
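```perl
#! perl -slw
use strict;

# Slurp the word list, one word per line.
my @words = do{ local @ARGV = 'words.txt'; <> };
chomp @words;

# One big string of all the words, padded with a space at each end so that
# the first and last words can be matched too.
my $all = ' ' . join( ' ', @words ) . ' ';

my $start = time;
my $n = 0;

for my $i ( @words ) {
    # A single global match per word returns every word in $all containing $i.
    # \Q...\E matches $i literally; the lookahead (?= ) leaves the trailing
    # space unconsumed so it can serve as the leading space of the next match,
    # otherwise two adjacent matching words would cause the second to be missed.
    for my $j ( $all =~ m[ ([^ ]*\Q$i\E[^ ]*)(?= )]g ) {
        # Skip the word itself, its plural and its possessive.
        next if $j eq $i or $j eq "${i}s" or $j eq "${i}'s";
#       print "$j contains $i";
    }
}

printf STDERR "Took %d seconds for %d words\n", time() - $start, scalar @words;
```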
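To try the global-match idea in isolation, here is a minimal sketch on a tiny made-up word list instead of words.txt. The helper `words_containing` is hypothetical (it is not part of the original post); it wraps the same `m//g` call in list context and the same exclusions, and uses a zero-width lookahead for the trailing space so that neighbouring words that both contain the target are all returned.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): return every word in the
# space-padded string $all that contains $needle, using a single /g match.
sub words_containing {
    my ( $needle, $all ) = @_;
    # \Q...\E treats the word literally; (?= ) leaves the trailing space
    # unconsumed so it can be the leading space of the next match.
    my @hits = $all =~ m[ ([^ ]*\Q$needle\E[^ ]*)(?= )]g;
    # Drop the word itself, its plural and its possessive.
    return grep {
        $_ ne $needle and $_ ne "${needle}s" and $_ ne "${needle}'s"
    } @hits;
}

# Tiny illustrative word list, made up for this example.
my @words = qw( cat cats cat's concat scatter dog catalog );
my $all   = ' ' . join( ' ', @words ) . ' ';

for my $w (@words) {
    print "$_ contains $w\n" for words_containing( $w, $all );
}
```

Run as-is, it prints "concat contains cat", "scatter contains cat" and "catalog contains cat"; "cats" and "cat's" are suppressed by the exclusions, and the remaining words only match themselves.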