Re^4: Words in Words

Re^5: Words in Words by sarchasm (Acolyte) on Sep 30, 2011 at 23:18 UTC
It looks like both solutions will work! One thing I just realized from your post about sorting is that you only need to look at words that are longer than the current word (which you are sort of doing). This means the program actually gets faster at finding results as it runs. I ran each program for 1 minute: BrowserUk's code produced 320 records and Lotus1's code produced 150. Even though your code appears to run slower, I imagine its performance will improve the longer the process runs, because it will have fewer records to look through each time. I will let the programs run over the weekend to see what I get. Thank you all for your help. I learned a lot from your examples and suggestions!
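sarchasm's point about only checking longer words can be sketched in Perl. This is a minimal illustration, not Lotus1's actual code (which isn't quoted here): `find_containers` is a hypothetical name, and the plural rule simply mirrors the `s`/`'s` checks used elsewhere in this thread. Because the list is sorted by length, each word is only compared with the entries after it, so the inner loop gets shorter as the outer loop advances.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: for each word, record the first longer word that
# contains it, skipping plain plurals and possessives ("cats", "cat's").
sub find_containers {
    # Shortest words first; ties broken alphabetically for stable output.
    my @words = sort { length($a) <=> length($b) or $a cmp $b } @_;
    my %found;
    for my $i ( 0 .. $#words ) {
        my $short = $words[$i];
        # Only words after position $i can be longer than $short.
        for my $j ( $i + 1 .. $#words ) {
            my $long = $words[$j];
            next if length($long) == length($short);    # equal length cannot contain it
            next if $long eq "${short}s" or $long eq "${short}'s";
            if ( index( $long, $short ) >= 0 ) {
                $found{$short} = $long;
                last;    # one container is enough
            }
        }
    }
    return \%found;
}

my $found = find_containers(qw(cat cats catalog dog dogma log));
printf "%s is in %s\n", $_, $found->{$_} for sort keys %$found;
```

With the sample list this prints `cat is in catalog`, `dog is in dogma` and `log is in catalog`; `cats` is skipped as a plural of `cat`.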
Re^6: Words in Words (Updated) by BrowserUk (Patriarch) on Oct 01, 2011 at 10:54 UTC
Update: Evidently this is a step too far, as it produces the wrong results. It could (probably) be fixed, but it will never beat choroba's solution below. My final offering. ~~Combining Lotus1's sort by length with my big-string approach and this really flies, beating my previous best by an order of magnitude:~~ Ignore!

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.
Re^7: Words in Words by choroba (Cardinal) on Oct 01, 2011 at 13:43 UTC
I slightly modified my script:

```perl
#!/usr/bin/perl
use feature 'say';
use warnings;
use strict;

my $file = 'words.txt';
open my $IN, '<', $file or die "Cannot open $file: $!";

# Load every word into a hash so lookups are fast.
my %words;
while (my $word = <$IN>) {
    chomp $word;
    undef $words{$word};
}

my %reported;
for my $word (keys %words) {
    my $length = length $word;
    # Try every substring of $word...
    for my $pos (0 .. $length - 1) {
        # ...except $word itself (the full-length substring at position 0).
        my $skip_itself = ! $pos;
        for my $len (1 .. $length - $pos - $skip_itself) {
            my $subword = substr($word, $pos, $len);
            next if exists $reported{$subword};
            # Ignore plain plurals and possessives: "cat" in "cats" or "cat's".
            next if $word eq $subword . q{s} or $word eq $subword . q{'s};
            if (exists $words{$subword}) {
                say "$subword";
                undef $reported{$subword};
            }
        }
    }
}
```

I used `english.0` from this archive as words.txt: http://downloads.sourceforge.net/wordlist/ispell-enwl-3.1.20.zip. Your script took 58 s, whilst mine took only 6 s (on a Pentium 4, 2.8 GHz). The results were different, though: your output contains the word `indistinguishableness`, which mine does not; my list contained 911 more words than yours (e.g. `you`, `wraps` or `tribe's`).
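To track down differences like the ones described above, the two outputs can be compared with a hash-based set difference. This is a sketch, assuming each script's output has been saved to a file and read into a list; `only_in` and `contained_word` are hypothetical helpers, and the sample lines are made up, not taken from either run. Since BrowserUk's script prints `LONG contains SHORT` while choroba's prints only the short word, each line is first reduced to the contained word.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Words in @$first that do not appear in @$second, sorted.
sub only_in {
    my ( $first, $second ) = @_;
    my %in_second;
    @in_second{@$second} = ();    # hash slice: one key per word, values undef
    my @diff = grep { !exists $in_second{$_} } @$first;
    return sort @diff;
}

# Reduce a line of output to the contained (shorter) word:
# "LONG contains SHORT" becomes SHORT; a bare word is returned as-is.
sub contained_word {
    my ($line) = @_;
    chomp $line;
    return $line =~ /\bcontains (\S+)$/ ? $1 : $line;
}

# Made-up sample lines standing in for the two saved outputs.
my @mine  = map { contained_word($_) } ( "cat\n", "dog\n", "you\n" );
my @yours = map { contained_word($_) }
    ( "catalog contains cat\n", "dogma contains dog\n" );

print "only in mine:  @{[ only_in( \@mine,  \@yours ) ]}\n";
print "only in yours: @{[ only_in( \@yours, \@mine ) ]}\n";
```

For real data, read each saved output with `open my $fh, '<', $name or die $!;` and `my @lines = <$fh>;`, then map the lines through `contained_word` as above.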
Re^8: Words in Words by BrowserUk (Patriarch) on Oct 01, 2011 at 19:01 UTC
Re^8: Words in Words by BrowserUk (Patriarch) on Oct 01, 2011 at 14:55 UTC
Re^9: Words in Words by choroba (Cardinal) on Oct 01, 2011 at 19:09 UTC
Re^8: Words in Words by sarchasm (Acolyte) on Oct 02, 2011 at 20:28 UTC
Re^9: Words in Words by choroba (Cardinal) on Oct 02, 2011 at 20:55 UTC
Re^6: Words in Words by BrowserUk (Patriarch) on Oct 01, 2011 at 00:40 UTC
Another tweak should improve performance again:

```perl
#! perl -slw
use strict;

my @words = do{ local @ARGV = 'words.txt'; <> };
chomp @words;
# Pad both ends so the first and last words are delimited by spaces too.
my $all = ' ' . join( ' ', @words ) . ' ';

my $start = time;
for my $i ( @words ) {
    pos( $all ) = 0;    ## m//g resumes at pos(), which "last" leaves set
    # \Q..\E matches $i literally; the lookahead leaves the trailing space
    # in place so the next word can still be matched.
    while( $all =~ m[ ([^ ]*\Q$i\E[^ ]*)(?= )]g ) {
        my $j = $1;
        next if $j eq $i or $j eq "${i}s" or $j eq "${i}'s";
        print "$j contains $i";
        last; ## Added
    }
}
printf STDERR "Took %d seconds for %d words\n", time() - $start, scalar @words;
```
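The same big-string idea can also be expressed with `index`/`rindex` instead of a regex, which avoids building a new pattern for every word. This is a hedged sketch, not BrowserUk's code, and it has not been benchmarked against either script: `first_container` is a hypothetical name, and it assumes the joined string is padded with a space at each end and that no word contains a space.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the first word in $all (other than $short itself or its plain
# plural/possessive) that contains $short, or nothing if there is none.
# Assumes $all is ' ' . join(' ', @words) . ' ' and words contain no spaces.
sub first_container {
    my ( $all, $short ) = @_;
    return if $short eq '';    # an empty word would match everywhere
    my $pos = 0;
    while ( ( $pos = index( $all, $short, $pos ) ) >= 0 ) {
        # Widen the hit to the surrounding spaces to get the whole word.
        my $start = rindex( $all, ' ', $pos ) + 1;
        my $end   = index( $all, ' ', $pos + length($short) );
        my $long  = substr( $all, $start, $end - $start );
        $pos = $end;    # resume searching after this word
        next if $long eq $short or $long eq "${short}s" or $long eq "${short}'s";
        return $long;
    }
    return;
}

my @words = qw(cat cats dog catalog dogma);
my $all   = ' ' . join( ' ', @words ) . ' ';
for my $w (@words) {
    my $long = first_container( $all, $w );
    print "$long contains $w\n" if defined $long;
}
```

With the sample list this prints `catalog contains cat` and `dogma contains dog`. Jumping `$pos` to the end of each examined word means a word that contains `$short` twice is only inspected once.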

