
I have always been fascinated by the idea of sorting the words of a dictionary starting from their end: that would be the ultimate poet's companion. Alas, pronunciation and spelling often differ a lot.
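Sorting "from the end" simply means comparing the reversed spellings. Here is a minimal Perl sketch; the word list and the `sort_from_end` helper are made up for illustration. It also shows the catch: cough, though, through and tough end up next to each other, yet none of them rhyme.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Sort words by their reversed (lowercased) spelling, i.e. starting
# from their end, the way a reverse (rhyming) dictionary is ordered.
# sort_from_end() is a hypothetical helper, not part of the code below.
sub sort_from_end {
    my @words = @_;
    return sort { scalar(reverse lc $a) cmp scalar(reverse lc $b) } @words;
}

my @words  = qw(though dream tough stream through read dread instead cough);
my @sorted = sort_from_end(@words);
print "@sorted\n";
# prints: read dread instead cough though through tough dream stream
```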

The following sentences are some of the 100 produced from Shelley's Frankenstein. Thank you, Shelley...

Victim Prometheus Petersburgh dark
Doubt listlessly fruitlessly grievously paroxysm streams dreams
Corpse view hardly Henry
Sex forget universal dirt
Grey dungeon peaks Even even heaven
Uri persuade made traveller dread instead read
Devil sobs papa music
Mantel autumn Autumn piece sympathy
Lie say Greece plan
Delirium Turkey occur free degree
...
Waves branch French entrench Greek doubt

The recipe for the above is:

 - read a text or a dictionary;
 - find each word and filter out rubbish (numbers, initials, etc.);
 - enter the words in a hash keyed on their last N chars, as a crude
   attempt to rhyme (N=3 in the code below). Hyphenating the words with
   TeX::Hyphen and keying on the last fragment works better, but it
   misses out a lot of words;
 - create a pattern for the sentence, e.g. (0,1,0,0,1,0), meaning:
   pick a random word for the first word; for the second word (1)
   change the ending, i.e. pick a random word from a newly chosen
   random ending; for the 3rd word (0) continue with the same ending
   as the previous word; and so on.
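The recipe can also be tried without a text file. This is a minimal sketch, not the code below: the tiny word list, passing N as a parameter, and the `build_index`/`make_sentence` signatures are all hypothetical, chosen only to make the steps easy to see.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Step 3: index words on their last $n chars (crude rhyme key),
# skipping empty strings, numbers, initials and all-caps words.
sub build_index {
    my ($n, @words) = @_;
    my %index;
    for my $w (@words) {
        next if $w eq '' || $w =~ /^[0-9A-Z.,]+$/;
        push @{ $index{ substr($w, -$n) } }, $w;
    }
    return \%index;
}

# Step 4: walk a 0/1 pattern. 1 = switch to a random (new) ending,
# 0 = keep the current ending. $pat[0] is ignored: the first word is
# always random. Immediate repeats are dropped, so a sentence can be
# shorter than the pattern.
sub make_sentence {
    my ($index, @pat) = @_;
    my @endings = sort keys %$index;    # sorted so srand() makes runs repeatable
    my $bag  = $index->{ $endings[rand @endings] };
    my @sent = (ucfirst $bag->[rand @$bag]);
    for my $i (1 .. $#pat) {
        $bag = $index->{ $endings[rand @endings] } if $pat[$i];
        my $word = $bag->[rand @$bag];
        push @sent, $word unless $word eq $sent[-1];
    }
    return @sent;
}

srand(1234);
my $index = build_index(3, qw(dread instead read streams dreams even heaven
                              Henry 1818 CHAPTER papa music));
print join(' ', make_sentence($index, 0, 1, 0, 0, 1, 0)), "\n";
```

Sorting the endings before picking from them matters: since Perl 5.18 the order of `keys` changes from run to run, so a fixed `srand` seed alone does not make the output repeatable.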

And here is the code (too lazy I am to tidy it up):

#!/usr/bin/env perl

# for https://perlmonks.org/?node_id=1221711

use strict;
use warnings;

select(STDERR); $| = 1;
select(STDOUT); $| = 1;

srand(1234);

# read a text or a dictionary
# find each word and filter out rubbish (numbers, initials, etc)
# enter the words in a hash keyed on their last N chars
# - an attempt to rhyme.
# TeX::Hyphen hyphenates the words so it's better but misses
# out a lot of words.
# create a pattern for the sentence, (0,1,0,0,1,0)
# meaning: pick random word for first word,
# second word change the ending, (1) so pick up
# a random word with different ending.
# 3rd word continue with same ending as previous (0)
# etc.

my $num_sentences = 100;
#my $dict_filename = '/usr/share/dict/words';
#my $dict_filename = './words';
#my $dict_filename = './dict.txt';
#my $dict_filename = './KingJamesBible.txt';
my $dict_filename = './ShelleyFrankenstein.txt';
#my $dict_filename = './wordlist';

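open(my $dfh, '<', $dict_filename) || die "failed to open dict file '$dict_filename': $!";
my %words_keyed_on_ending = ();
my $progress = 1;
print "reading dict: ";
while( my $aline = <$dfh> ){
        chomp($aline);
        # remove chapter:verse references such as 3:16 (e.g. in the KJV Bible)
        $aline =~ s/[0-9]+\:[0-9]*//g;
        foreach my $aword (split(/\W+/, $aline)){
                # split() returns an empty first field when a line starts
                # with punctuation: skip it
                next if $aword eq '';
                # skip numbers, initials and all-caps words (e.g. "CHAPTER")
                if( $aword =~ /^[0-9A-Z\.,]+$/ ){ next }
                # last 3 chars: a crude stand-in for the last syllable
                my $last_syl = substr($aword, -3);
                # push() autovivifies the array ref, no need to create it first
                push(@{$words_keyed_on_ending{$last_syl}}, $aword);
        }
        # progress report: every 10000 lines read
        print "$progress " if ++$progress % 10000 == 0;
#       if( $progress > 100 ){ last }
}
close($dfh);
print "\n";
# sort the keys: since Perl 5.18 hash order changes from run to run,
# so without sorting, srand(1234) would not make the output repeatable
my @keys_of_words_keyed_on_ending = sort keys %words_keyed_on_ending;
#for (sort @keys_of_words_keyed_on_ending){ print "$_ : ".join("---", @{$words_keyed_on_ending{$_}})."\n"; } exit(0);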

for(1..$num_sentences){
        print join(" ", @{make_sentence()})."\n";
}
exit(0);

sub make_sentence {
        # 0 = keep the current ending, 1 = switch to a new random ending;
        # $pat[0] is ignored because the first word is always random
        my @pat = (0,1,0,0,1,1,0,0);
        my $current_ending = $keys_of_words_keyed_on_ending[rand @keys_of_words_keyed_on_ending];
        my $bag_of_words = $words_keyed_on_ending{$current_ending};
        my $current_word = ucfirst $bag_of_words->[rand @$bag_of_words];
        my @sent = ($current_word);
        #print "current word: $current_word (ending in $current_ending)\n";
        #print "bag of words: ".join(", ", @$bag_of_words)."\n";
        my $last_word = $current_word;
        for(1..$#pat){
                if( $pat[$_] ){
                        #print "changing ending to ";
                        # change the ending: pick a new random ending
                        # (by chance it may be the same one again)
                        $current_ending = $keys_of_words_keyed_on_ending[rand @keys_of_words_keyed_on_ending];
                        $bag_of_words = $words_keyed_on_ending{$current_ending};
                }
                # pick a random word from the current bag (same or new ending)
                $current_word = $bag_of_words->[rand @$bag_of_words];
                # skip immediate repetitions, so a sentence can be shorter than @pat
                if( $current_word ne $last_word ){
                        $last_word = $current_word;
                        push(@sent, $current_word);
                }
                #print "current word: $current_word (ending in $current_ending)\n";
                #print "bag of words: ".join(", ", @$bag_of_words)."\n";
        }
        #print "num words: ".scalar(@sent)."\n";
        return \@sent;
}

In reply to Re: Creating random sentences from a dictionary by bliako
in thread Creating random sentences from a dictionary by Lotus1