PerlMonks  

Vow Triptych

by hashED (Novice)
on Dec 30, 2008 at 15:22 UTC ( [id://733275]=CUFP )

I'm getting married in October and started thinking about wedding vows, so I wanted a better feel for what other people spend most of their vow-ing time talking about. Here's a little script that came out of that effort. It takes a text file full of wedding vows (which you'll have to provide yourself) and prints the text's triptychs: each word, then the two-word phrases that begin with it, then the three-word phrases that begin with those, with each level sorted by frequency.
#!/usr/bin/perl

my@wordsInOrder;
while (<>) {
    foreach ("$_" =~ m/\w+/g) {
        push @wordsInOrder, lc($_);
    }
}

my$trypHash = {};
for ($i=0;$i < scalar(@wordsInOrder)-2; $i++) {
    $trypHash->{$wordsInOrder[$i]." ".$wordsInOrder[$i+1]." ".$wordsInOrder[$i+2]} += 1;
}

my$dupeHash = {};
for ($i=0;$i < scalar(@wordsInOrder)-1; $i++) {
    $dupeHash->{$wordsInOrder[$i]." ".$wordsInOrder[$i+1]} += 1;
}

my$oneHash = {};
for ($i=0;$i < scalar(@wordsInOrder); $i++) {
    $oneHash->{$wordsInOrder[$i]} += 1;
}

foreach my$one (sort {$oneHash->{$b} <=> $oneHash->{$a}} keys %{$oneHash} ) {
    print "$one\n";
    foreach my$two (sort {$dupeHash->{$b} <=> $dupeHash->{$a}} keys %{$dupeHash} ) {
        next unless $two =~ m/^$one/;
        print "\t$two\n";
        foreach my$three (sort {$trypHash->{$b} <=> $trypHash->{$a}} keys %{$trypHash} ) {
            next unless $three =~ m/^$two/;
            print "\t\t$three\n";
        }
    }
}

Replies are listed 'Best First'.
Re: Vow Triptych
by Arunbear (Prior) on Dec 30, 2008 at 18:56 UTC
    Additional simplifications are possible, e.g. there is no need to loop over the word list three times, and why use hashrefs (and make yourself do extra typing) when you could use a regular hash?
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my @wordsInOrder;
    while (<>) {
        push @wordsInOrder, split /\W+/, lc($_);
    }

    my (%single, %double, %triple);
    my $index = 0;

    foreach my $word (@wordsInOrder) {
        $single{$word}++;
        my $next_word = $wordsInOrder[$index+1];
        if($next_word) {
            $double{"$word $next_word"}++;
        }
        my $next_next_word = $wordsInOrder[$index+2];
        if($next_next_word) {
            $triple{"$word $next_word $next_next_word"}++;
        }
        $index++;
    }

    foreach my $singlet (sort_by_frequency(\%single)) {
        print "$singlet\n";
        foreach my $doublet (sort_by_frequency(\%double)) {
            next unless $doublet =~ /^$singlet\b/;
            print "\t$doublet\n";
            foreach my $triplet (sort_by_frequency(\%triple)) {
                next unless $triplet =~ /^$doublet\b/;
                print "\t\t$triplet\n";
            }
        }
    }

    sub sort_by_frequency {
        my $h = shift;
        return sort { $h->{$b} <=> $h->{$a} } keys %$h;
    }
    This also only matches whole words rather than fragments. Best wishes with the wedding anyway!
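
To make the whole-word point concrete, here is a small sketch of the two prefix tests side by side. It is written in Python (like the port further down the thread) and is only an illustration: the function names and the sample phrases are made up, and `re.escape` is an addition that neither Perl script has.

```python
import re

def fragment_prefix(phrase, prefix):
    # Mirrors the original script's /^$one/ test: any leading substring counts.
    return re.match(re.escape(prefix), phrase) is not None

def whole_word_prefix(phrase, prefix):
    # Mirrors /^$singlet\b/: the prefix must stop at a word boundary.
    return re.match(re.escape(prefix) + r"\b", phrase) is not None

# Made-up two-word phrases, purely for illustration.
pairs = ["to have", "together forever", "to hold"]

print([p for p in pairs if fragment_prefix(p, "to")])
print([p for p in pairs if whole_word_prefix(p, "to")])
```

With the fragment test, "together forever" is listed under "to"; with the word-boundary test it is not.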
Re: Vow Triptych
by jwkrahn (Abbot) on Dec 30, 2008 at 17:14 UTC
    my@wordsInOrder;
    while (<>) {
        foreach ("$_" =~ m/\w+/g) {
            push @wordsInOrder, lc($_);
        }
    }

    Wow! You are copying $_ to a string before binding it to a match and then iterating over a list in a loop when you could just use the list directly:

    my@wordsInOrder;
    while (<>) {
        push @wordsInOrder, lc() =~ m/\w+/g;
    }
      Huh, didn't know you could do that. Thanks for the edge-mication.
Re: Vow Triptych
by Arunbear (Prior) on Dec 31, 2008 at 16:29 UTC
    For any comparative linguists, here is what it looks like in Python (it even works):
    import re
    import sys
    from collections import defaultdict

    wordsInOrder = []
    for line in sys.stdin:
        wordsInOrder.extend( re.findall(r'\w+', line.lower()) )

    single = defaultdict(int)
    double = defaultdict(int)
    triple = defaultdict(int)

    for i, word in enumerate(wordsInOrder):
        single[word] += 1
        try:
            next_word = wordsInOrder[i+1]
            double[word + ' ' + next_word] += 1
            next_next_word = wordsInOrder[i+2]
            triple[word + ' ' + next_word + ' ' + next_next_word] += 1
        except:
            pass

    def sort_by_frequency(d):
        return sorted(d.iterkeys(), cmp = lambda x,y: cmp(d[y], d[x]))

    for singlet in sort_by_frequency(single):
        print singlet
        for doublet in sort_by_frequency(double):
            if not doublet.startswith(singlet + ' '): continue
            print "\t", doublet
            for triplet in sort_by_frequency(triple):
                if not triplet.startswith(doublet + ' '): continue
                print "\t\t", triplet
    I needed amusement ;-)
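
The port above uses Python 2 idioms (the `print` statement, `iterkeys`, `cmp=`), so it will not run on Python 3. What follows is a rough Python 3 sketch of the same idea, not a drop-in replacement. It assumes `collections.Counter` for the tallies and `zip` over staggered slices to form the word runs. The names `count_ngrams`, `by_frequency` and `report`, and the sample text, are invented for this example.

```python
import re
from collections import Counter

def count_ngrams(words, n):
    # zip over n staggered slices yields every run of n consecutive words.
    return Counter(" ".join(gram) for gram in zip(*(words[i:] for i in range(n))))

def by_frequency(counts):
    # Most frequent first; ties keep first-seen order.
    return [key for key, _ in counts.most_common()]

def report(text):
    # Build the indented word / pair / triple listing as a list of lines.
    words = re.findall(r"\w+", text.lower())
    single, double, triple = (count_ngrams(words, n) for n in (1, 2, 3))
    lines = []
    for one in by_frequency(single):
        lines.append(one)
        for two in by_frequency(double):
            if not two.startswith(one + " "):
                continue
            lines.append("\t" + two)
            for three in by_frequency(triple):
                if three.startswith(two + " "):
                    lines.append("\t\t" + three)
    return lines

# Made-up sample text; swap in sys.stdin.read() to process a real file.
sample = "to have and to hold to have and to cherish"
print("\n".join(report(sample)))
```

Returning the lines rather than printing them inside the loops keeps `report` easy to check on small inputs.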
