in reply to minimal response program code problem

Hmm, well, I've had a go at re-writing your code.

I won't claim that this is the 'best' code, or that I'd have written it this way if I'd started from scratch, but I think it does what you want. It also counts 'bad words' (words that aren't in the good-word list), since it occurred to me that there's no point in counting good words unless you've got some kind of ratio to compare them to.
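One way to read the "ratio" mentioned above is the share of a speaker's counted words that were good words. A minimal sketch under that assumption follows; `match_ratio` is a hypothetical helper and not part of the rewritten script, which only prints the two raw counts.

```perl
use strict;
use warnings;

# match_ratio is a hypothetical helper (an assumption, not in the original script).
# It returns matched / (matched + unmatched), or undef when nothing was counted.
sub match_ratio {
    my ($matched, $unmatched) = @_;
    my $total = $matched + $unmatched;
    return unless $total;    # avoid dividing by zero for silent speakers
    return $matched / $total;
}

my $ratio = match_ratio(3, 9);
printf "%.2f\n", $ratio;     # prints 0.25
```

Returning undef for a speaker with no counted words keeps "never spoke" distinct from "spoke but used no good words", which would give 0.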

use strict;

my %goodwords = (
    "mhm" => 1, "right" => 1, "well" => 1, "yeah" => 1, "sure" => 1,
    "good" => 1, "ah" => 1, "okay" => 1, "yep" => 1, "hm" => 1,
    "definitely" => 1, "alright" => 1, "'m'm" => 1, "oh" => 1, "my" => 1,
    "god" => 1, "wow" => 1, "uhuh" => 1, "exactly" => 1, "yup" => 1,
    "mkay" => 1, "i see" => 1, "ooh" => 1, "cool" => 1, "uh" => 1,
    "fine" => 1, "true" => 1, "hm'm" => 1, "hmm" => 1, "yes" => 1,
    "absolutely" => 1, "great" => 1, "um" => 1, "so" => 1, "mm" => 1,
    "weird" => 1, "ye-" => 1, "i mean" => 1, "i know" => 1, "i think so" => 1,
    "huh" => 1, "yay" => 1, "maybe" => 1, "eh" => 1, "obviously" => 1,
    "correct" => 1, "awesome" => 1, "really" => 1, "interesting" => 1,
);

my(%speaker_record);     # store the info in this hash in an array ref
my $gender = 0;          # array number for gender
my $matched_words = 1;   # array number for matched words count
my $unmatched_words = 2; # array number for unmatched words count

while(<DATA>){
    if(/<strong>(S[\w\-]+)<\/strong>:.*Gender:\s+(Male|Female)/i){
        $speaker_record{$1}->[$gender] = $2;
        $speaker_record{$1}->[$matched_words] = 0;
        $speaker_record{$1}->[$unmatched_words] = 0;
    }
    else{
        # hopefully, a chunk contains just the stuff attributed to one speaker (split on <b>)
        my @chunks = split /<b>/, $_;
        foreach my $chunk(@chunks){
            if($chunk =~ /(\w+?):/){
                # who is the speaker of this chunk?
                my $speaker = $1;
                # get rid of stuff we don't want to count
                $chunk =~ s/<.*?>//g;    # html tags and content
                $chunk =~ s/\[|\]//g;    # '[' and ']'
                $chunk =~ s/(\w+?)://g;  # the speaker
                my @words = split /\s+/, $chunk; # break the chunk up into words
                foreach my $word(@words){
                    # non-blank 'word' and valid speaker
                    if($word !~ /^\s*$/ and exists $speaker_record{$speaker}){
                        # a matched goodword
                        if(exists $goodwords{$word}){
                            $speaker_record{$speaker}->[$matched_words] ++;
                        }
                        # an unmatched word
                        else{
                            $speaker_record{$speaker}->[$unmatched_words] ++;
                        }
                    }
                }
            }
        }
    }
}

foreach(keys %speaker_record){
    print "Speaker: $_, Gender: $speaker_record{$_}->[$gender], ";
    print "Matched words: $speaker_record{$_}->[$matched_words], ";
    print "Unmatched words: $speaker_record{$_}->[$unmatched_words]\n";
}

__DATA__
<strong>S1</strong>: Native-Speaker Status: Native speaker, American English; Academic Role: Senior Undergraduate; Gender: Male; Age: 17-23; Restriction: None<br>
<strong>S2</strong>: Native-Speaker Status: Native speaker, American English; Academic Role: Researcher; Gender: Male; Age: 31-50; Restriction: Cite<br>
<strong>S3</strong>: Native-Speaker Status: Native speaker, American English; Academic Role: Junior Undergraduate; Gender: Female; Age: 17-23; Restriction: None<br>
<strong>S4</strong>: Native-Speaker Status: Native speaker, American English; Academic Role: Senior Undergraduate; Gender: Female; Age: 17-23; Restriction: None<br>
<strong>S5</strong>: Native-Speaker Status: Native speaker, American English; Academic Role: Junior Undergraduate; Gender: Female; Age: 17-23; Restriction: None<br>
<strong>SS</strong>: Native-Speaker Status: Native speaker, American English; Academic Role: Unknown; Gender: Male; Age: Unknown; Restriction: None<br>
<p><b>S1: </b> it was presented to them by Chuck D and Public Enemy. <font color="#ff6600"><b> [S2: </b> mhm <b> ] </b></font> and the rest of th- Public Enemy and you know and and Chuck D's f- publicly gets up and says you know they were with us from the beginning and, <font color="#ff6600"><b> [S2: </b> <font color="#3333ff"> mhm </font> <b> ] </b></font> <font color="#3333ff"> all that </font> now wheth- whether or not you know that he was reading a TelePrompTer, <font color="#ff6600"><b> [S2: </b> mhm <b> ] </b></font> or or not i i think is uh </p>
<p><b>S2: </b> or if he was trying to make nice because of the fact that Public Enemy hasn't sold records lately, <font color="#ff6600"><b> [S1: </b> right <b> ] </b></font> and he doesn't wanna look like some kinda old sourpuss </p>

Tom Melly, pm@tomandlu.co.uk

Re^2: minimal response program code problem
by Not_a_Number (Prior) on Dec 05, 2006 at 17:16 UTC

    There is a slight problem with this approach: it doesn't pick up 'multiword' items from %goodwords ("i see", "i mean", "i know", "i think so").

      Oops, very true. Hmm, I guess I should loop through the goodwords keys and test each one against the chunk. Although, given that very few of the goodwords in this example are multi-word, it might be quicker to treat those as the exceptions and check for them separately.
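The "treat multi-word items as exceptions" idea could be sketched roughly as follows: pull the keys that contain whitespace out of %goodwords, count and strip them from the cleaned chunk first, then split what is left into single words as before. This is only a sketch under some assumptions: `count_words` and `$phrase_re` are hypothetical names, the word list is shortened for illustration, and each phrase is counted as one matched item.

```perl
use strict;
use warnings;

# Shortened, illustrative word list; the real %goodwords is much longer.
my %goodwords = map { $_ => 1 }
    ("mhm", "right", "yeah", "i see", "i mean", "i think so");

# Multi-word keys only, longest first, so longer phrases are tried first.
my @phrases = sort { length($b) <=> length($a) } grep { /\s/ } keys %goodwords;

# Build one alternation that tolerates any run of whitespace between words.
my $phrase_re;
if (@phrases) {
    my $alt = join '|',
        map { join '\s+', map { quotemeta($_) } split ' ', $_ } @phrases;
    $phrase_re = qr/(?<!\S)(?:$alt)(?!\S)/;   # must start and end at a word boundary
}

# count_words (hypothetical helper) returns (matched, unmatched) for text
# that has already had tags, brackets and speaker labels removed.
sub count_words {
    my ($text) = @_;
    my ($matched, $unmatched) = (0, 0);

    if (defined $phrase_re) {
        # s///g returns the number of replacements, or a false value if none
        my $n = ($text =~ s/$phrase_re/ /g);
        $matched += $n if $n;
    }

    # Whatever remains is handled one word at a time, as in the original loop.
    for my $word (split ' ', $text) {
        if (exists $goodwords{$word}) { $matched++ }
        else                          { $unmatched++ }
    }
    return ($matched, $unmatched);
}

my ($m, $u) = count_words("i see yeah but i think so anyway");
print "matched: $m, unmatched: $u\n";   # matched: 3, unmatched: 2
```

Stripping the phrases before the per-word pass also keeps their component words ("i", "see") from being counted as unmatched.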

      map{$a=1-$_/10;map{$d=$a;$e=$b=$_/20-2;map{($d,$e)=(2*$d*$e+$a,$e**2 -$d**2+$b);$c=$d**2+$e**2>4?$d=8:_}1..50;print$c}0..59;print$/}0..20
      Tom Melly, pm@tomandlu.co.uk