in reply to minimal response program code problem

Given that you are parsing HTML, I'd strongly recommend using a module such as HTML::TreeBuilder to do the heavy lifting for you. Consider the following:

use strict;
use warnings;
use HTML::TreeBuilder;

my @goodWordsList = (
    "mhm", "right", "well", "yeah", "sure", "good", "ah", "okay", "yep",
    "hm", "definitely", "alright", "'m'm", "oh", "my", "god", "wow", "uhuh",
    "exactly", "yup", "mkay", "i see", "ooh", "cool", "uh", "fine", "true",
    "hm'm", "hmm", "yes", "absolutely", "great", "um", "so", "mm", "weird",
    "ye-", "i mean", "i know", "i think so", "huh", "yay", "maybe", "eh",
    "obviously", "correct", "awesome", "really", "interesting",
);

# Lookup hash so a word can be tested with exists $goodwords{$word}
my %goodwords;
@goodwords{@goodWordsList} = (1) x @goodWordsList;

my $root = HTML::TreeBuilder->new ();

$root->parse_file (*DATA);

my %speakers;

# Parse out speaker attributes
for ($root->look_down ('_tag', 'strong')) {
    my $info = $_->right ();    # text node following the <strong> element
    my $name = $_->as_text ();

    $speakers{$name}{info} = $info;

    for my $param (split /\s*(?:;\s*|$)/, $info) {
        my ($key, $value) = $param =~ /^:?\s*([^:]*):\s*(.*)/;

        $speakers{$name}{$key} = $value;
    }
}

my %stats;

# Do the analysis
for ($root->look_down ('_tag', 'p')) {
    my $line = $_->as_text ();
    my ($name) = $line =~ /(\w+):/;

    # Perform analysis on paragraph here
}

__DATA__
<strong>S1</strong>: Native-Speaker Status: Native speaker, American English; Academic Role: Senior Undergraduate; Gender: Male; Age: 17-23; Restriction: None<br>
<strong>S2</strong>: Native-Speaker Status: Native speaker, American English; Academic Role: Researcher; Gender: Male; Age: 31-50; Restriction: Cite<br>
<strong>S3</strong>: Native-Speaker Status: Native speaker, American English; Academic Role: Junior Undergraduate; Gender: Female; Age: 17-23; Restriction: None<br>
<strong>S4</strong>: Native-Speaker Status: Native speaker, American English; Academic Role: Senior Undergraduate; Gender: Female; Age: 17-23; Restriction: None<br>
<strong>S5</strong>: Native-Speaker Status: Native speaker, American English; Academic Role: Junior Undergraduate; Gender: Female; Age: 17-23; Restriction: None<br>
<strong>SS</strong>: Native-Speaker Status: Native speaker, American English; Academic Role: Unknown; Gender: Male; Age: Unknown; Restriction: None<br>
<p><b>S1: </b> it was presented to them by Chuck D and Public Enemy. <font color="#ff6600"><b> [S2: </b> mhm <b> ] </b></font> and the rest of th- Public Enemy and you know and and Chuck D's f- publicly gets up and says you know they were with us from the beginning and, <font color="#ff6600"><b> [S2: </b> <font color="#3333ff"> mhm </font> <b> ] </b></font> <font color="#3333ff"> all that </font> now wheth- whether or not you know that he was reading a TelePrompTer, <font color="#ff6600"><b> [S2: </b> mhm <b> ] </b></font> or or not i i think is uh </p>
<p><b>S2: </b> or if he was trying to make nice because of the fact that Public Enemy hasn't sold records lately, <font color="#ff6600"><b> [S1: </b> right <b> ] </b></font> and he doesn't wanna look like some kinda old sourpuss </p>

This parses all of the speaker attributes into %speakers, then iterates over the paragraphs, pulling out the speaker name and doing whatever arcane thing it is you need to do for each paragraph. Note that a lot of error checking is not done: if the structure of the text differs from the sample, you will most likely get runtime errors and warnings. On the other hand, your current parsing is much more fragile (actually, broken even).
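If you want a little of that missing error checking, here is a minimal sketch of the attribute-splitting step with a guard added. The helper name parse_speaker_info is my own invention (it is not part of the code above or of HTML::TreeBuilder), and I'm assuming that silently skipping fragments that don't look like "key: value" is acceptable for your data. It works on a plain string, so you can try it without any HTML at all:

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not from the original code): split one
# speaker's attribute text into key/value pairs, skipping malformed fragments
# instead of storing undef keys and triggering warnings.
sub parse_speaker_info {
    my ($info) = @_;
    my %attrs;

    # right() may hand back an element object rather than a text string
    # if the markup differs from the sample, so bail out early on those.
    return \%attrs unless defined $info && !ref $info;

    for my $param (split /\s*;\s*/, $info) {
        # Same pattern as the main code: optional leading colon, "key: value"
        my ($key, $value) = $param =~ /^:?\s*([^:]*):\s*(.*)/;

        next unless defined $key && length $key;    # not a key: value pair

        $value =~ s/\s+$//;
        $attrs{$key} = $value;
    }

    return \%attrs;
}

my $attrs = parse_speaker_info (
    ': Native-Speaker Status: Native speaker, American English; Gender: Male; junk'
);

print "$_ => $attrs->{$_}\n" for sort keys %$attrs;
```

In the main loop you would then replace the inner for loop with something like %{$speakers{$name}} = (%{$speakers{$name}}, %{parse_speaker_info ($info)}); whether skipping bad fragments or dying loudly is the better choice depends on how much you trust the transcripts.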


DWIM is Perl's answer to Gödel