in reply to Why doesn't the whole line print?

OK, here's the code:

```perl
#!/g/rcs/sw/bin/perl -w
#
# Create clusters of meeting data in a semi-unsupervised fashion
#
# parameters:
# 1) input directory
# 2) output data
####################################################################
use strict;

my $indir   = $ARGV[0];
my $outfile = $ARGV[1];

opendir(INDIR, $indir) || die "directory open failed";
open(OUTPUT, ">$outfile");

my @files = readdir(INDIR);    # get all the files in the directory
shift @files;
shift @files;                  # shift off '.' and '..'

foreach my $file (@files) {    # process files until none are left
    my $slash  = '/';
    my $infile = $indir.$slash.$file;
    open(INPUT, $infile) || die "error on file open";
    my $within_spurt      = 0;
    my $previous_interupt = 0;
    my $first             = 1;
  LINE: while (<INPUT>) {      # inner loop to create initial clustering
        if ($first) {
            $first = 0;        # skip first line which isn't data
            next LINE;
        }
        my @line = split(' ');
        my $current_word         = $line[2];
        my $word_in_spurt        = $line[3];
        my $spurt_length         = $line[4];
        my $primary_speaker      = $line[9];
        my $interupting_speakers = $line[11];
        my @current_spurt;
        #@debug = ($line[0], $line[1], $line[3], $line[4], $line[5]);

        # see if number of speakers increases and first_speaker == 0,
        # this means the spurt is interrupting, not the primary speaker
        if ($primary_speaker == 0
            && $interupting_speakers > $previous_interupt
            && $word_in_spurt == 1) {
            push @current_spurt, "<s> $current_word ";    # start marker
            $within_spurt = 1;
            if ($word_in_spurt == $spurt_length) {        # only word in spurt
                push @current_spurt, "<\/s>\n";
                $within_spurt = 0;
                $interupting_speakers = 0;    # the interrupt is over
            }
        }
        # if we are in a spurt and it is not the last word of that spurt
        elsif ($within_spurt && $word_in_spurt != $spurt_length) {
            push @current_spurt, "$current_word ";    # add current word
        }
        # if this is the last word of a spurt
        elsif ($word_in_spurt == $spurt_length && $within_spurt == 1) {
            push @current_spurt, "$current_word <\/s>\n";    # end marker
            $within_spurt = 0;
            $interupting_speakers = 0;    # the interrupt is over
        }

        # make sure that at the end of spurts this flag is reduced
        if ($word_in_spurt == $spurt_length) {
            $interupting_speakers--;
            # should not be less than 0
            if ($interupting_speakers < 0) {
                $interupting_speakers = 0;
            }
        }
        $previous_interupt = $interupting_speakers;

        my $yeah   = 0;
        my $string = join('', @current_spurt);
        if ($string =~ /\bso\b/) {
            $yeah = 1;
        }
        print STDOUT $string if $yeah;
        print $string;
        print OUTPUT @current_spurt;
        undef @current_spurt;
        undef $string;
    }
    close(INPUT);
    last;
}
```
For the line:

<s> so we can do it again yeah yeah </s>

The result is:

<s> so <s> so we can do it again yeah yeah </s>

(crazyinsomniac) Re^2: Why doesn't the whole line print?
by crazyinsomniac (Prior) on Aug 18, 2001 at 08:19 UTC
    Since no single input line contains the whole sentence, you shouldn't be doing the test inside the while loop.

    Because my @current_spurt is declared inside the while loop, the array is recreated for every input line, so it only ever holds the current word.

    Just have another array called @my_sentance, and push @current_spurt onto it. After the while loop where you read and process the file, do your if (/\bso\b) test.
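A minimal sketch of that advice, assuming the per-line pieces have already been produced by the marker logic in the original loop. The sub name `sentence_if_contains` is hypothetical; `@my_sentance` is the array name suggested above. Pieces are collected across iterations, and the `\bso\b` test runs once on the joined string:

```perl
use strict;
use warnings;

# Hypothetical helper: @pieces stands in for whatever the while (<INPUT>)
# loop pushes on each iteration ('<s> so ', 'we ', ..., "yeah </s>\n").
sub sentence_if_contains {
    my ($word, @pieces) = @_;
    my @my_sentance;                        # lives outside the per-line loop
    foreach my $piece (@pieces) {           # plays the role of the while loop
        push @my_sentance, $piece;          # accumulate instead of testing here
    }
    my $string = join('', @my_sentance);    # the whole sentence, built once
    return $string =~ /\b\Q$word\E\b/ ? $string : undef;
}

my $hit = sentence_if_contains('so', '', '<s> so ', 'we ', 'can ', 'do ',
                               'it ', 'again ', 'yeah ', "yeah </s>\n");
print $hit if defined $hit;    # the complete line, printed once
```

The key design point is simply where the test sits: after the loop, the string contains every word of the spurt, so `\bso\b` is checked against the full sentence rather than one word at a time.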

    I modified

    ```perl
    print STDOUT $string if $yeah;
    print $string;
    ```

    into

    ```perl
    print STDOUT $string if $yeah;
    print $string, "|$.|";    # $. is the current input line number; see perlvar for more
    ```

    and got:

    ```
    bash-2.05$ perl test.org.pl ./data/ out
    |2||3||4|<s> so <s> so |5|we |6|can |7|do |8|it |9|again |10|yeah |11|yeah </s>
    |12||13|bash-2.05$
    bash-2.05$
    ```
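In that output both copies of `<s> so ` appear before `|5|`, which points at the two print statements in the original loop. `print` with no filehandle writes to the currently selected handle, which is STDOUT unless `select` has changed it, so a piece matching `/\bso\b/` is written once by `print STDOUT $string if $yeah` and again by `print $string`. A minimal reproduction (the sub name `emit_like_original` is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical sub reproducing the two print statements from the original
# loop. Both end up on STDOUT, so a matching piece is written twice.
sub emit_like_original {
    my ($string) = @_;
    my $yeah = ($string =~ /\bso\b/) ? 1 : 0;
    print STDOUT $string if $yeah;    # first copy, only when it matches
    print $string;                    # second copy, always (default handle)
}

emit_like_original('<s> so ');    # prints "<s> so <s> so "
emit_like_original('we ');        # prints "we " once
```

Dropping either of the two prints (or sending the debugging copy to STDERR instead) removes the duplicate.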

     

Re: Re: Why doesn't the whole line print?
by lemming (Priest) on Aug 18, 2001 at 09:11 UTC

    As crazyinsomniac says, move the test out of the while loop, because inside it you don't have the full sentence. I went ahead and made some more changes: I got rid of some variables that weren't needed, made some stylistic changes, and added two uses of grep. More can be done, but that's my once-over.
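The grep-over-readdir idea can be sketched as a small stand-alone helper. This is an illustration, not the code below: the sub name `data_files` is hypothetical, and it uses a lexical directory handle and an explicit `-f` test. Note that the dots in the pattern must be escaped; an unescaped `.` matches any character, so `/^..?\z/` would also drop ordinary one- or two-character file names.

```perl
use strict;
use warnings;

# Hypothetical helper: return the plain files in $dir, skipping '.' and '..'
# by name instead of assuming readdir returns them first.
sub data_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "directory open of $dir failed: $!";
    my @files = grep { !/^\.\.?\z/ && -f "$dir/$_" } readdir($dh);
    closedir($dh);
    my @sorted = sort @files;    # readdir order is unspecified; sort for stability
    return @sorted;
}
```

Usage would be something like `my @files = data_files($indir);`, with each name then opened as `"$indir/$file"`.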

    ```perl
    #!/usr/bin/perl -w
    #
    # Create clusters of meeting data in a semi-unsupervised fashion
    #
    # parameters:
    # 1) input directory
    # 2) output data
    ####################################################################
    use strict;

    my $indir   = shift @ARGV or die "No indir supplied";
    my $outfile = shift @ARGV or die "No outfile supplied";
    die "$indir not a directory!" unless -d $indir;
    open(OUTPUT, ">$outfile") || die "Could not open $outfile";
    die "Could not chdir $indir" unless chdir($indir);
    opendir(INDIR, ".") || die "directory open of $indir failed";
    # I prefer the chdir instead of using the slash for reading the filenames
    my @files = grep(!/^\.\.?\z/, readdir(INDIR));    # get all the files in the directory
    # I'm not sure if readdir will always give you . and .. first
    # See the readdir function in the docs. That's my preferred method from the camel
    closedir(INDIR);    # just being neat

    foreach my $infile (@files) {    # process files until none are left
        next if -d $infile;
        open(INPUT, $infile) || die "error on file opening $infile";
        my $within_spurt      = 0;
        my $previous_interupt = 0;
        my @current_spurt;
        <INPUT>;    # skip first line which isn't data
        while (<INPUT>) {    # inner loop to create initial clustering
            my (undef, undef, $current_word, $word_in_spurt, $spurt_length,
                undef, undef, undef, undef, $primary_speaker,
                undef, $interupting_speakers) = split(' ');

            # see if number of speakers increases and first_speaker == 0,
            # this means the spurt is interrupting, not the primary speaker
            if ($primary_speaker == 0
                && $interupting_speakers > $previous_interupt
                && $word_in_spurt == 1) {
                push @current_spurt, "<s> $current_word ";    # start marker
                $within_spurt = 1;
                if ($word_in_spurt == $spurt_length) {        # only word in spurt
                    push @current_spurt, "<\/s>\n";
                    $within_spurt = 0;
                    $interupting_speakers = 0;    # the interrupt is over
                }
            }
            # if we are in a spurt and it is not the last word of that spurt
            elsif ($within_spurt && $word_in_spurt != $spurt_length) {
                push @current_spurt, "$current_word ";    # add current word
            }
            # if this is the last word of a spurt
            elsif ($word_in_spurt == $spurt_length && $within_spurt == 1) {
                push @current_spurt, "$current_word <\/s>\n";    # end marker
                $within_spurt = 0;
                $interupting_speakers = 0;    # the interrupt is over
            }
            # make sure that at the end of spurts this flag is reduced
            if ($word_in_spurt == $spurt_length) {
                $interupting_speakers--;
                # should not be less than 0
                if ($interupting_speakers < 0) {
                    $interupting_speakers = 0;
                }
            }
            $previous_interupt = $interupting_speakers;
        }
        close(INPUT);
        if (grep { /\bso\b/ } @current_spurt) {
            print STDOUT @current_spurt;
            print OUTPUT @current_spurt;
        }
    }
    ```
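The `my (undef, undef, $current_word, ...) = split(' ')` line picks fields by position. An equivalent that may be easier to check against the question's `$line[2]`, `$line[3]`, `$line[4]`, `$line[9]`, `$line[11]` is a list slice. The sub name `parse_record` and the hash keys are hypothetical, chosen only for this sketch:

```perl
use strict;
use warnings;

# Hypothetical helper: pull the five fields the script uses out of one data
# record, by the same column positions as the original code. A list slice
# replaces the run of undef placeholders.
sub parse_record {
    my ($line) = @_;
    my ($current_word, $word_in_spurt, $spurt_length,
        $primary_speaker, $interupting_speakers)
        = (split ' ', $line)[2, 3, 4, 9, 11];
    return {
        word         => $current_word,
        in_spurt     => $word_in_spurt,
        spurt_len    => $spurt_length,
        primary      => $primary_speaker,
        interrupting => $interupting_speakers,
    };
}

my $rec = parse_record('mr001 c0 so 1 8 80.301 0.06 s:3_ow:3 3.674 0 0 1 0 1 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0');
print "$rec->{word} $rec->{in_spurt}/$rec->{spurt_len}\n";    # prints "so 1/8"
```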
Re: Re: Why doesn't the whole line print?
by hillard (Acolyte) on Aug 18, 2001 at 05:53 UTC
    Here is the part of the data file that contains the data needed. I can't give out the whole file; it isn't allowed to be public yet (crazy legal stuff).
    ```
    mr001 c0 right 1 1 40.352 0.27 r:15_ay:9_t:3 2.26 0 0 1 1 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0
    mr001 c0 so 1 3 76.197 0.21 s:13_ow:8 35.575 0 0 0 0 1 0 0 0 1 1 0 0 0 1 0 0 0 1 0 0 0 0
    mr001 c0 go 2 3 76.407 0.1 g:3_ow:7 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
    mr001 c0 ahead 3 3 76.507 0.12 ax:3_hh:3_eh:3_d:3 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    mr001 c0 so 1 8 80.301 0.06 s:3_ow:3 3.674 0 0 1 0 1 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0
    mr001 c0 we 2 8 80.361 0.06 w:3_iy:3 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
    mr001 c0 can 3 8 80.421 0.25 k:18_ax:4_n:3 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    mr001 c0 do 4 8 80.671 0.06 d:3_uw:3 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    mr001 c0 it 5 8 80.731 0.13 ax:10_t:3 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    mr001 c0 again 6 8 80.861 0.22 ax:7_g:4_eh:3_n:8 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    mr001 c0 yeah 7 8 81.315 0.15 y:7_ae:8 0.234 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
    mr001 c0 yeah 8 8 81.465 0.29 y:4_ae:25 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0
    mr001 c0 i'm 1 4 100.044 0.16 ay:11_m:5 18.289 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0
    ```
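For experimenting with records like these, here is a simplified sketch. It groups words into spurts using only the position columns: column 3 is the word's position in its spurt, and column 4 is the spurt length (0-based, as in `$line[3]` and `$line[4]`). It then returns the complete spurts that contain a given word. It deliberately ignores the primary/interrupting speaker columns that the real script uses, so it is not a drop-in replacement. The sub name `spurts_with_word` is hypothetical. Because each spurt is joined before it is tested, the `\bso\b` check sees the whole sentence.

```perl
use strict;
use warnings;

# Simplified, hypothetical sketch: ignores the speaker columns and treats
# every spurt as a candidate. Column 2 = word, 3 = position in spurt,
# 4 = spurt length. Spurts cut off at the end of the input are dropped.
sub spurts_with_word {
    my ($target, @records) = @_;
    my (@found, @words);
    foreach my $record (@records) {
        my ($word, $pos, $len) = (split ' ', $record)[2, 3, 4];
        @words = () if $pos == 1;    # a new spurt starts
        push @words, $word;
        if ($pos == $len) {          # last word: the spurt is complete
            my $spurt = join(' ', @words);
            push @found, "<s> $spurt </s>" if $spurt =~ /\b\Q$target\E\b/;
            @words = ();
        }
    }
    return @found;
}
```

Inside the original per-file loop, after skipping the header line, something like `print "$_\n" for spurts_with_word('so', <INPUT>);` would print each matching spurt on its own line.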