Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have a huge file with lines like the following:

# run1
# word 26871
# text returns
text
text
text
# run2
# word 26872
# text returns
text
text
#run3

I would like to be able to read the file and get the following output:
word 26871
number of text = 3
word 26872
number of text = 2

The program I wrote is posted below. It sort of works, but gives:

word 26871
number of text = 1
number of text = 2
number of text = 3
word 26872
number of text = 1
number of text = 2

I have some solutions to the above problem, but they are very
inefficient. One of them pushes the running counts onto an array,
reads the last element of the array (the total number of text
lines for that word), then clears the array and repeats. This seems
like a terribly inefficient way to solve this problem.
I know that I should move the print "the number of text ="
outside the loop, but the options I have tried don't exactly
pan out.
Any advice would be much appreciated.

#!/usr/bin/perl -w

print "Type in the name of the file you want to read: ", "\n";
$count = 0;
$filename = <STDIN>;   # reads in the name of the file that I type in.
chomp $filename;       # removes the newline after the filename
open(FILE, $filename) || die "I can't read your file, it does not exist";
while (defined ($line = <FILE>)) {
    if ($line =~ /word/) {
        $name = substr($line, 2, 11);
        print "the word number is: ", $name, "\n";
        $count = 0;
    }
    elsif ($line !~ /#/) {
        $count++;
    }
    print "the number of text = ", $count, "\n";
}
close FILE;

Replies are listed 'Best First'.
Re: Counting between the lines
by bobf (Monsignor) on Dec 14, 2004 at 07:38 UTC

    I'd prefer using a hash, but since you didn't say how big your input files could be and apparently don't need to store the data for any further use, your approach of processing it line by line is just fine. What you have is close; just move a few of the variables around and that should about do it. Here is one way (it's not as elegant as a hash, but it is more like your original code) (tested):

    # use the actual error for open
    open(FILE, $filename) || die "Error opening $filename: $!";

    my ( $word, $count );  # move these out of the loop
    while (defined ($line = <FILE>)) {
        if ($line =~ /^# word (\d+)/) {  # tighten up regex a bit
            if( defined $word ) {
                # print the results for the previous block
                print "the word number is: $word\n";
                print "number of text = $count\n";
            }
            $word = $1;
            $count = 0;
        }
        elsif ($line !~ /#/) {
            $count++;
        }
    }

    # print the results for the last block of data before eof
    print "the word number is: $word\n";
    print "number of text = $count\n";

    ** output from your example data **
    the word number is: 26871
    number of text = 3
    the word number is: 26872
    number of text = 2

    Good job using warnings, but you should also use strict.
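
    As a rough sketch of what a strict-clean version could look like: every variable is declared with my, the filehandle is lexical, and open uses the three-argument form. The count_blocks() helper and the in-memory sample data are assumptions made for illustration, not part of the original program.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): read blocks from an open
# filehandle and return a list of [word, count] pairs.
sub count_blocks {
    my ($fh) = @_;
    my ($word, $count);                  # declared with my, outside the loop
    my @pairs;
    while (defined(my $line = <$fh>)) {  # $line is lexical to the loop
        if ($line =~ /^# word (\d+)/) {
            # a new block starts: store the previous one, if there was one
            push @pairs, [$word, $count] if defined $word;
            ($word, $count) = ($1, 0);
        }
        elsif ($line !~ /#/) {
            $count++;
        }
    }
    # store the last block, but only if at least one word was seen
    push @pairs, [$word, $count] if defined $word;
    return @pairs;
}

# An in-memory filehandle stands in for the real file here; with a real file
# it would be: open(my $fh, '<', $filename) or die "Error opening $filename: $!";
my $data = "# word 26871\ntext\ntext\ntext\n# word 26872\ntext\ntext\n";
open(my $fh, '<', \$data) or die "Error opening in-memory file: $!";
print "word $_->[0]\nnumber of text = $_->[1]\n" for count_blocks($fh);
close $fh;
```

    Under strict, a misspelled or undeclared variable name becomes a compile-time error instead of a silently empty value.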

    Update: I see gaal beat me to the punch. I guess that's what I get for testing... ;)

Re: Counting between the lines
by gaal (Parson) on Dec 14, 2004 at 07:32 UTC
    This is not tested:

    my ($count, $word);
    while (<>) {
        $count++ unless /#/; # or maybe /^#/ would be better?
        if (/^# word (\d+)/) { # or \S+, etc. -- depending on what's a valid word
            print "word $word\nnumber of text = $count\n" if defined $word;
            $count = 0;
            $word = $1;
        }
    }
    print "word $word\nnumber of text = $count\n" if defined $word; # handle leftover word.

    You can move the conditional output print to a function if duplicating the code bothers you.
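
    A minimal sketch of that idea, assuming a hypothetical report() helper and a summarize() wrapper (neither name is from the thread). The wrapper takes a list of lines instead of reading <> so that it can be run on sample data directly.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: format one block's report, or return an empty
# string if no word line has been seen yet.
sub report {
    my ($word, $count) = @_;
    return defined $word ? "word $word\nnumber of text = $count\n" : '';
}

# Hypothetical wrapper: walk a list of lines and build the full report.
sub summarize {
    my @lines = @_;
    my ($word, $count) = (undef, 0);
    my $out = '';
    for (@lines) {
        if (/^# word (\d+)/) {
            $out .= report($word, $count);   # flush the previous block
            ($word, $count) = ($1, 0);
        }
        elsif (!/^#/) {
            $count++;                        # a non-comment line
        }
    }
    $out .= report($word, $count);           # flush the last block
    return $out;
}

my @sample = (
    "# run1\n", "# word 26871\n", "# text returns\n", "text\n", "text\n", "text\n",
    "# run2\n", "# word 26872\n", "# text returns\n", "text\n", "text\n", "#run3\n",
);
print summarize(@sample);
```

    Because report() returns an empty string until a word line has been seen, the same call works both inside the loop and after it, and input without any word lines prints nothing.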

Re: Counting between the lines
by wfsp (Abbot) on Dec 14, 2004 at 07:24 UTC
    A hash might be one way.
    #!/usr/bin/perl
    use warnings;
    use strict;

    my %hash;
    my $name;
    while (<DATA>){
        if (/word\s(\d+)/){
            $name = $1;
        }
        elsif (! /#/){
            $hash{$name}++;
        }
    }
    for my $key (keys %hash){
        print "$key $hash{$key}\n";
    }
    __DATA__
    # run1
    # word 26871
    # text returns
    text
    text
    text
    # run2
    # word 26872
    # text returns
    text
    text
    #run3
    Output:
    26871 3
    26872 2
      As long as the "current word" doesn't return to being one for which we'd already started counting lines, there's no need to keep the data in memory. Emit it as you gather enough of it.
Re: Counting between the lines
by ysth (Canon) on Dec 14, 2004 at 10:19 UTC
    Nit: the solutions that print a final line after the while loop will fail if there are no words.