in reply to Capture/Counting text between patterns across multiple lines
I am trying to isolate information between two patterns to both capture the information between (here, the case name) and count the instances of a particular word (here, the word Education, but I would also like to count the number of words in general).
From your description, I didn't really understand what you were trying to accomplish (this may be a language problem on my side). What I understood is that you want to count different things in different paragraphs. I'm not sure what fits best here, maybe something like a parser or just simple counting, so I put what I understood into a small example below:
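```perl
use strict;
use warnings;

my ($casename, $count, $wordcount) = ('', 0, 0);

open my $fh, '<', 'data.txt' or die "can't open data.txt: $!";
{
    local $/ = '';    # paragraph mode: each read returns one blank-line-separated block
    while ( <$fh> ) {
        REPEAT: {
            # Play parser: each \G pattern is tried at the current pos();
            # /c keeps pos() when a match fails, /m lets ^ and $ match at
            # line boundaries inside the paragraph
            /\G\s*^\d+\s+of\s+\d+\s+DOCUMENTS$/gcm && do {
                # enter action here
                redo;
            };
            /\G\s*^No\.\s+(\w+)$/gcm && do { $casename = $1; redo; };
        }
        # simply count stuff (case-sensitive; add /i to also count "Education")
        $count     += () = /EDUCATION/g;
        $wordcount += () = /\w+/g;
    }
}
close $fh;

print "Case Name: $casename\n";
print "Education: $count\n";
print "Words: $wordcount\n";
```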
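The counting lines rely on the common `() =` idiom: a list assignment evaluated in scalar context returns the number of elements on its right-hand side, so `() = /EDUCATION/g` gives the number of matches. A minimal stand-alone sketch; the `count_matches` helper and the sample sentence are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: count matches of $re in $text.
# The empty-list assignment is evaluated in scalar context,
# which yields the number of elements produced by m//g.
sub count_matches {
    my ($text, $re) = @_;
    my $n = () = $text =~ /$re/g;
    return $n;
}

my $sample = "Board of EDUCATION v. Smith: the EDUCATION clause applies.";  # made-up text
print "Education: ", count_matches($sample, qr/EDUCATION/), "\n";   # Education: 2
print "Words: ",     count_matches($sample, qr/\w+/),       "\n";   # Words: 9
```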
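The /gc "parser" technique (a bare block of \G.../gc matches, each followed by redo) is explained in perlfaq6, under "What good is \G in a regular expression?".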
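A tiny stand-alone illustration of that pattern: each `\G...` match is attempted at the current `pos()`, `/c` leaves `pos()` unchanged when a pattern fails so the next alternative gets a chance, and `redo` restarts the labeled bare block. The `tokenize` function and its token classes are hypothetical, chosen only for the demo:

```perl
use strict;
use warnings;

# Hypothetical lexer: split a string into [type, text] tokens.
sub tokenize {
    my ($s) = @_;
    my @tokens;
    TOKEN: {
        # \G anchors each attempt at pos($s); /c keeps pos on failure
        $s =~ /\G(\d+)/gc      && do { push @tokens, [num   => $1]; redo TOKEN; };
        $s =~ /\G([A-Za-z]+)/gc && do { push @tokens, [word  => $1]; redo TOKEN; };
        $s =~ /\G\s+/gc        && do { redo TOKEN; };    # skip whitespace
        $s =~ /\G(.)/gcs       && do { push @tokens, [other => $1]; redo TOKEN; };
        # no pattern matched: end of string reached, leave the block
    }
    return @tokens;
}

for my $t (tokenize("1 of 12 DOCUMENTS")) {
    print "$t->[0]: $t->[1]\n";
}
```

Because every pattern must consume at least one character, the block cannot `redo` forever; it falls through once the end of the string is reached.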
Regards
mwa