I am trying to isolate information between two patterns, both to capture the information between them (here, the case name) and to count the instances of a particular word (here, the word Education, but I would also like to count the number of words in general).

From your description, I didn't really understand what you are trying to accomplish (this may be a language problem on my side). What I understood is that you want to count different things in different paragraphs. I'm not sure what to use here, maybe something like a parser or just simple counting. I tried to put what I understood into a small example below:

...
my ($casename, $count, $wordcount) = ('', 0, 0);
$/ = '';    # paragraph mode

open my $fh, '<', 'data.txt' or die "can't open, $!";
while( <$fh> ) {
   REPEAT: {   # Play parser
      /\G^\d+\s+of\s+\d+\s+DOCUMENTS$/gc && do { redo; };    # enter action here
      /\G^No\.\s+(\w+)$/gc               && do { $casename = $1; redo; };
   }
   # simply count stuff
   $count     += () = /EDUCATION/g;
   $wordcount += () = /\w+/g;
}
close $fh;

print "Case Name: $casename\n";
print "Education: $count\n";
print "Words: $wordcount\n";
...

The /gc regex parser idiom (anchoring with \G and keeping the match position on failure) is explained in perlfaq6.
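To make the two idioms easier to try in isolation, here is a minimal sketch that runs on an inline string instead of data.txt. The sample text and the $docs variable are made up for illustration, since the real file layout is not shown in this thread, and the patterns consume the newline explicitly so that \G lines up with the start of the next line.

```perl
use strict;
use warnings;

# Hypothetical sample record (an assumption, not the poster's real data)
my $text = "1 of 2 DOCUMENTS\nNo. 12345\nEDUCATION law and the BOARD OF EDUCATION\n";

# Count idiom: "() = ..." forces list context, and a list assignment
# evaluated in scalar context yields the number of matches.
my $education = () = $text =~ /EDUCATION/g;
my $words     = () = $text =~ /\w+/g;

# Lexer-style scan: \G anchors at pos($text), and /c keeps pos()
# unchanged when a match fails, so the next rule can try the same spot.
my ($docs, $casename);
pos($text) = 0;
{
    if ($text =~ /\G(\d+) of (\d+) DOCUMENTS\n/gc) { $docs = "$1/$2"; redo; }
    if ($text =~ /\GNo\.\s+(\w+)\n/gc)             { $casename = $1;  redo; }
}

print "Documents: $docs\n";
print "Case Name: $casename\n";
print "Education: $education\n";
print "Words: $words\n";
```

The counting is done before the scan on purpose: a list-context //g match starts from the current pos(), so counting after the /gc scan would skip whatever the scan has already consumed unless pos() is reset first.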

Regards

mwa


In reply to Re: Capture/Counting text between patterns across multiple lines by mwah
in thread Capture/Counting text between patterns across multiple lines by micwood
