(sigh)

There can be cases in your data where a single line contains multiple matches (that is, "it" followed by some text followed by end-of-sentence punctuation could occur more than once on one line). This would certainly be true if your text file contains no line-breaks anywhere in the middle of the long text string.

That's why the most recent code I suggested in a previous reply went like this:

while (<>) { while ( /\bit (.*?)[.?!]/ig ) { print "\n$1\n"; } }

(apologies to others for repeating that; but it seems like everyone else has already abandoned this thread anyway)

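Note the second "while" loop and the "g" flag on the regex match. Together they look for the same pattern repeatedly in a single string value: each pass through the inner loop resumes matching where the previous match ended, and the operations inside the loop are performed once for every match. Also note the "?" that follows the ".*" inside the parens. It makes the match "non-greedy", which is very important here: without it, ".*" would grab everything up to the last ".", "?" or "!" on the line, so several sentences would be lumped into a single match.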
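To see both points in isolation, here is a small self-contained sketch. The sample line is invented for illustration (it is not your data); it simply contains three "it ..." clauses with no line-breaks between them, and the script compares a greedy match, a single non-greedy match, and the non-greedy "while" loop with the "g" flag:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Invented sample line: three "it ..." clauses on ONE line.
my $line = "Then it worked fine. Later it broke! Why did it stop?";

# Greedy ".*": the capture runs all the way to the LAST
# punctuation mark on the line, swallowing the other sentences.
my ($greedy) = $line =~ /\bit (.*)[.?!]/i;
print "greedy:     $greedy\n";

# Non-greedy, but without /g: only the first match is found.
my ($first) = $line =~ /\bit (.*?)[.?!]/i;
print "first only: $first\n";

# Non-greedy with /g in a while loop: each iteration resumes
# where the previous match ended, so every clause is captured.
my @all;
while ($line =~ /\bit (.*?)[.?!]/ig) {
    push @all, $1;
}
print "all:        ", join(" | ", @all), "\n";
```

Only the last form picks up all three clauses ("worked fine", "broke", "stop"); the greedy version returns one long capture ending at the final "?".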

I actually tested this myself, using your sample text, and the sort of command line that you reported using, and it most certainly does work -- it produced the following output:

manually till April 1996

was happily feeding modules through to the CPAN archive sites

made sense for the module listing part of the Module List to be built from that database

If it doesn't work for you this time, then you have to start looking at some non-perl issues, like:

Are you running this inside a command-line shell window? If you're on an MS-Windows system and you type that command line into the "Run..." box from the "Start" menu, you probably won't get to see anything, because the window it opens closes as soon as the command finishes. You have to start up an "MS-DOS Prompt" window and run the command at the DOS prompt in that window.


In reply to Re^5: Help please by graff
in thread Help please by Stud_Perl
