(sigh)
There can be cases in your data where a single line contains multiple matches; that is, "it" followed by stuff followed by end-of-sentence punctuation could occur more than once on one line. This would certainly be true if your text file contains no line breaks anywhere in the middle of the long text string.
That's why the most recent code I suggested in a previous reply went like this:
    while (<>) {
        while ( /\bit (.*?)[.?!]/ig ) {
            print "\n$1\n";
        }
    }
(apologies to others for repeating that; but it seems like everyone else has already abandoned this thread anyway)
Note the second "while" loop, and the "g" flag on the regex match. This is a way of looking for the same pattern repeatedly in a single string value, and performing the same operations (inside the loop) on every match. Also note the "?" that follows the ".*" inside the parens: this makes the match "non-greedy", which is very important here, because a greedy ".*" would run all the way to the last punctuation mark on the line instead of stopping at the first one.
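If the greedy vs. non-greedy difference isn't obvious, here is a small sketch you can run to see it. The sample line and the "captures" helper are made up for illustration (they are not from your data or from the code above); the helper just wraps the same inner "while" loop with the "g" flag so both patterns can be tried on the same string.

```perl
use strict;
use warnings;

# Hypothetical sample: two "it ... ." sentences on a single line.
my $line = "I did it manually. Then it was automated.";

# Illustrative helper: collect $1 from every match of the regex in the text.
sub captures {
    my ($re, $text) = @_;
    my @found;
    # With /g in a "while" condition, each pass resumes where the last match ended.
    while ( $text =~ /$re/g ) {
        push @found, $1;
    }
    return @found;
}

my @lazy   = captures( qr/\bit (.*?)[.?!]/i, $line );   # non-greedy
my @greedy = captures( qr/\bit (.*)[.?!]/i,  $line );   # greedy

print "non-greedy: $_\n" for @lazy;
print "greedy:     $_\n" for @greedy;
```

The non-greedy pattern should give two separate captures ("manually" and "was automated"), while the greedy one swallows everything up to the last period on the line and gives just one.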
I actually tested this myself, using your sample text, and the sort of command line that you reported using, and it most certainly does work -- it produced the following output:
manually till April 1996
was happily feeding modules through to the CPAN archive sites
made sense for the module listing part of the Module List to be built from that database
If it doesn't work for you this time, then you have to start looking at some non-perl issues, like:
Are you running this inside a command-line shell window? If you're on an MS Windows system and you type that command line into the "Run..." type-in box from the "Start" menu, then you probably won't get to see anything. You have to start up an "MS-DOS Prompt" window and run that command at the DOS prompt in that window.