I tried using your code with a couple of changes, primarily substituting in the regex I had for my sentences (the data I'm using has a ton of punctuation, such as ... and "" and so on, so the regex in your example wasn't what I needed and was cutting out a lot of the data). However, once I put my regex in, I was back to the same issue I had before: it prints the first paragraph with the sentence brackets around the sentences, but then prints the same paragraph again with the paragraph brackets around the whole paragraph...

Just so you can have an idea of what I changed, I've included the code below, with an excerpt of the text file I'm using.

    local $/ = "";
    open $fh, $ARGV[0] or die "File $ARGV[0] not found!\n";
    $scount = 0;
    $pcount = 0;
    @paragraphs;
    while ($paragraph = <$fh>){
        @sentences;
        while($paragraph =~ /\s*(([A-Z][A-Za-z]*)(((([A-Za-z]|[0-9])*((\'*|\-*)[A-Za-z]*))\s*(\.{3})*\!*\"*\(*\)*\,*\:*\s*)*(([A-Za-z]|[0-9])*))(\.|\?|\!))/g){
            push @sentences, "<s>$1</s>";
            $scount++;
        }
        push @paragraphs, "<p>\n\t" . join("\n\t", @sentences) . "\n</p>\n";
        $pcount++;
    }
    print for @paragraphs;
    print "\n Total Lines: $scount\n";
    print "\n Total Paragraphs: $pcount\n";

Data:

But the truth is that in the short run, markets can occasionally be pushed, especially when so many decisions to buy or sell are keyed off what everyone else in the market is doing. Chain reactions are not much harder to start (in fact, given how quickly price moves get noticed, they may be easier) than they were 70 years ago.

All that notwithstanding, the interesting thing about the Greenspan resignation rumor was that it raised an obvious question: Would it really matter? As Jacob Weisberg just pointed out in " Ballot Box," Steve Forbes is apparently the only American who doesn't think Greenspan has done a terrific job as Fed chairman. And most of us would be happy to have Greenspan stay in office even after his current term expires in the middle of next year. But it's interesting to note that in the past couple of months there have been more than a few voices--including those of economists Greg Mankiw and Robert Barr--suggesting that Greenspan has been more the beneficiary of good economic fundamentals than the creator of them.

That position may be a bit overstated, particularly since Greenspan has shown an unusual ability to let his thinking on inflation, productivity, and the economy's possible growth rate evolve in response to changing data. But the essential point, that the soundness of this economy does not depend on Greenspan's presence at the head of the Fed, is right. That might not be the case if Greenspan's successor were either an inflation dove like William Greider or a perma-bear like Jim Grant. But whoever would succeed Greenspan would be nothing of the sort. He or she would be, in a word, Greenspanian, still concerned about the possibility of an overheating economy but also convinced that important technological changes have allowed this economy to grow faster than in the past without sparking inflation.

If anything, in fact, the bond market should have rallied on news that Greenspan might be stepping down, since he has long since stopped being paranoid enough for bondholders, who seem perpetually convinced that the United States is about to become Brazil. There are certainly Fed governors out there who would be far more likely to raise interest rates aggressively at the first hint of price pressures than Greenspan.


In reply to Re^4: How to match regex over multiline file by kyaloupe
in thread How to match regex over multiline file by kyaloupe
