in reply to How to match regex over multiline file
and try combining with "paragraph mode"
local $/ = ""; ## paragraph mode
If you can make any kind of progress with these nodes, I'll help you fill in the blanks
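For readers unfamiliar with paragraph mode, here is a minimal sketch of what that setting does. The sample text and the in-memory filehandle are assumptions made only to keep the example self-contained; they are not from the original post.

```perl
use strict;
use warnings;

# Hypothetical sample text: two paragraphs separated by a blank line.
my $text = "First line of paragraph one.\nSecond line of paragraph one.\n\nParagraph two.\n";

# Read from an in-memory filehandle so the example needs no external file.
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @paragraphs;
{
    local $/ = "";    # paragraph mode: a record ends at one or more blank lines
    while (my $para = <$fh>) {
        chomp $para;  # in paragraph mode, chomp removes all trailing newlines
        push @paragraphs, $para;
    }
}
close $fh;

# Each record now holds a whole paragraph, so a pattern can span line breaks.
# The /s modifier lets "." match a newline as well.
for my $para (@paragraphs) {
    print "Matched across lines\n" if $para =~ /line of.*Second/s;
}
```

Because `$/` is set with `local` inside a bare block, the default line-at-a-time behaviour is restored as soon as the block ends.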
Replies are listed 'Best First'.
Re^2: How to match regex over multiline file
by kyaloupe (Initiate) on Oct 10, 2013 at 02:28 UTC
Alright, I was able to fix my regex and it's working exactly as I want it to! Thank you! I was wondering if I could ask another question, though. Now that I have my regex matching over multiple lines, I want to take the raw text file and output each paragraph wrapped in paragraph tags, with the individual sentences inside it wrapped in sentence tags. I was able to write the code to do each part separately, with the necessary regex, but I need the tags to be nested within each other. Here's the code I have so far:
When I run both sections at the same time, it first prints each paragraph with the sentence tags around each sentence, then prints the same paragraph again with the paragraph tags. How do I fix it?
by Athanasius (Archbishop) on Oct 10, 2013 at 08:01 UTC
Hello kyaloupe, and welcome to the Monastery! Since you’re reading the text in paragraph mode, I don’t see why you need any regex to identify paragraphs? Also, unless your data (not shown) is special, I don’t see why you need such a complicated regex to identify sentences? In any case, here is how I would tackle this problem:
Output:
As you can see, I identify sentences as each paragraph is read in, and then wrap what is found in the appropriate tags. See join. (I’ve added tabs just to make the structure of the markup easier to see when it’s printed out.) Hope that helps,
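The code from this reply is not preserved here, so what follows is only a sketch of the approach it describes, not the original: read in paragraph mode, find the sentences, tag them, and wrap the joined result once. The helper name `tag_paragraph`, the `<p>` and `<s>` tag names, the sample text, and the deliberately naive sentence pattern are all assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one paragraph as nested markup.
# The <p> and <s> tag names are assumptions, not taken from the thread.
sub tag_paragraph {
    my ($para) = @_;

    $para =~ s/\s+/ /g;    # fold line breaks and runs of spaces into single spaces
    $para =~ s/^ //;       # trim the leading space, if any
    $para =~ s/ $//;       # trim the trailing space, if any

    # Naive sentence pattern: the shortest run of text that ends in one or
    # more of . ! ? and is followed by whitespace or the end of the string.
    my @sentences = $para =~ /\s*(.+?[.!?]+)(?=\s|\z)/g;

    # Tag the sentences first, then wrap the joined result in paragraph tags once.
    my $inner = join "\n", map { "\t<s>$_</s>" } @sentences;
    return "<p>\n$inner\n</p>\n";
}

# Hypothetical sample text, read through an in-memory filehandle.
my $text = "One sentence here. A second\nsentence follows!\n\nNew paragraph? Yes.\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

{
    local $/ = "";    # paragraph mode
    while (my $para = <$fh>) {
        print tag_paragraph($para);    # one print per paragraph, so nothing is emitted twice
    }
}
close $fh;
```

Building the sentence markup as a string and printing only the finished paragraph is what keeps the two levels of tags nested. The pattern will split incorrectly on abbreviations such as "U.S." and drops any trailing text that lacks closing punctuation, so real data may need a more careful regex.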
by kyaloupe (Initiate) on Oct 11, 2013 at 23:27 UTC
I tried using your code with a couple of changes, primarily substituting in the regex I had for my sentences. (The data I'm using has a ton of punctuation, such as ... and "" and so on, so the regex in your example wasn't what I needed and was cutting out a lot of the data.) However, when I put my regex in, I was back to the same issue as before: it prints the first paragraph with the sentence brackets around the sentences, but then prints the same paragraph again with the paragraph brackets around the whole paragraph. Just so you have an idea of what I changed, I've included the code below, with an excerpt of the text file I'm using.
Data: But the truth is that in the short run, markets can occasionally be pushed, especially when so many decisions to buy or sell are keyed off what everyone else in the market is doing. Chain reactions are not much harder to start (in fact, given how quickly price moves get noticed, they may be easier) than they were 70 years ago. All that notwithstanding, the interesting thing about the Greenspan resignation rumor was that it raised an obvious question: Would it really matter? As Jacob Weisberg just pointed out in " Ballot Box," Steve Forbes is apparently the only American who doesn't think Greenspan has done a terrific job as Fed chairman. And most of us would be happy to have Greenspan stay in office even after his current term expires in the middle of next year. But it's interesting to note that in the past couple of months there have been more than a few voices--including those of economists Greg Mankiw and Robert Barr--suggesting that Greenspan has been more the beneficiary of good economic fundamentals than the creator of them. That position may be a bit overstated, particularly since Greenspan has shown an unusual ability to let his thinking on inflation, productivity, and the economy's possible growth rate evolve in response to changing data. But the essential point, that the soundness of this economy does not depend on Greenspan's presence at the head of the Fed, is right. That might not be the case if Greenspan's successor were either an inflation dove like William Greider or a perma-bear like Jim Grant. But whoever would succeed Greenspan would be nothing of the sort. He or she would be, in a word, Greenspanian, still concerned about the possibility of an overheating economy but also convinced that important technological changes have allowed this economy to grow faster than in the past without sparking inflation. 
If anything, in fact, the bond market should have rallied on news that Greenspan might be stepping down, since he has long since stopped being paranoid enough for bondholders, who seem perpetually convinced that the United States is about to become Brazil. There are certainly Fed governors out there who would be far more likely to raise interest rates aggressively at the first hint of price pressures than Greenspan.
by aaron_baugher (Curate) on Oct 12, 2013 at 12:49 UTC
by Athanasius (Archbishop) on Oct 13, 2013 at 03:01 UTC
Re^2: How to match regex over multiline file
by kyaloupe (Initiate) on Oct 10, 2013 at 01:57 UTC
Alright, using paragraph mode definitely did something: it's now giving me entire paragraphs from the text file as the output, which works for one part of my code, but not quite all of it. I'm going to try editing my regex (maybe it's just wayyyy too broad, which is why it's giving me the whole paragraph) so that it matches just a single sentence. Thank you!
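The regex in question is not shown in the thread, so this is only a guess at one common cause of a pattern that is "too broad": a greedy quantifier. The sketch below, using a made-up paragraph, contrasts a greedy and a non-greedy version of the same pattern.

```perl
use strict;
use warnings;

# Hypothetical paragraph, as it might arrive in paragraph mode.
my $para = "Markets can be pushed. Chain reactions\nare easy to start. Would it matter?";

# Greedy: ".+" runs to the LAST terminator, so the whole paragraph is captured.
my ($greedy) = $para =~ /(.+[.!?])/s;

# Non-greedy: ".+?" stops at the FIRST terminator, giving a single sentence.
my ($lazy) = $para =~ /(.+?[.!?])/s;

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```

With `/s` in effect, `.` also matches newlines, which is what lets a greedy `.+` swallow every line of the paragraph.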