walker has asked for the wisdom of the Perl Monks concerning the following question:
Replies are listed 'Best First'.
Re: Extracting blocks of text
by Rhose (Priest) on Jan 30, 2004 at 14:38 UTC
Output
Update
Re: Extracting blocks of text
by BrowserUk (Patriarch) on Jan 30, 2004 at 15:01 UTC
You can use $/ (see perlvar) and set it to a string to control what the diamond operator sees as a line ending. By setting this to 'head' and then 'tail' alternately, you can move through your large file in chunks, discarding the 1st, 3rd, 5th and printing the 2nd, 4th & 6th etc.
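A minimal sketch of this approach, assuming the markers are the literal strings 'head' and 'tail'. The sample text, the in-memory filehandle and the name extract_blocks are made up for illustration, so treat this as one possible shape rather than a definitive implementation:

```perl
use strict;
use warnings;

# Hypothetical sample data; 'head' and 'tail' stand in for the real markers.
my $data = "junk 1\nhead\nkeep A\ntail\njunk 2\nhead\nkeep B\ntail\njunk 3\n";

# extract_blocks() is a name invented for this sketch.
sub extract_blocks {
    my ($fh) = @_;
    my @blocks;
    while (1) {
        # Read (and discard) everything up to and including the next 'head'.
        my $skipped = do { local $/ = 'head'; <$fh> };
        last unless defined $skipped && $skipped =~ /head\z/;

        # Read (and keep) everything up to and including the next 'tail'.
        my $block = do { local $/ = 'tail'; <$fh> };
        last unless defined $block;
        push @blocks, $block;
    }
    return @blocks;
}

# An in-memory filehandle keeps the example self-contained.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @blocks = extract_blocks($fh);
close $fh;

print "$_\n---\n" for @blocks;
```

Because each read ends at the current value of $/, the 'head' marker sits at the end of the discarded chunk and the 'tail' marker sits at the end of the kept one.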
The caveat is that if the chunks you are discarding (between a 'tail' and the next 'head' marker) are very large, they will consume large amounts of memory. As implemented above, the 'head' marker is discarded, but the 'tail' marker is printed. Add or delete as necessary. This also assumes that by "including the lines the key words are on.", you do not mean that you want any text preceding the 'head' marker if it is in the middle of a line, nor anything after the 'tail' marker if it can appear in the middle of a line.
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
Timing (and a little luck) are everything!
by adenardo (Initiate) on Jun 28, 2006 at 20:23 UTC
So, this file would hopefully result in an array with 3 elements. Another challenge is that the last text block will not have the word term at the end of it. Thanks in advance :-) ad3
by BrowserUk (Patriarch) on Jun 28, 2006 at 21:37 UTC
Assuming the file is small enough to slurp, then split does the job nicely:
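As a minimal sketch of the slurp-and-split idea, assuming blocks are separated by a line holding only the word term and the last block has no trailing term (the sample text is invented for illustration):

```perl
use strict;
use warnings;

# Hypothetical input: three blocks, the last one without a closing 'term'.
my $data = "block one\nterm\nblock two\nterm\nblock three\n";

# Slurp the whole "file" by undefining $/ locally.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $text = do { local $/; <$fh> };
close $fh;

# Split on a line consisting solely of 'term'; the separator is dropped.
my @blocks = split /^term\n/m, $text;

print scalar(@blocks), " blocks\n";
print "[$_]\n" for @blocks;
```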
That discards the term itself. If you want to retain the term in each element, then perhaps the simplest way is to just put it back after the split. Just substitute this line into the above.
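One way to put the term back, sketched here assuming a line holding only term separates the blocks (the sample text is invented). Note that this simple version also appends term to the final block, which had none in the input:

```perl
use strict;
use warnings;

# Hypothetical input: three blocks, the last one without a closing 'term'.
my $text = "block one\nterm\nblock two\nterm\nblock three\n";

# Split on the separator line, then append the term to every element again.
my @blocks = map { $_ . "term\n" } split /^term\n/m, $text;

print "[$_]\n" for @blocks;
```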
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Extracting blocks of text
by pelagic (Priest) on Jan 30, 2004 at 14:17 UTC
I ran it with a file, and it showed that it does not work properly if a head follows a tail on the same line ... pelagic
by walker (Initiate) on Feb 01, 2004 at 03:40 UTC
by graff (Chancellor) on Feb 02, 2004 at 04:45 UTC
Why didn't you say so in the first place? That would change how people answer the question.
and I don't understand why are there's 2 tests for tail and 2 print commands ?
Well, actually, there's no need for the duplication. The following would work just as well -- and would cover your little "amendment" to the original spec:
Note that if there is a new "head" line within the five lines that follow a "tail", the $withinblock state variable gets reset to 6, and will stay there till the next "tail". If there is no "head" within the next five lines, it will decrement to 0, turning off the output. Another "feature" of this version is that if there is a "tail" line without a previous "head", the five lines following "tail" will still get printed.
One more thing: since the head and tail regexes are not anchored, the logic will fire whenever these words happen to show up in the data -- e.g:
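As a rough sketch of the state logic described in this reply, assuming $withinblock is 6 inside a block, counts down from 5 after a "tail", and is 0 when output is off. The function name filter_lines and the sample lines are made up, and this version also prints the "tail" line itself. The line "a header line" shows how the unanchored /head/ pattern fires on ordinary data:

```perl
use strict;
use warnings;

# filter_lines() is a hypothetical name for this sketch.
sub filter_lines {
    my @out;
    my $withinblock = 0;    # 6 = inside a block, 1..5 = lines left after a tail, 0 = off
    for my $line (@_) {
        if ($line =~ /head/) {
            $withinblock = 6;            # (re)enter a block
        }
        elsif ($line =~ /tail/) {
            push @out, $line;            # keep the tail line, even without a prior head
            $withinblock = 5;            # then allow five more lines
            next;
        }
        next unless $withinblock;
        push @out, $line;
        $withinblock-- if $withinblock < 6;    # count down only after a tail
    }
    return @out;
}

my @lines = map { "$_\n" } (
    'skip me', 'head', 'in block', 'tail',
    'after 1', 'after 2', 'after 3', 'after 4', 'after 5',
    'skip me too',
    'a header line',           # unanchored /head/ matches here
    'printed by accident',
);
my @kept = filter_lines(@lines);
print @kept;
```

Anchoring the patterns, for example /^head$/ and /^tail$/ if the markers always sit alone on a line, would stop such accidental matches.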
by walker (Initiate) on Feb 02, 2004 at 14:17 UTC
Re: Extracting blocks of text
by mr_mischief (Monsignor) on Jan 30, 2004 at 14:41 UTC
Sorry if I misunderstood your question, but according to the way I read it I think this is close. Given this file:
I get this output:
Sometimes a simple procedural style works really well, even if you have bells and whistles available. This could be written the same in almost any language. Perl just makes it easier.
Christopher E. Stith
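For illustration, a plain flag-based loop of the kind described might look like the sketch below. It assumes the goal is to keep every line from one containing "head" through the next one containing "tail", inclusive; extract_lines and the sample lines are invented for this example:

```perl
use strict;
use warnings;

# extract_lines() is a hypothetical name for this sketch.
sub extract_lines {
    my @out;
    my $printing = 0;
    for my $line (@_) {
        $printing = 1 if $line =~ /head/;    # switch on at the head line
        push @out, $line if $printing;       # keep head, body and tail lines
        $printing = 0 if $line =~ /tail/;    # switch off after the tail line
    }
    return @out;
}

my @kept = extract_lines("before\n", "head\n", "inside\n", "tail\n", "after\n");
print @kept;
```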