Re^2: Extracting blocks of text

This has been an educational discussion... how about a twist? I am looking to parse a large file and extract blocks of text that begin with the word "term". I can't always anticipate how a block will end, other than by stating that whenever the word "term" appears, a new block begins. Is there a way to create an array where each element is a text block that begins with the word "term" and ends immediately before the next occurrence of that word?

example file:

term {
yada yada 
12345
() ...
}

term only occurs here {
could be 30 lines here
but never that word again until
another block starts
yadada
}

term, etc.

_END_

So this file would hopefully result in an array with 3 elements. Another challenge is that the last text block will not have the word "term" after it to mark its end. Thanks in advance :-) ad3


Replies are listed 'Best First'.

Re^3: Extracting blocks of text
by BrowserUk (Patriarch) on Jun 28, 2006 at 21:37 UTC

Assuming the file is small enough to slurp, then split does the job nicely:

#! perl -slw
use strict;

my @array = split 'term', do{ local $/; <DATA> };
shift @array; ## Discard leading null

print '---', "\n", $_, "\n" for @array;

__DATA__
term {
yada yada
12345
() ...
}

term only occurs here {
could be 30 lines here
but never that word again until
another block starts
yadada
}

term, etc.

That discards the term itself. If you want to retain the term in each element, then perhaps the simplest way is to just put it back after the split. Just substitute this line into the above.

my @array = map{ "term$_" } split 'term', do{ local $/; <DATA> };
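
One caveat with splitting on the bare string 'term' is that it also matches inside longer words such as "determined" or "terminal". A possible variation, sketched here as an assumption rather than as part of the original answer, is to split on a zero-width lookahead with word boundaries. Nothing is consumed, so each element keeps its leading "term" and no map is needed. The helper name split_blocks and the sample text are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split a string into blocks,
# each starting at the whole word "term".
sub split_blocks {
    my ($text) = @_;

    # (?=\bterm\b) is a zero-width lookahead: split cuts just before each
    # whole word "term" without removing it from the following block.
    my @blocks = split /(?=\bterm\b)/, $text;

    # Drop any text that precedes the first "term".
    shift @blocks if @blocks && $blocks[0] !~ /\Aterm\b/;

    return @blocks;
}

# Sample input, invented for this sketch. "determined" and "terminal"
# must not start a new block.
my $sample = <<'END_SAMPLE';
term {
yada yada
}

term only occurs here {
determined terminal text stays in this block
}

term, etc.
END_SAMPLE

my @array = split_blocks($sample);
print "---\n$_\n" for @array;
```

With this sample the array should hold three elements, the last one being the "term, etc." block that has no following "term" to close it.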
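
If the file is too large to slurp comfortably, one alternative is to read it line by line and hand each block to a callback as soon as the next "term" shows up. This is only a sketch under an assumption taken from the sample data: that "term" always starts a line. The names for_each_block and $callback are hypothetical, and an in-memory filehandle stands in for the real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): read $fh line by line and
# call $callback once per block. Assumes each block starts on a line
# that begins with the whole word "term".
sub for_each_block {
    my ($fh, $callback) = @_;
    my $block;

    while (my $line = <$fh>) {
        if ($line =~ /^term\b/) {
            # A new block starts: emit the one collected so far.
            $callback->($block) if defined $block;
            $block = $line;
        }
        elsif (defined $block) {
            $block .= $line;
        }
        # Lines before the first "term" are ignored.
    }

    # The last block has no following "term", so emit it at end of file.
    $callback->($block) if defined $block;
    return;
}

# Sample input, invented for this sketch; a real script would open a file.
my $sample = "term {\nyada yada\n}\n\nterm only occurs here {\nyadada\n}\n\nterm, etc.\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";

my @array;
for_each_block($fh, sub { push @array, $_[0] });
close $fh;

print "---\n$_\n" for @array;
```

Collecting into @array as shown still keeps every block in memory; to keep memory use low, the callback could instead process each block and discard it.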

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.
