in reply to Extracting blocks of text
You can set $/ (see perlvar) to a string to control what the diamond operator sees as a line ending. By setting it to 'head' and then 'tail' alternately, you can move through your large file in chunks, discarding the 1st, 3rd, 5th and printing the 2nd, 4th, 6th, etc.
    #! perl -slw
    use strict;

    open IN, '<', $ARGV[ 0 ] or die $!;

    $/ = 'head';
    while( <IN> ) {
        local $/ = 'tail';
        print scalar <IN>;
    }
    close IN;

    __END__
    P:\test>type junk.txt
    The quick brown fox jumps over the lazy dog 0001
    head The quick brown fox jumps over the lazy dog 0002
    The quick brown fox jumps over the lazy dog 0003
    The quick brown fox jumps over the lazy dog 0004
    The quick brown fox jumps over the lazy dog 0005 tail
    The quick brown fox jumps over the lazy dog 0006
    The quick brown fox jumps over the lazy dog 0007
    The quick brown fox jumps over the lazy dog 0008
    headThe quick brown fox jumps over the lazy dog 0009
    The quick brown fox jumps over the lazy dog 0010 tail
    The quick brown fox jumps over the lazy dog 0011
    The quick brown fox jumps over the lazy dog 0012

    P:\test>235232 junk.txt
     The quick brown fox jumps over the lazy dog 0002
    The quick brown fox jumps over the lazy dog 0003
    The quick brown fox jumps over the lazy dog 0004
    The quick brown fox jumps over the lazy dog 0005 tail
    The quick brown fox jumps over the lazy dog 0009
    The quick brown fox jumps over the lazy dog 0010 tail
The caveat is that if the chunks you are discarding (between a 'tail' marker and the next 'head' marker) are very large, they will consume large amounts of memory, because each one is read in as a single "line".
As implemented above, the 'head' marker is discarded, but the 'tail' marker is printed. Add or delete as necessary.
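One way to adjust the marker handling is to lean on chomp, which removes a trailing copy of whatever $/ currently holds. The following is only a sketch, assuming the same 'head' and 'tail' markers as above. The sub name extract_blocks and the keep_markers option are made up for illustration and are not part of the original script.

```perl
#! perl
use strict;
use warnings;

# Hypothetical helper (names invented for this sketch): returns the text
# between each 'head' and the following 'tail'. By default both markers
# are dropped; with keep_markers => 1 they are put back around each block.
sub extract_blocks {
    my( $fh, %opt ) = @_;
    my @blocks;

    local $/ = 'head';
    while( <$fh> ) {                    # read (and discard) up to the next 'head'
        last unless /head\z/;           # trailing text with no 'head': done
        local $/ = 'tail';
        my $block = <$fh>;
        last unless defined $block;     # 'head' was the last thing in the file
        my $stripped = chomp $block;    # removes a trailing 'tail', returns chars removed
        $block = "head$block" . ( $stripped ? 'tail' : '' )
            if $opt{ keep_markers };
        push @blocks, $block;
    }
    return @blocks;
}

# Usage: an in-memory filehandle stands in for the real file.
my $text = "aaa head one tail bbb headtwo tail ccc";
open my $in, '<', \$text or die $!;
print "[$_]\n" for extract_blocks( $in );   # prints "[ one ]" then "[two ]"
```

Note that each block is still read into memory whole, so the memory caveat above applies here too.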
This also assumes that by "including the lines the key words are on", you do not mean that you want any text preceding the 'head' marker when it appears in the middle of a line, nor anything following the 'tail' marker when it appears in the middle of a line.
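If whole lines are wanted instead, including any text that shares a line with a marker, a line-at-a-time loop is an alternative, and it holds only one line in memory at a time. This is a rough sketch rather than a drop-in replacement. It assumes at most one marker per line, that markers never span a line break, and that blocks do not nest. The sub name copy_marked_lines is invented here.

```perl
#! perl
use strict;
use warnings;

# Hypothetical helper: copies every line from the one containing 'head'
# through the one containing 'tail', inclusive, from $in to $out.
sub copy_marked_lines {
    my( $in, $out ) = @_;
    my $inside = 0;
    while( my $line = <$in> ) {         # default $/, so one ordinary line per read
        $inside = 1 if $line =~ /head/; # a marker line opens a block
        print {$out} $line if $inside;
        $inside = 0 if $line =~ /tail/; # a marker line closes it, after being printed
    }
    return;
}

# Usage: an in-memory filehandle stands in for the real file.
my $sample = "skip 1\nhead keep 2\nkeep 3\nkeep 4 tail\nskip 5\n";
open my $src, '<', \$sample or die $!;
copy_marked_lines( $src, \*STDOUT );    # prints the three middle lines
```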