You can set $/ (see perlvar) to a string to control what the diamond operator sees as a line ending. By setting it alternately to 'head' and 'tail', you can move through your large file in chunks, discarding the 1st, 3rd, 5th, ... chunks and printing the 2nd, 4th, 6th, and so on.
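To see the mechanism in isolation, here is a minimal sketch that reads from an in-memory filehandle (the sample text and variable names are made up for illustration) while switching $/ between two strings. Each record ends with, and still contains, the separator that terminated it, and the last record is whatever is left at end of file.

```perl
use strict;
use warnings;

# Hypothetical sample text; opening a reference to a scalar gives an
# in-memory filehandle, so no file is needed for the demonstration.
my $text = "one head two tail three";
open my $fh, '<', \$text or die $!;

my @records;
{
    local $/ = 'head';              # any string can serve as the record separator
    push @records, scalar <$fh>;    # reads "one head"

    $/ = 'tail';                    # switch separators; reading resumes where it stopped
    push @records, scalar <$fh>;    # reads " two tail"
    push @records, scalar <$fh>;    # reads " three" (no separator before EOF)
}                                   # $/ reverts to its previous value here
close $fh;

print "[$_]\n" for @records;
```

Because each record keeps its separator, the 'tail' marker shows up in the output of the script that follows.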

    #! perl -slw
    use strict;

    open IN, '<', $ARGV[ 0 ] or die $!;

    # Outer read: consume (and discard) everything up to and including 'head'.
    $/ = 'head';
    while( <IN> ) {
        # Inner read: everything up to and including the next 'tail'.
        local $/ = 'tail';
        my $chunk = <IN>;
        # Text after the last 'tail' leaves nothing to print.
        print $chunk if defined $chunk;
    }

    close IN;

    __END__
    P:\test>type junk.txt
    The quick brown fox jumps over the lazy dog 0001 head
    The quick brown fox jumps over the lazy dog 0002
    The quick brown fox jumps over the lazy dog 0003
    The quick brown fox jumps over the lazy dog 0004
    The quick brown fox jumps over the lazy dog 0005 tail
    The quick brown fox jumps over the lazy dog 0006
    The quick brown fox jumps over the lazy dog 0007
    The quick brown fox jumps over the lazy dog 0008 headThe quick brown fox jumps over the lazy dog 0009
    The quick brown fox jumps over the lazy dog 0010 tail
    The quick brown fox jumps over the lazy dog 0011
    The quick brown fox jumps over the lazy dog 0012

    P:\test>235232 junk.txt

    The quick brown fox jumps over the lazy dog 0002
    The quick brown fox jumps over the lazy dog 0003
    The quick brown fox jumps over the lazy dog 0004
    The quick brown fox jumps over the lazy dog 0005 tail
    The quick brown fox jumps over the lazy dog 0009
    The quick brown fox jumps over the lazy dog 0010 tail

The caveat is that if the chunks you are discarding (between a 'tail' marker and the next 'head' marker) are very large, they will consume large amounts of memory, because each one is read in as a single record before it is thrown away.

As implemented above, the 'head' marker is discarded but the 'tail' marker is printed. Add or remove them as necessary.
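One way to control which markers survive is sketched below, using in-memory sample data and two hypothetical flags, $keep_head and $keep_tail, that are not part of the original script. It relies on the fact that chomp removes a trailing copy of the current $/, so inside the inner read it strips 'tail'; the 'head' marker has already been consumed by the outer read, so keeping it means putting it back.

```perl
use strict;
use warnings;

# Hypothetical settings: set either to 1 to keep that marker in the output.
my $keep_head = 0;
my $keep_tail = 0;

# Hypothetical in-memory sample data.
my $text = "junk head[chunk 1]tail junk head[chunk 2]tail junk";
open my $in, '<', \$text or die $!;

my @chunks;
{
    local $/ = 'head';
    while ( <$in> ) {                   # discard up to and including 'head'
        local $/ = 'tail';
        my $chunk = <$in>;              # read up to and including 'tail'
        last unless defined $chunk;     # text after the last 'tail' yields nothing
        chomp $chunk unless $keep_tail; # chomp strips the current $/, i.e. 'tail'
        $chunk = 'head' . $chunk if $keep_head;  # restore the consumed 'head'
        push @chunks, $chunk;
    }
}
close $in;

print "$_\n" for @chunks;
```

With both flags at 0 this prints `[chunk 1]` and `[chunk 2]`, each on its own line.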

This also assumes that, by "including the lines the key words are on", you do not want any text that precedes a 'head' marker appearing in the middle of a line, nor any text that follows a 'tail' marker appearing in the middle of a line.
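If whole lines, markers included, are what is wanted, a different technique from the one above is Perl's scalar range operator (the "flip-flop", described under "Range Operators" in perlop). It reads one line at a time, so memory use is bounded by the longest line rather than by the size of the chunks, which also sidesteps the caveat about large discarded chunks. This is a minimal sketch with made-up sample lines held in memory, not the approach used in the script above.

```perl
use strict;
use warnings;

# Hypothetical sample lines; the markers sit at the ends of lines.
my $text = <<'END';
line 1
line 2 head
line 3
line 4 tail
line 5
line 6 head
line 7 tail
line 8
END
open my $in, '<', \$text or die $!;

my @kept;
while ( <$in> ) {
    # Scalar '..' becomes true on a line matching /head/ and stays true
    # up to and including the next line matching /tail/.
    push @kept, $_ if /head/ .. /tail/;
}
close $in;

print @kept;
```

Against a real file the loop body reduces to `print if /head/ .. /tail/;` (in a script run without `-l`, which would otherwise add a second newline). A bare `/head/` also matches words such as "ahead", so tighten the patterns if that can occur.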


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
Timing (and a little luck) are everything!


In reply to Re: Extracting blocks of text by BrowserUk
in thread Extracting blocks of text by walker
