use strict;
use warnings;
use Data::Dumper;

# this is how far forward or back you need to read
my $width = 200;

# this is your target string. You can make it a regex if you prefer
my $target = 'search';

# file to search
my $file  = 'test.txt';
my $fsize = -s $file;

# when you're done, this should contain the data you're looking for
my @chunks;

open my $fh, '<', $file or die "Cannot open $file for reading: $!";
while (<$fh>) {
    # only the first match on each line is used
    if ( /$target/g ) {
        my $file_position = tell $fh;

        # backwards from end of string
        my $word_position = $file_position - ( length($_) - pos($_) );

        # to beginning of word. It's separate so you can
        # pull it out if necessary. (length $target is only right for a
        # literal string; with a regex, $-[0] gives the match start.)
        $word_position -= length $target;

        push @chunks, get_chunk( $fh, $word_position, $file_position, $width, $fsize );
    }
}
print Dumper \@chunks;
close $fh;

sub get_chunk {
    my ( $fh, $word_position, $file_position, $width, $fsize ) = @_;

    # don't try to read before beginning of file
    my $start = $word_position >= $width ? $word_position - $width : 0;

    # don't try to read after end of file
    my $end = $word_position + $width <= $fsize ? $word_position + $width : $fsize;

    # position to start of where we want to read
    seek $fh, $start, 0;
    my $chunk;

    # shouldn't fail unless I got my boundaries wrong
    read( $fh, $chunk, $end - $start ) or die "Problem reading file: $!";

    # put us back to where we were
    seek $fh, $file_position, 0;
    return $chunk;
}
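The comment in the code says the target can be a regex, but `length $target` only measures the literal pattern text, and the `if` only picks up the first match on each line. Here is a minimal sketch of one way around both, using Perl's `@-` array (`$-[0]` is the offset where the last successful match started within the string) and looping with `while (/$re/g)` so that every match on a line is used. `find_chunks` is a hypothetical name, and the sketch assumes the file is read as raw bytes (no `:encoding` or `:crlf` layer) so that `tell` and `length` count the same units.

```perl
use strict;
use warnings;

# Hypothetical helper: returns an array ref holding, for every match of $re
# in $file, the bytes from up to $width before the match start to up to
# $width after it (clamped to the file boundaries).
# Assumes raw bytes, so tell() and length() agree.
sub find_chunks {
    my ( $file, $re, $width ) = @_;
    my $fsize = -s $file;
    open my $fh, '<:raw', $file or die "Cannot open $file for reading: $!";

    my @chunks;
    while ( my $line = <$fh> ) {
        my $after_line = tell $fh;                    # offset just past this line
        my $line_start = $after_line - length $line;  # offset of the line's first byte

        # /g in a while loop visits every match on the line, not just the first
        while ( $line =~ /$re/g ) {
            my $match_start = $line_start + $-[0];    # works for regexes too

            # clamp the window to the start and end of the file
            my $start = $match_start >= $width ? $match_start - $width : 0;
            my $end   = $match_start + $width <= $fsize ? $match_start + $width : $fsize;

            seek $fh, $start, 0 or die "seek failed: $!";
            my $chunk = '';
            my $got   = read $fh, $chunk, $end - $start;
            die "Problem reading file: $!" unless defined $got;
            push @chunks, $chunk;

            # resume line-by-line reading where <$fh> left off;
            # pos($line) is untouched by seek, so the inner loop continues
            seek $fh, $after_line, 0 or die "seek failed: $!";
        }
    }
    close $fh;
    return \@chunks;
}
```

Called as `find_chunks( 'test.txt', qr/search/, 200 )`, it returns an array ref shaped like `@chunks` in the original script, so `print Dumper find_chunks(...)` gives comparable output.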

In reply to Searching for 'chunks' of data in very large files by Ovid
