Reading the file in large chunks is more efficient than going line by line, but then you have to worry about a chunk boundary splitting the target word in two. That means finding the trailing word fragment of each chunk and prepending it to the next chunk, which costs some efficiency. If and when you do find the word, you have to seek to a position some number of bytes before it, read in the word plus a buffer zone around it, and extract the 10 surrounding words, so the script will also be slow if there are numerous matches. I got about 106 seconds using the following on a 712 MB file containing the target word at the very end:
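use strict;
use warnings;

my $fname = 'test.dat';
my $csize = 102400;    # chunk size in bytes
my $word  = 'bingo';

open(my $handle, '<', $fname) or die "Cannot open $fname: $!";

my $end = '';          # word fragment carried over from the previous chunk
while (1) {
    my $length = read($handle, my $chunk, $csize);
    last if !$length;

    my $idx = index($end . $chunk, $word);
    if ($idx != -1) {
        # Seek to 100 bytes before the match and read a window around it.
        my $pos = tell($handle) - $length - length($end) + $idx - 100;
        $pos = 0 if $pos < 0;
        seek($handle, $pos, 0);
        read($handle, my $window, 220);

        my @words = split(/\W+/, $window);
        for my $c (0 .. $#words) {
            next if $words[$c] ne $word;
            my $from = $c > 4           ? $c - 5 : 0;
            my $to   = $c < $#words - 4 ? $c + 5 : $#words;
            print join(' ', @words[$from .. $to]), "\n";
            last;
        }
        last;
    }

    # Keep the trailing fragment so a word split across chunks still matches.
    ($end) = ($end . $chunk) =~ /(\w*)\z/;
}
close($handle);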
Decidedly inefficient, and I haven't provided for multiple matches yet either. It would probably be much better to index the file for all words occurring fewer than x times, and/or use a system utility to find the locations of matches. Even 30 seconds is unacceptable, and under 5 would be a lot better.
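For what it's worth, here is a rough sketch of the indexing idea, not something I have timed on the 712 MB file. It makes one pass to record the byte offset of every word that occurs at most $max_hits times, and then each lookup is just a seek plus a small read. The names build_index and context, the $max_hits cutoff, the 100-byte margin and the sample text are all made up for illustration, and the index is kept in memory here, although it could equally be written to disk and reused.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: map each word to the byte offsets where it starts,
# dropping any word that occurs more than $max_hits times.
sub build_index {
    my ($fname, $max_hits) = @_;
    open(my $fh, '<:raw', $fname) or die "Cannot open $fname: $!";
    my (%offsets, %count);
    my $line_start = 0;
    while (my $line = <$fh>) {
        while ($line =~ /(\w+)/g) {
            my $w = $1;
            if (++$count{$w} > $max_hits) {
                delete $offsets{$w};    # too common to be worth indexing
                next;
            }
            # $-[1] is where the captured word starts within $line.
            push @{ $offsets{$w} }, $line_start + $-[1];
        }
        $line_start = tell($fh);
    }
    close($fh);
    return \%offsets;
}

# Hypothetical helper: return the word starting at $offset together with up
# to $n words on either side, reading only $margin bytes in each direction.
sub context {
    my ($fname, $offset, $n, $margin) = @_;
    open(my $fh, '<:raw', $fname) or die "Cannot open $fname: $!";
    my $start = $offset > $margin ? $offset - $margin : 0;
    my $want  = ($offset - $start) + $margin;
    seek($fh, $start, 0) or die "Cannot seek in $fname: $!";
    my $got = read($fh, my $window, $want);
    close($fh);
    die "Cannot read $fname: $!" unless defined $got;

    my @before = substr($window, 0, $offset - $start) =~ /(\w+)/g;
    my @after  = substr($window, $offset - $start)    =~ /(\w+)/g;
    shift @before if $start > 0 && @before;       # first word may be cut off
    pop @after    if $got == $want && @after > 1; # last word may be cut off
    splice(@before, 0, @before - $n) if @before > $n;
    splice(@after, $n + 1)           if @after > $n + 1;
    return join(' ', @before, @after);
}

# Small demonstration on a throwaway file (sample text is invented).
my ($tmp, $tmpname) = tempfile(UNLINK => 1);
print {$tmp} "the quick brown fox jumps over the lazy dog\n",
             "and then shouts bingo to all of the other players nearby\n";
close($tmp);

my $index = build_index($tmpname, 2);
for my $offset (@{ $index->{bingo} || [] }) {
    print context($tmpname, $offset, 5, 100), "\n";
}
```

Since every offset for a word is stored, multiple matches come for free: the loop at the bottom prints one context line per hit. The catch is that building the index still costs a full pass over the file, so it only pays off if the same file is searched more than once.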

In reply to Re: searching for a keyword with context window by TedPride
in thread searching for a keyword with context window by fadingjava
