in reply to Grab 3 lines before and 2 after each regex hit

This is a fairly primitive way to do it:

    #!/usr/bin/perl -w
    use strict;

    my @lines = <DATA>;

    for(1..$#lines)
    {
        if($lines[$_]=~m/[^\d]+\d+/){
            print qq~
    $lines[$_-3]
    $lines[$_-2]
    $lines[$_-1]
    $lines[$_]
    $lines[$_+1]
    $lines[$_+2]
    ~;
        }
    }

    1;

    __END__
    alpha
    beta
    something
    a07607
    b-alpha
    b-beta
    b-something
    b-something else
    c-alpha
    c-beta
    c-somethin
    a9706
    d-alpha
    d-beta
    d-something
    d-something else

produces...

    alpha
    beta
    something
    a07607
    b-alpha
    b-beta
    c-alpha
    c-beta
    c-somethin
    a9706
    d-alpha
    d-beta
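
One caveat with plain index arithmetic: if a hit falls within the first three lines, a negative index such as `$_-3` wraps around to the end of the array, and a hit within the last two lines reads past the end. A minimal sketch of one way to guard against that by clamping the slice bounds follows. The helper name `context_slices` and its parameters are hypothetical, not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: return one array ref per matching line, holding up to
# $before lines of leading context and up to $after lines of trailing context.
sub context_slices {
    my ($lines, $re, $before, $after) = @_;
    my @out;
    for my $i (0 .. $#$lines) {
        next unless $lines->[$i] =~ $re;
        my $from = $i - $before;
        $from = 0 if $from < 0;             # clamp at the start of the array
        my $to = $i + $after;
        $to = $#$lines if $to > $#$lines;   # clamp at the end of the array
        push @out, [ @{$lines}[$from .. $to] ];
    }
    return @out;
}

# Usage with a few lines of the sample data from the post.
my @sample = ('alpha', 'beta', 'something', 'a07607', 'b-alpha', 'b-beta');
print join(', ', @$_), "\n" for context_slices(\@sample, qr/[^\d]+\d+/, 3, 2);
```

Unlike the loop above, this variant also reports hits near either end of the input, with correspondingly shorter context.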

Celebrate Intellectual Diversity

Re^2: Grab 3 lines before and 2 after each regex hit (sliding window)
by LanX (Saint) on Apr 25, 2014 at 20:31 UTC
    > This is a fairly primitive way to do it:

    using a sliding window (safer with huge streams)

        use strict;
        use warnings;
        use Data::Dump;

        my @window;
        push @window, scalar <DATA> for 1..5;    # init

        while (my $line = <DATA>) {
            push @window, $line;
            chomp @window;
            if( $window[3] =~ m/[^\d]+\d+/ ){
                dd \@window;
            }
            shift @window;
        }

        __END__
        alpha
        beta
        something
        a07607
        b-alpha
        b-beta
        b-something
        b-something else
        c-alpha
        c-beta
        c-somethin
        a9706
        d-alpha
        d-beta
        d-something
        d-something else
    -->
    ["alpha", "beta", "something", "a07607", "b-alpha", "b-beta"]
    ["c-alpha", "c-beta", "c-somethin", "a9706", "d-alpha", "d-beta"]

    Cheers Rolf

    ( addicted to the Perl Programming Language)

    update

    maybe more elegant

        use strict;
        use warnings;
        use Data::Dump;

        my @window;

        while (my $line = <DATA>) {
            push @window, $line;
            next if @window < 6;    # init
            if( $window[3] =~ m/[^\d]+\d+/ ){
                dd \@window;
            }
            shift @window;
        }

    Update

    Oh, the latter (more elegant) approach has a clear advantage: if you want to avoid overlapping results, you just need to empty the window after a match, and it gets refilled automatically. :)
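
A minimal sketch of that non-overlapping variant, assuming the same six-line window (three lines before the hit, two after). The sub name `non_overlapping_hits` and the in-memory filehandle used for the demo are illustrative choices, not something from the posts above.

```perl
use strict;
use warnings;

# Hypothetical helper: read $fh line by line and return one array ref per hit.
# After a hit the window is emptied, so consecutive results never share lines.
sub non_overlapping_hits {
    my ($fh, $re) = @_;
    my (@window, @hits);
    while (my $line = <$fh>) {
        chomp $line;
        push @window, $line;
        next if @window < 6;            # still filling (or refilling) the window
        if ($window[3] =~ $re) {
            push @hits, [@window];      # copy the six lines
            @window = ();               # empty the window; it refills by itself
            next;
        }
        shift @window;                  # no hit: slide forward by one line
    }
    return @hits;
}

# Usage: an in-memory filehandle stands in for a real file or stream.
my $text = join '', map { "$_\n" }
    ('alpha', 'beta', 'something', 'a07607', 'b-alpha', 'b-beta', 'b-something');
open my $in, '<', \$text or die "cannot open in-memory handle: $!";
print join(', ', @$_), "\n" for non_overlapping_hits($in, qr/[^\d]+\d+/);
close $in;
```

As with the window code above, a hit in the first three or last two lines of the input is never reported, because the window is not full at that point.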

      The sliding window sounds like another great suggestion

      Thank you.

Re^2: Grab 3 lines before and 2 after each regex hit
by HarryPutnam (Novice) on Apr 25, 2014 at 19:32 UTC

    Your technique answers the need nicely...
    thank you

    for(1..$#lines)
    {
        if($lines[$_]=~m/[^\d]+\d+/){
             print qq~
    ....... ...         
             ~;
    

    I guess that `qq~' operates something like a here document?
    Can you explain a bit?

        http://perldoc.perl.org/perlop.html#Quote-Like-Operators

        Egad... I understood about 1/10 of a percent of that.

        I see the authors have made a fairly extensive effort to make the explanations readable... but still seems aimed at an audience several good steps above me. Or possibly just a few layers less of `thickskulledness'.

        Finally resorted to skipping thru and reading every place qq appears. However, I came away mostly with my poor pea brain swimming.

        I never really recognized your usage in those pages.

        One small thing that did stay with me:
        qq means what follows is interpolated... beyond that and even that itself, sails right over my head.
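
For what it's worth, a small sketch may make the `qq` point concrete. `qq` is a double-quoted string with a delimiter of your choosing, so `qq~...~` interpolates variables just as `"..."` does, and a string containing line breaks behaves much like a here document. The variable names below are made up for the illustration.

```perl
use strict;
use warnings;

my $name = 'world';

# These four strings are identical: qq is "double quotes with a custom delimiter".
my $plain    = "Hello, $name\n";
my $tilde    = qq~Hello, $name\n~;
my $braces   = qq{Hello, $name\n};
my $here_doc = <<"END";
Hello, $name
END

# A single q (no interpolation) leaves $name and \n as literal text.
my $literal = q~Hello, $name\n~;

print "tilde matches plain\n"    if $tilde    eq $plain;
print "braces match plain\n"     if $braces   eq $plain;
print "here-doc matches plain\n" if $here_doc eq $plain;
print "literal is: $literal\n";
```

The tilde is handy when the text itself contains double quotes or braces, since nothing inside needs escaping except the tilde.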

Re^2: Grab 3 lines before and 2 after each regex hit
by HarryPutnam (Novice) on Apr 25, 2014 at 21:00 UTC
    Can we go a little deeper into the intended usage of the techniques mentioned in this thread?

    I haven't understood everything that has been presented, but enough to use some of the information posted and complete a working script for my purpose soon.

    There was some talk of slurping sections or even whole files.
    On that topic, let me explain very briefly what the intended usage is. The code will be used to search through and extract from some fairly massive piles of files at times.

    Once File::Find is added into the script, it will likely be expected to recurse through usenet-style hierarchies (hierarchies of my own creation, so smaller than real ones) that might consist of as many as 45000-55000 messages in total (not per group).

    So, with that scale of usage in mind, would slurping whole files still be a wise way to go? Or would that be so labor intensive as to make it worthwhile to do it a different way?
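
For reference, a rough sketch of how File::Find might be combined with the sliding-window approach from earlier in the thread, so that only six lines of any one file are held in memory at a time. The sub names `scan_file` and `scan_tree` are hypothetical, and the sketch assumes every regular file under the root is a message worth scanning.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: scan one file line by line with a six-line window
# (3 lines before the hit, 2 after) and return one array ref per hit.
sub scan_file {
    my ($path, $re) = @_;
    open my $fh, '<', $path or do { warn "skipping $path: $!\n"; return };
    my (@window, @hits);
    while (my $line = <$fh>) {
        chomp $line;
        push @window, $line;
        next if @window < 6;
        push @hits, [@window] if $window[3] =~ $re;
        shift @window;
    }
    close $fh;
    return @hits;
}

# Hypothetical helper: walk $root recursively and collect hits per file.
sub scan_tree {
    my ($root, $re) = @_;
    my %hits_for;
    find({
        no_chdir => 1,                  # with no_chdir, $_ holds the full path
        wanted   => sub {
            return unless -f $_;        # regular files only
            my @hits = scan_file($_, $re);
            $hits_for{$_} = \@hits if @hits;
        },
    }, $root);
    return \%hits_for;
}

# Usage: perl script.pl /path/to/hierarchy
if (@ARGV) {
    my $found = scan_tree($ARGV[0], qr/[^\d]+\d+/);
    for my $file (sort keys %$found) {
        print "$file:\n";
        print "  ", join(' | ', @$_), "\n" for @{ $found->{$file} };
    }
}
```

Whether this beats slurping depends on the file sizes. For individual messages either approach is probably workable, and the window version mainly helps when single files get large.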

Re^2: Grab 3 lines before and 2 after each regex hit
by locked_user sundialsvc4 (Abbot) on Apr 24, 2014 at 18:25 UTC
    Marvelously elegant, if the file-size is not too big ... as these days it is unlikely to be. ++
      This is the usual sycophantic crap you post after you've been called out on a series of junk posts filled with lies and bad advice. Anyone reading your post history will be familiar with this pattern.