Extracting array elements on either side of a match

punkish has asked for the wisdom of the Perl Monks concerning the following question:

Apologies for quite possibly a highly unsuitable post Title. Please feel free to propose a better Title.

A poster on the SQLite list asked the following question: given the following data extracted from the db table

Field1,Field2,Field3
1.00,Blue,12:00    <-----
40.20,White,12:00
80.30,White,12:00
120.00,White,12:00
126.00,White,12:00 <-----
162.43,White,12:00 <-----
198.86,White,12:00 <-----
235.29,White,12:00 <-----
271.72,Red,03:45   <=====
308.15,White,12:00 <-----
344.58,White,12:00 <-----
381.01,White,12:00 <-----
417.44,White,12:00 <-----
453.87,White,12:00
490.30,White,12:00
526.73,White,12:00
563.16,Red,07:45
599.59,White,12:00
636.02,White,12:00
672.45,White,12:00
708.88,White,12:00
745.31,White,12:00
781.74,White,12:00
818.17,White,12:00
854.60,Blue,12:00  <-----
891.03,White,12:00
963.89,White,12:00
1000.32,White,12:00
1036.75,Red,08:30
1073.18,White,12:00
1109.61,Red,06:00
1146.04,White,12:00
1182.47,White,12:00
1218.90,White,12:00
18516.40,Blue,12:00
1255.33,White,12:00
927.46,White,12:00
[download]

The poster wanted to find a first occurrence of "Red," then searching from this record find the previous 4 records that contain "White," then find the Previous 1 that contains "Blue." Now do the same thing but searching "next". In other words, the following result was desired:

1.00,Blue,12:00
126.00,White,12:00
162.43,White,12:00
198.86,White,12:00
235.29,White,12:00
271.72,Red,03:45
308.15,White,12:00
344.58,White,12:00
381.01,White,12:00
417.44,White,12:00
854.60,Blue,12:00
[download]

I have marked the desired rows with <----- and <=====

A very complicated SQL was proposed, but this is the kind of thing best accomplished by in the application. I have been recuperating from a post-Thanksgiving flu/fever, so to pass away boredom, I decided to write the following solution in my favorite programming language.

my @res = analyze();
for (@res) {
    print $_->{f1} . ',' . $_->{f2} . ',' . $_->{f3} . "\n";
}

sub analyze {
    
    # get the data as an array of hashes just like DBI would return
    my @data = map { 
        chomp; 
        split /,/; 
        { f1 => $_[0], f2 => $_[1], f3 => $_[2] } 
    } <DATA>;
    
    # arrays for holding bits of results
    my (@res_red, @res_blu, @res_wht, 
        @nxt_res_red, @nxt_res_blu, @nxt_res_wht
     );
    
    # counters for tracking whether we are before or 
    # after the 'Red' divide 
    my $i = -1; my $j;
    
    # array to hold rows *before* the 'Red' divide
    my @seen;
    
    LOOP: for (@data) {
        if ($i >= 0) {

            if ($_->{f2} eq 'White') {
                next LOOP if $j > 3;
                push @nxt_res_wht, $_;
                $j++;
            }
            
            if ($_->{f2} eq 'Blue') {
                push @nxt_res_wht, $_;
                last LOOP;
            }
        }
        else {
            if ($_->{f2} eq 'Red') {
                push @res_red, $_;
                $i++;
                my @rev_seen = reverse @seen;
                my $k;
                REV_WHITE: for (@rev_seen) {
                    if ($_->{f2} eq 'White') {
                        push @res_wht, $_;
                        $k++
                    }
                    last REV_WHITE if $k > 3;
                }
                REV_BLUE: for (@rev_seen) {
                    if ($_->{f2} eq 'Blue') {
                        push @res_blu, $_;
                        last REV_BLUE;
                    }
                }
                push @res_blu, reverse(@res_wht), @res_red;
            }
            else {
                push @seen, $_;
            }
        }
    }
    push @res_blu, @nxt_res_wht, @nxt_res_red;
    return @res_blu;
}


__DATA__
1.00,Blue,12:00
40.20,White,12:00
80.30,White,12:00
120.00,White,12:00
126.00,White,12:00
162.43,White,12:00
198.86,White,12:00
235.29,White,12:00
271.72,Red,03:45
308.15,White,12:00
344.58,White,12:00
381.01,White,12:00
417.44,White,12:00
453.87,White,12:00
490.30,White,12:00
526.73,White,12:00
563.16,Red,07:45
599.59,White,12:00
636.02,White,12:00
672.45,White,12:00
708.88,White,12:00
745.31,White,12:00
781.74,White,12:00
818.17,White,12:00
854.60,Blue,12:00
891.03,White,12:00
963.89,White,12:00
1000.32,White,12:00
1036.75,Red,08:30
1073.18,White,12:00
1109.61,Red,06:00
1146.04,White,12:00
1182.47,White,12:00
1218.90,White,12:00
18516.40,Blue,12:00
1255.33,White,12:00
927.46,White,12:00
[download]

Once you experienced monks get past your laughing at my code (which does work correctly), I would like a lesson into how to approach such class of problems that involve a "window" over an array that spreads on either side of an array element. Of course, more elegant solutions to the above problem are also welcome as they will show me (and other shy monks) a better way.

Update: Responding to Limbic~Region's question, forget for the moment possible "Blue" interleaving among the four "Whites" or other anomalies and deformities. That is how the OP on the SQLite stated the problem, so that is how I have reproduced it above.

--

when small people start casting long shadows, it is time to go to bed

Comment on Extracting array elements on either side of a match Select or Download Code

Replies are listed 'Best First'.
Re: Extracting array elements on either side of a match by GrandFather (Saint) on Dec 01, 2008 at 00:12 UTC
Well, it can be made a little more compact. The following doesn't bother validating data, although such validation is fairly trivial: use strict; use warnings; my @res = analyze (); print "$_->{f1},$_->{f2},$_->{f3}\n" for @res; sub analyze { # get the data as an array of hashes just like DBI would return my @data = map { chomp; my @f = split /,/; {f1 => $f[0], f2 => $f[1], f3 => $f[2]} } <DATA>; my %lists; # Build coloured lists push @{$lists{$data[$_]{f2}}}, $_ for 0 .. $#data; # Find the spans my $redIndex = $lists{Red}[0]; my @priorWhites = grep {$_ < $redIndex} @{$lists{White}}; my $firstWhite = $priorWhites[-4]; my @priorBlues = grep {$_ < $firstWhite} @{$lists{Blue}}; my @postWhites = grep {$_ > $redIndex} @{$lists{White}}; my $lastWhite = $postWhites[3]; my @postBlues = grep {$_ > $lastWhite} @{$lists{Blue}}; # Build the output list my @indexes = ($priorBlues[-1], @priorWhites[-4 .. -1], $redIndex, @postWhites[0 .. 3], $postBlues[0]); return @data[@indexes]; } __DATA__ ... [download] generates output per OP's sample given OP's data. Perl's payment curve coincides with its learning curve.	[reply] [d/l]
Re^2: Extracting array elements on either side of a match by punkish (Priest) on Dec 01, 2008 at 00:21 UTC
verrrrrry clever Grandfather, building an index of the rownums for each of the colors. -- when small people start casting long shadows, it is time to go to bed	[reply]
Re: Extracting array elements on either side of a match by Limbic~Region (Chancellor) on Nov 30, 2008 at 23:52 UTC
punkish, First, It is not clear what should happen if a "Blue" is encountered interleaved amongst the 4 "White"s. Setting that aside for a second, I would probably do something much simpler. A rough outline follows: Establish a scalar to store "Blue" Establish an array to store "White" If line is not "Blue", "White", or "Red" - skip If "Blue", replace whatever $blue is with this line If "White", shift @white if @white > 3, then push line on @white If "Red", just read the values in $blue and @white Working forward should be self-explanatory Update: Since you just need the last N values, it is quite simple to take one off one end of the array and put a new one on the other end. If, on the other hand, you needed to keep track of the top N values, it is a bit more complicated. See Better mousetrap (getting top N values from list X) Cheers - L~R	[reply]
Re: Extracting array elements on either side of a match by BrowserUk (Patriarch) on Dec 01, 2008 at 00:58 UTC
unshift is useful for this: `#! perl -slw use strict; chomp( my @data = <DATA> ); my $i = 0; ++$i until $data[ $i ] =~ m[Red]; my $saved = $i; my @desired = $data[ $i ]; for ( 1 .. 4 ) { --$i until $data[ $i ] =~ m[White]; unshift @desired, $data[ $i-- ]; } --$i until $data[ $i ] =~ m[Blue]; unshift @desired, $data[ $i ]; for ( 1 .. 4 ) { ++$saved until $data[ $saved ] =~ m[White]; push @desired, $data[ $saved++ ]; } ++$saved until $data[ $saved ] =~ m[Blue]; push @desired, $data[ $saved ]; print for @desired; __DATA__` [download] Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."	[reply] [d/l]
Re: Extracting array elements on either side of a match by hangon (Deacon) on Dec 01, 2008 at 04:41 UTC
Heres another solution. `my @data = <DATA>; chomp @data; my $iData = 0; my $iColor = 0; my @color = qw(Red White White White White Blue); my @result; while ($iColor < @color && $iData < @data){ if ($data[$iData] =~ /$color[$iColor]/){ push @result, $iData; $iColor ++; } $iData ++; } $iData = $result[0] -1; $iColor = 1; while ($iColor < @color && $iData >= 0){ if ($data[$iData] =~ /$color[$iColor]/){ unshift @result, $iData; $iColor ++; } $iData --; } print "$data[$_]\n" for (@result); __DATA__` [download]	[reply] [d/l]
Re: Extracting array elements on either side of a match by johngg (Canon) on Dec 01, 2008 at 12:41 UTC
This uses push and unshift as others have advocated. The searching has been moved into a more generic subroutine that takes array references for the array to search and indices array to update, a code reference for the selection criterion, and scalars for how many to look for, starting index and whether to search in reverse. use strict; use warnings; use List::Util qw{ first }; my @dataLines = map { [ $_, split m{,\|\s} ] } <DATA>; my @indices = (); push @indices, first { $dataLines[ $_ ]->[ 2 ] eq q{Red} } 0 .. $#dataLines; findNext( \ @dataLines, \ @indices, sub{ $_[ 0 ]->[ 2 ] eq q{White} }, 4, $indices[ -1 ] + 1, 0 ); findNext( \ @dataLines, \ @indices, sub{ $_[ 0 ]->[ 2 ] eq q{White} }, 4, $indices[ 0 ] - 1, 1 ); findNext( \ @dataLines, \ @indices, sub{ $_[ 0 ]->[ 2 ] eq q{Blue} }, 1, $indices[ -1 ] + 1, 0 ); findNext( \ @dataLines, \ @indices, sub{ $_[ 0 ]->[ 2 ] eq q{Blue} }, 1, $indices[ 0 ] - 1, 1 ); print $dataLines[ $_ ]->[ 0 ] for @indices; sub findNext { my( $raLookIn, $raRecord, $rcSelect, $howMany, $startIdx, $reverse ) = @_; return if $reverse ? $startIdx < 0 : $startIdx > $#{ $raLookIn }; for my $idx ( $reverse ? reverse 0 .. $startIdx : $startIdx .. $#{ $raLookIn } ) { next unless $rcSelect->( $raLookIn->[ $idx ] ); if( $reverse ) { unshift @$raRecord, $idx; } else { push @$raRecord, $idx; } return unless -- $howMany; } } __END__ 1.00,Blue,12:00 40.20,White,12:00 80.30,White,12:00 120.00,White,12:00 126.00,White,12:00 162.43,White,12:00 198.86,White,12:00 235.29,White,12:00 271.72,Red,03:45 308.15,White,12:00 344.58,White,12:00 381.01,White,12:00 417.44,White,12:00 453.87,White,12:00 490.30,White,12:00 526.73,White,12:00 563.16,Red,07:45 599.59,White,12:00 636.02,White,12:00 672.45,White,12:00 708.88,White,12:00 745.31,White,12:00 781.74,White,12:00 818.17,White,12:00 854.60,Blue,12:00 891.03,White,12:00 963.89,White,12:00 1000.32,White,12:00 1036.75,Red,08:30 1073.18,White,12:00 1109.61,Red,06:00 1146.04,White,12:00 1182.47,White,12:00 1218.90,White,12:00 18516.40,Blue,12:00 1255.33,White,12:00 927.46,White,12:00 [download] The output `1.00,Blue,12:00 126.00,White,12:00 162.43,White,12:00 198.86,White,12:00 235.29,White,12:00 271.72,Red,03:45 308.15,White,12:00 344.58,White,12:00 381.01,White,12:00 417.44,White,12:00 854.60,Blue,12:00` [download] I hope this is of interest. Cheers, JohnGG	[reply] [d/l] [select]
Re^2: Extracting array elements on either side of a match by BrowserUk (Patriarch) on Dec 01, 2008 at 13:18 UTC
Do you antisipate reusing this 'generic subroutine' next week? Or next month? Ever? If not, how is all that complexity and verbosity any advantage? Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."	[reply]