punkish has asked for the wisdom of the Perl Monks concerning the following question:

Apologies for quite possibly a highly unsuitable post Title. Please feel free to propose a better Title.

A poster on the SQLite list asked the following question: given the following data extracted from the db table

Field1,Field2,Field3 1.00,Blue,12:00 <----- 40.20,White,12:00 80.30,White,12:00 120.00,White,12:00 126.00,White,12:00 <----- 162.43,White,12:00 <----- 198.86,White,12:00 <----- 235.29,White,12:00 <----- 271.72,Red,03:45 <===== 308.15,White,12:00 <----- 344.58,White,12:00 <----- 381.01,White,12:00 <----- 417.44,White,12:00 <----- 453.87,White,12:00 490.30,White,12:00 526.73,White,12:00 563.16,Red,07:45 599.59,White,12:00 636.02,White,12:00 672.45,White,12:00 708.88,White,12:00 745.31,White,12:00 781.74,White,12:00 818.17,White,12:00 854.60,Blue,12:00 <----- 891.03,White,12:00 963.89,White,12:00 1000.32,White,12:00 1036.75,Red,08:30 1073.18,White,12:00 1109.61,Red,06:00 1146.04,White,12:00 1182.47,White,12:00 1218.90,White,12:00 18516.40,Blue,12:00 1255.33,White,12:00 927.46,White,12:00

The poster wanted to find a first occurrence of "Red," then searching from this record find the previous 4 records that contain "White," then find the Previous 1 that contains "Blue." Now do the same thing but searching "next". In other words, the following result was desired:

1.00,Blue,12:00 126.00,White,12:00 162.43,White,12:00 198.86,White,12:00 235.29,White,12:00 271.72,Red,03:45 308.15,White,12:00 344.58,White,12:00 381.01,White,12:00 417.44,White,12:00 854.60,Blue,12:00

I have marked the desired rows with  <----- and  <=====

A very complicated SQL was proposed, but this is the kind of thing best accomplished by in the application. I have been recuperating from a post-Thanksgiving flu/fever, so to pass away boredom, I decided to write the following solution in my favorite programming language.

my @res = analyze(); for (@res) { print $_->{f1} . ',' . $_->{f2} . ',' . $_->{f3} . "\n"; } sub analyze { # get the data as an array of hashes just like DBI would return my @data = map { chomp; split /,/; { f1 => $_[0], f2 => $_[1], f3 => $_[2] } } <DATA>; # arrays for holding bits of results my (@res_red, @res_blu, @res_wht, @nxt_res_red, @nxt_res_blu, @nxt_res_wht ); # counters for tracking whether we are before or # after the 'Red' divide my $i = -1; my $j; # array to hold rows *before* the 'Red' divide my @seen; LOOP: for (@data) { if ($i >= 0) { if ($_->{f2} eq 'White') { next LOOP if $j > 3; push @nxt_res_wht, $_; $j++; } if ($_->{f2} eq 'Blue') { push @nxt_res_wht, $_; last LOOP; } } else { if ($_->{f2} eq 'Red') { push @res_red, $_; $i++; my @rev_seen = reverse @seen; my $k; REV_WHITE: for (@rev_seen) { if ($_->{f2} eq 'White') { push @res_wht, $_; $k++ } last REV_WHITE if $k > 3; } REV_BLUE: for (@rev_seen) { if ($_->{f2} eq 'Blue') { push @res_blu, $_; last REV_BLUE; } } push @res_blu, reverse(@res_wht), @res_red; } else { push @seen, $_; } } } push @res_blu, @nxt_res_wht, @nxt_res_red; return @res_blu; } __DATA__ 1.00,Blue,12:00 40.20,White,12:00 80.30,White,12:00 120.00,White,12:00 126.00,White,12:00 162.43,White,12:00 198.86,White,12:00 235.29,White,12:00 271.72,Red,03:45 308.15,White,12:00 344.58,White,12:00 381.01,White,12:00 417.44,White,12:00 453.87,White,12:00 490.30,White,12:00 526.73,White,12:00 563.16,Red,07:45 599.59,White,12:00 636.02,White,12:00 672.45,White,12:00 708.88,White,12:00 745.31,White,12:00 781.74,White,12:00 818.17,White,12:00 854.60,Blue,12:00 891.03,White,12:00 963.89,White,12:00 1000.32,White,12:00 1036.75,Red,08:30 1073.18,White,12:00 1109.61,Red,06:00 1146.04,White,12:00 1182.47,White,12:00 1218.90,White,12:00 18516.40,Blue,12:00 1255.33,White,12:00 927.46,White,12:00

Once you experienced monks get past your laughing at my code (which does work correctly), I would like a lesson into how to approach such class of problems that involve a "window" over an array that spreads on either side of an array element. Of course, more elegant solutions to the above problem are also welcome as they will show me (and other shy monks) a better way.

Update: Responding to Limbic~Region's question, forget for the moment possible "Blue" interleaving among the four "Whites" or other anomalies and deformities. That is how the OP on the SQLite stated the problem, so that is how I have reproduced it above.

--

when small people start casting long shadows, it is time to go to bed

Replies are listed 'Best First'.
Re: Extracting array elements on either side of a match
by GrandFather (Saint) on Dec 01, 2008 at 00:12 UTC

    Well, it can be made a little more compact. The following doesn't bother validating data, although such validation is fairly trivial:

    use strict; use warnings; my @res = analyze (); print "$_->{f1},$_->{f2},$_->{f3}\n" for @res; sub analyze { # get the data as an array of hashes just like DBI would return my @data = map { chomp; my @f = split /,/; {f1 => $f[0], f2 => $f[1], f3 => $f[2]} } <DATA>; my %lists; # Build coloured lists push @{$lists{$data[$_]{f2}}}, $_ for 0 .. $#data; # Find the spans my $redIndex = $lists{Red}[0]; my @priorWhites = grep {$_ < $redIndex} @{$lists{White}}; my $firstWhite = $priorWhites[-4]; my @priorBlues = grep {$_ < $firstWhite} @{$lists{Blue}}; my @postWhites = grep {$_ > $redIndex} @{$lists{White}}; my $lastWhite = $postWhites[3]; my @postBlues = grep {$_ > $lastWhite} @{$lists{Blue}}; # Build the output list my @indexes = ($priorBlues[-1], @priorWhites[-4 .. -1], $redIndex, @postWhites[0 .. 3], $postBlues[0]); return @data[@indexes]; } __DATA__ ...

    generates output per OP's sample given OP's data.


    Perl's payment curve coincides with its learning curve.
      verrrrrry clever Grandfather, building an index of the rownums for each of the colors.
      --

      when small people start casting long shadows, it is time to go to bed
Re: Extracting array elements on either side of a match
by Limbic~Region (Chancellor) on Nov 30, 2008 at 23:52 UTC
    punkish,
    First, It is not clear what should happen if a "Blue" is encountered interleaved amongst the 4 "White"s. Setting that aside for a second, I would probably do something much simpler. A rough outline follows:
    • Establish a scalar to store "Blue"
    • Establish an array to store "White"
    • If line is not "Blue", "White", or "Red" - skip
    • If "Blue", replace whatever $blue is with this line
    • If "White", shift @white if @white > 3, then push line on @white
    • If "Red", just read the values in $blue and @white
    • Working forward should be self-explanatory

    Update: Since you just need the last N values, it is quite simple to take one off one end of the array and put a new one on the other end. If, on the other hand, you needed to keep track of the top N values, it is a bit more complicated. See Better mousetrap (getting top N values from list X)

    Cheers - L~R

Re: Extracting array elements on either side of a match
by BrowserUk (Patriarch) on Dec 01, 2008 at 00:58 UTC

    unshift is useful for this:

    #! perl -slw use strict; chomp( my @data = <DATA> ); my $i = 0; ++$i until $data[ $i ] =~ m[Red]; my $saved = $i; my @desired = $data[ $i ]; for ( 1 .. 4 ) { --$i until $data[ $i ] =~ m[White]; unshift @desired, $data[ $i-- ]; } --$i until $data[ $i ] =~ m[Blue]; unshift @desired, $data[ $i ]; for ( 1 .. 4 ) { ++$saved until $data[ $saved ] =~ m[White]; push @desired, $data[ $saved++ ]; } ++$saved until $data[ $saved ] =~ m[Blue]; push @desired, $data[ $saved ]; print for @desired; __DATA__

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Extracting array elements on either side of a match
by hangon (Deacon) on Dec 01, 2008 at 04:41 UTC

    Heres another solution.

    my @data = <DATA>; chomp @data; my $iData = 0; my $iColor = 0; my @color = qw(Red White White White White Blue); my @result; while ($iColor < @color && $iData < @data){ if ($data[$iData] =~ /$color[$iColor]/){ push @result, $iData; $iColor ++; } $iData ++; } $iData = $result[0] -1; $iColor = 1; while ($iColor < @color && $iData >= 0){ if ($data[$iData] =~ /$color[$iColor]/){ unshift @result, $iData; $iColor ++; } $iData --; } print "$data[$_]\n" for (@result); __DATA__
Re: Extracting array elements on either side of a match
by johngg (Canon) on Dec 01, 2008 at 12:41 UTC

    This uses push and unshift as others have advocated. The searching has been moved into a more generic subroutine that takes array references for the array to search and indices array to update, a code reference for the selection criterion, and scalars for how many to look for, starting index and whether to search in reverse.

    use strict; use warnings; use List::Util qw{ first }; my @dataLines = map { [ $_, split m{,|\s} ] } <DATA>; my @indices = (); push @indices, first { $dataLines[ $_ ]->[ 2 ] eq q{Red} } 0 .. $#dataLines; findNext( \ @dataLines, \ @indices, sub{ $_[ 0 ]->[ 2 ] eq q{White} }, 4, $indices[ -1 ] + 1, 0 ); findNext( \ @dataLines, \ @indices, sub{ $_[ 0 ]->[ 2 ] eq q{White} }, 4, $indices[ 0 ] - 1, 1 ); findNext( \ @dataLines, \ @indices, sub{ $_[ 0 ]->[ 2 ] eq q{Blue} }, 1, $indices[ -1 ] + 1, 0 ); findNext( \ @dataLines, \ @indices, sub{ $_[ 0 ]->[ 2 ] eq q{Blue} }, 1, $indices[ 0 ] - 1, 1 ); print $dataLines[ $_ ]->[ 0 ] for @indices; sub findNext { my( $raLookIn, $raRecord, $rcSelect, $howMany, $startIdx, $reverse ) = @_; return if $reverse ? $startIdx < 0 : $startIdx > $#{ $raLookIn }; for my $idx ( $reverse ? reverse 0 .. $startIdx : $startIdx .. $#{ $raLookIn } ) { next unless $rcSelect->( $raLookIn->[ $idx ] ); if( $reverse ) { unshift @$raRecord, $idx; } else { push @$raRecord, $idx; } return unless -- $howMany; } } __END__ 1.00,Blue,12:00 40.20,White,12:00 80.30,White,12:00 120.00,White,12:00 126.00,White,12:00 162.43,White,12:00 198.86,White,12:00 235.29,White,12:00 271.72,Red,03:45 308.15,White,12:00 344.58,White,12:00 381.01,White,12:00 417.44,White,12:00 453.87,White,12:00 490.30,White,12:00 526.73,White,12:00 563.16,Red,07:45 599.59,White,12:00 636.02,White,12:00 672.45,White,12:00 708.88,White,12:00 745.31,White,12:00 781.74,White,12:00 818.17,White,12:00 854.60,Blue,12:00 891.03,White,12:00 963.89,White,12:00 1000.32,White,12:00 1036.75,Red,08:30 1073.18,White,12:00 1109.61,Red,06:00 1146.04,White,12:00 1182.47,White,12:00 1218.90,White,12:00 18516.40,Blue,12:00 1255.33,White,12:00 927.46,White,12:00

    The output

    1.00,Blue,12:00 126.00,White,12:00 162.43,White,12:00 198.86,White,12:00 235.29,White,12:00 271.72,Red,03:45 308.15,White,12:00 344.58,White,12:00 381.01,White,12:00 417.44,White,12:00 854.60,Blue,12:00

    I hope this is of interest.

    Cheers,

    JohnGG