Since we haven't defined what a sequence is yet, let's do that now:
A sequence is a function that represents a finite series of values. When called, the function returns the next value in the series. The first call to the function after the series has been exhausted returns an empty array and resets the sequence back to its beginning.
A couple of points. First, sequences are cyclical. After you exhaust them, they start over again. Second, the values extracted from a sequence are arrays, not scalars. This allows sequences to return multiple values at a time. As we'll see later, these properties allow us to combine sequences in interesting ways.
With our definition in mind, let's begin our implementation. Some preliminaries:
Although most of the code we'll write is functional in nature, Perl's object-oriented syntax is often a convenient way to call functions. To make this syntax available, let's set up a package and write a couple of helper functions to promote functions into Sequences:#!/usr/bin/perl use warnings; use strict;
The package declaration and the new subroutine are standard Perl OO fare. The seqsub function, however, is somewhat odd. We'll use it as an alternative syntax to Perl's normal sub when we want to to create functions (anonymous subroutines) that represent sequences.package Sequences; sub new { my ($proto, $seq) = @_; bless $seq, $proto; } sub seqsub(&) { Sequences->new(@_); }
To see how the set-up works, let's create our first sequence-making function, seq. It creates a sequence out of the series formed by its arguments:
sub seq { my ($i, $elems) = (0, \@_); seqsub { $i < @$elems ? ( $elems->[ $i++ ] ) : do { $i = 0; () }; } }
To see how it works, let's create simple, 3-element sequence and extract its values:
Because we will often want to see what's "in" a sequence, let's create a function to enumerate a sequence's values:my $abcees = seq("a", "b", "c"); $abcees->(); # ("a") $abcees->(); # ("b") $abcees->(); # ("c") $abcees->(); # ( ) # ... the cycle repeats ...
The while loop within the function shows us the idiom for iterating over a sequence's values: We extract values until we get an empty array. We print each value we get. When we're done, we return the sequence itself to facilitate function chaining.use Data::Dumper; sub enumerate { local $Data::Dumper::Terse = 1; local $Data::Dumper::Indent = 0; my ($i, $seq) = (0, $_[0]); while (my @val = $seq->()) { @val = map { ref ($_) ? Dumper($_) : $_ } @val; printf "%2d => %s\n", $i++, "@val"; } $seq; }
As an example, let's enumerate our earlier sequence:
Let's try the alternative OO syntax:enumerate( $abcees ); # 0 => a # 1 => b # 2 => c
Not bad. Now let's move on to something a little more weighty.$abcees->enumerate; # 0 => a # 1 => b # 2 => c
The following function takes two separate sequences s and t and combines them into a sequence whose values are the Cartesian product of the values drawn from s and t, in effect nesting t inside of s:
An example:sub seq_prod2 { my ($s, $t) = @_; my @sval; seqsub { my @tval; while ( !@sval || !(@tval = $t->()) ) { return () unless @sval = $s->(); } ( @sval, @tval ); } };
One of the idioms that appears frequently in functional programming is to take a binary function and generalize it into an n-ary function by "folding" it over a list of arguments. We can use List::Util's reduce function for this purpose. Let's use it generalize seq_prod2 into the n-ary seq_prod:my $one_two_threes = seq( 1 .. 3 ); enumerate( seq_prod2( $abcees, $one_two_threes ) ); # 0 => a 1 # 1 => a 2 # 2 => a 3 # 3 => b 1 # 4 => b 2 # 5 => b 3 # 6 => c 1 # 7 => c 2 # 8 => c 3
Now we can compute the products of any number of sequences:use List::Util qw( reduce ); sub seq_prod { reduce { seq_prod2($a,$b) } @_ ; }
The power of our abstraction is starting to become apparent. Normally we would need to nest three foreach loops (or use a module such as Algorithm::Loops) to do what we just did in one line of code. In the next section, we'll take this idea further to see how much boilerplate code we can factor out.my $you_and_mees = seq( "you", "me" ); seq_prod( $abcees, $one_two_threes, $you_and_mees ) ->enumerate; # 0 => a 1 you # 1 => a 1 me # 2 => a 2 you # 3 => a 2 me # 4 => a 3 you # 5 => a 3 me # 6 => b 1 you # 7 => b 1 me # 8 => b 2 you # 9 => b 2 me # 10 => b 3 you # 11 => b 3 me # 12 => c 1 you # 13 => c 1 me # 14 => c 2 you # 15 => c 2 me # 16 => c 3 you # 17 => c 3 me
One common way to approach such a problem is with nested loops. Consider the case involving three arrays:
Let's try to do the same thing with sequences.my (@alist, @blist, @clist); # ... initialize arrays with values ... foreach my $a (@alist) { foreach my $b (@blist) { foreach my $c (@clist) { # do something with ($a, $b, $c) ... } } }
First, we'll need to convert each array into a sequence. That's easy; we can just use seq. Then, we must combine the sequences; for this we'll use seq_prod, like before. Finally, we'll extract the values from the combined sequence and process them in turn. Using this recipe, we get the following:
That works, but it's clunky.my $combined_sequence = seq_prod( seq(@alist), seq(@blist), seq(@clist) ); while ( my ($a, $b, $c) = $combined_sequence->() ) { # do something with ($a, $b, $c) }
Let's see if we can refine the approach. As the first refinement, let's create helper functions to perform the first two steps, in effect creating a combined sequence from a specification of its subsequences:
Both helpers are tiny but handy and have many uses beyond the one we're aiming for now. For example, we can use seq_from_spec to extract the digits of n-ary numbers:sub seqs { map seq(@$_), @_; } sub seq_from_spec { seq_prod( seqs(@_) ); }
sub nary_digits { my ($base, $digits) = @_; seq_from_spec( ([0..$base-1]) x $digits ); } enumerate( nary_digits(2, 3) ); # 3-digit binary numbers # 0 => 0 0 0 # 1 => 0 0 1 # 2 => 0 1 0 # 3 => 0 1 1 # 4 => 1 0 0 # 5 => 1 0 1 # 6 => 1 1 0 # 7 => 1 1 1
On a naming note, I called the function seq_from_spec instead of something like seq_prod_from_spec because I consider the Cartesian product to be the most fundamental and useful way of combining sets. So the idea of a "sequence specification" that describes a product of sequences naturally follows:
# seq_from_spec([1..3]) === seq(1..3) # seq_from_spec([1..3],[4,5]) === seq(1..3) x seq(4,5) # seq_from_spec(\(@a,@b,...)) === seq(@a) x seq(@b) x ...
Back to our task, another helper factors out the looping boilerplate:
We can use it like so:sub seq_foreach { my ($seq, $fn) = @_; while (my @val = $seq->()) { $fn->(@val); } $seq; }
Finally, a higher-level helper ties the solution together:$abcees->seq_foreach( sub { print "@_\n"; } ); # a # b # c
Now we can replace our original, 3-foreach loop with the following:sub seq_foreach_from_spec { my ($spec, $fn) = @_; seq_foreach( seq_from_spec( @$spec ), $fn ); }
In reflection, that may seem like a long way to have gone just to replace a 3-foreach loop, and it was. But our travels covered more ground than might at first be obvious:seq_foreach_from_spec( [\(@alist, @blist, @clist)], sub { my ($a, $b, $c) = @_; # do something with $a, $b, $c, ... });
The function is grep for sequences. It takes a sequence and a filtering function and returns a new sequence that passes through the values of the original sequence for which the filtering function returns true; all other values are filtered out. Using it, we can construct our odd-integers solution:sub seq_filter { my ($seq, $filter_fn) = @_; seqsub { my @val; 1 while @val = $seq->() and !$filter_fn->(@val); return @val; } }
Now let's say that we want to generate a similar sequence but for even integers. Again, we can use a transformational strategy. This time, however, we'll transform the odd-integer series into an even-integer series by subtracting one from each value. In effect, we're mapping one series to another. Our helper is named accordingly:sub odds_up_to { my $maximum = shift; seq( 1 .. $maximum ) ->seq_filter( sub { $_[0] % 2 } ) } enumerate( odds_up_to(10) ); # 0 => 1 # 1 => 3 # 2 => 5 # 3 => 7 # 4 => 9
And our even-integers solution:sub seq_map { my ($seq, $fn) = @_; seqsub { my @val = $seq->(); @val ? $fn->(@val) : (); } }
With these simple extensions to our vocabulary, we have greatly expanded the usefulness of our mini-language for sequences. Now let's try it out on a real-world problem.sub evens_up_to { odds_up_to( $_[0] + 1 ) ->seq_map( sub { $_[0] - 1 } ); } enumerate( evens_up_to(10) ); # 0 => 0 # 1 => 2 # 2 => 4 # 3 => 6 # 4 => 8 # 5 => 10
What I have is 7 arrays with arbitrary number of elements in each. I would like to create a new array that contains combinations of the other seven in this manner: (1) Only one element from each of the 7 arrays. (2) Minimum of 4 elements per element in final array.With our existing sequence language, the solution for part (1) of the seeker's request is straightforward: Just pass the 7 arrays to seq_from_spec and the resulting sequence will yield all of the combinations.
Part (2) adds a wrinkle, however. It requires that the output combinations each have at least 4 elements. This suggests that part (1) really means, zero or one element from each array. (Otherwise, the at-least-4 constraint is meaningless because all of the combinations will have exactly 7 elements.)
To effect the zero-or-one behavior, we can transform each input array like [1,2,3] into [[],[1],[2],[3]]. Each element in the transformed array is a zero- or one-element array. (Note the insertion of an an "empty" element at the head of the array.) Next, we can pass these transformed arrays to seq_from_spec, as usual. On the backside, we can map the combinations back into the desired form by merging the zero- or one-element arrays. This we can do with a map.
Finally, we'll filter the combinations so that only those of the desired minimum length are kept.
Putting it all together in the form of a generalized solution:
(BTW, this code is the Perl equivalent of the Haskell solution that I posted in the thread.)sub min_length_combinations { my ($min_length, @inputs) = @_; my @input_spec = map [ [], (map [$_], @$_) ], @inputs; seq_from_spec( @input_spec ) ->seq_map( sub { [ map @$_, @_ ] } ) ->seq_filter( sub { @{$_[0]} >= $min_length } ) }
Now, we can solve the example problem from the seeker's original post:
min_length_combinations( 4, map [split//], qw( abc de fgh i jk l m ) )->enumerate; # 0 => ['i','j','l','m'] # 1 => ['i','k','l','m'] # 2 => ['f','j','l','m'] # 3 => ['f','k','l','m'] # # ... # # 862 => ['c','e','h','i','k'] # 863 => ['c','e','h','i','k','m'] # 864 => ['c','e','h','i','k','l'] # 865 => ['c','e','h','i','k','l','m']
Merging sequences in parallel is like joining the teeth of a coat's zipper. As input we have two (or more) separate sequences and as output we have one zipped-together sequence. In keeping with Haskell tradition, our zipper function will let us merge sequences of unequal lengths. In that case, the zipped sequence will be as long as the shortest input sequence. Here's the code:sub seq_series { my $seqs = seq( @_ ); # seq of seqs (!) my $seq; seqsub { my @val; do { ($seq) = $seqs->() unless $seq; @val = $seq->() if $seq; } while !@val && ($seq = $seqs->()); @val; } } seq_series( $abcees, $one_two_threes )->enumerate; # 0 => a # 1 => b # 2 => c # 3 => 1 # 4 => 2 # 5 => 3
The need to handle unequal lengths adds complexity to our code. In particular, the else clause handles this case and resets any partially-read sequences before returning. This ensures that our sequences' next customers don't get short changed by inheriting half-read sequences.sub seq_reset { my $seq = shift; if ($seq) { 1 while $seq->(); } $seq; } sub seq_zip { my $seqs = seq( @_ ); # seq of seqs (!) my $seq_count = @_; seqsub { my @outvals; while (my $seq = $seqs->()) { if (my @val = $seq->()) { push @outvals, @val; } else { seq_reset( $seqs->() ) for 1 .. $seq_count; seq_reset( $seqs ); return (); } } return @outvals; } }
Some examples:
To generalize our zipping options, we can zip sequences with a given "zipper" function:seq_zip( $abcees, $one_two_threes )->enumerate; # 0 => a 1 # 1 => b 2 # 2 => c 3 seq_zip( $abcees, $one_two_threes, $you_and_mees ) ->enumerate; # 0 => a 1 you # 1 => b 2 me
Some examples:sub seq_zip_with { my $zipper_fn = shift; seq_map( seq_zip(@_), $zipper_fn ); }
With these new extensions to our mini-language for sequences, let's tackle another real Perl Monks problem.# some math helpers sub sum { reduce { $a + $b } @_ } sub product { reduce { $a * $b } @_ } seq_zip_with( \&sum, seq(1..3), seq(0..10) )->enumerate; # 0 => 1 # 1 => 3 # 2 => 5 seq_zip_with( \&product, seq(1..5), seq(0..10), seq(2..8) ) ->enumerate; # 0 => 0 # 1 => 6 # 2 => 24 # 3 => 60 # 4 => 120
The seeker has two arrays of strings and wants to operate on each combination of elements from the two. The operation to be performed is a pairwise "comparison" which amounts to counting the character pairs that are formed when each element is zipped with the other.
Since combinations and zips are part of our sequence mini-language, our implementation is straightforward:
my @site1 = qw( AATKKM aatkkm ); my @site2 = qw( GGGGGG gggggg ); my %counts; seq_foreach_from_spec( [ \(@site1, @site2) ], sub { seq_foreach( seq_zip( ( map seq(split//), @_ ) ), sub { $counts{"@_"}++ } ) } ); print Dumper(\%counts), "\n"; # { 'K G' => 2, 'A G' => 2, 'm g' => 1, 'a g' => 2, # 'A g' => 2, 'M G' => 1, 'k g' => 2, 'k G' => 2, # 'T G' => 1, 'a G' => 2, 'm G' => 1, 't G' => 1, # 'K g' => 2, 'M g' => 1, 't g' => 1, 'T g' => 1 }
We'll create two value extractors. The first is a general-purpose extractor that will capture each array value in a sequence as an arrayref. The second is for when we expect the sequence's values to be scalars. In this case, we don't need to wrap each value within an array and can just return it straight.
Some examples:sub seq_values { my $seq = shift; seq_values_scalar( seq_map( $seq, sub { [@_] } ) ); } sub seq_values_scalar { my $seq = shift; my @values; seq_foreach( $seq, sub { push @values, @_ } ); return @values; }
As a more realistic example, let's write a function to transpose a matrix, that is, exchange its rows and columns:print Dumper( [ seq(1..3)->seq_values ] ), "\n"; # [[1],[2],[3]] print Dumper( [ seq(1..3)->seq_values_scalar ] ), "\n"; # [1,2,3]
sub matrix_transpose { my $rows = shift; [ seq_values( seq_zip( seqs(@$rows) ) ) ]; } my $matrix = [ [ 0, 1 ] , [ 2, 3 ] , [ 4, 5 ] ]; print Dumper( matrix_transpose( $matrix ) ), "\n"; # [ [0,2,4] # , [1,3,5] ]
Another way to extract values is to collapse a sequence into a single value by "folding" an accumulating function across the sequence's values. This is similar to what List::Util's reduce function does for arrays. Here's our implementation for sequences:
Continuing with our matrix theme, let's write a function that computes the dot product of two vectors. We multiply the vectors' elements pairwise, and then add the resulting products:sub seq_fold { my ($seq, $fn) = @_; my @accum = $seq->(); while (@accum && (my @val = $seq->())) { @accum = $fn->(@accum, @val); } wantarray ? @accum : $accum[0]; }
sub dot_product { seq_zip_with( \&product, seqs(@_) ) ->seq_fold( \&sum ); } print dot_product( [1,1,1], [1,2,3] ), "\n"; # 6
So far, the ways in which we have combined sequences have been static. In the next part, we'll look at how we can place tiny functions in between the parts of our sequences to introduce dynamism. This simple extension brings many complex manipulations within reach. We'll look at some interesting applications and solve a few more Perl Monks problems. We might even referee a game of hyperdimensional tic-tac-toe.
Thanks for taking the time to read this meditation. If you have any criticisms or can think of any way to make my writing better, please let me know.
Cheers,
Tom
Tom Moertel : Blog / Talks / CPAN / LectroTest / PXSL / Coffee / Movie Rating Decoder
|
---|