How come Unix's piping paradigm didn't make it into Perl? Or maybe it did and I didn't notice?
Yes, I know that one can open pipes like this:

```perl
open my $pipe, "foobar|" or die "$!\n";
print frobnicate( $_ ) while <$pipe>;
```

...but I have in mind something more integrated into Perl than that.
Especially since the introduction of lexical handles, I would like to be able to take a read handle and transform it somehow to modify its output.
For example, suppose the file foo.tsv consists of newline-separated records of tab-delimited fields, and I want to generate a "view" consisting of those records whose first field has the value 42. Furthermore, I only want fields 1, 3, and 8 (numbering fields from 0, so the first field is field 0), and I want the resulting records to be sorted lexicographically. Finally, I want to put everything in foo_view.tsv. Easy:
```perl
{
    open my $in, '<', 'foo.tsv' or die "$!\n";
    my @records;
    while ( <$in> ) {
        next unless /^42\t/;
        chomp;
        push @records, join( "\t", ( split "\t" )[ 1, 3, 8 ] ) . $/;
    }
    open my $out, '>', 'foo_view.tsv' or die "$!\n";
    print $out $_ for sort @records;
}
```
But here's a different way to think about this:

```perl
{
    open my $in, '<', 'foo.tsv' or die "$!\n";
    $in = Filter::grepit( $in, qr/^42\t/ );
    $in = Filter::cols  ( $in, "\t", 1, 3, 8 );
    $in = Filter::sortit( $in );
    open my $out, '>', 'foo_view.tsv' or die "$!\n";
    print $out $_ while <$in>;
}
```

The function Filter::grepit takes an open read handle and a regex and returns a read handle that yields only those records from the original handle that match the regex. The function Filter::cols takes an open read handle, a field delimiter, and a list of field numbers, and returns a read handle whose records contain only those fields. Finally, Filter::sortit returns a read handle that yields the records in lexicographic order.
Admittedly, this code is neither more succinct nor much clearer than the first version, though, subjectively, I find it somehow easier on the eye. The potential big win, however, is that in principle we no longer have to read all the records into a Perl array in order to sort them, which could take up a lot of memory. That problem is relegated to the implementation of sortit. Of course, sortit could end up doing precisely that behind the scenes, but it could also do something else. For example, sortit could fork the job off to sort(1):
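```perl
sub sortit {
    my $fh = shift;
    # sort(1) collates by the current locale; with LC_ALL=C it uses byte order
    return pipeit( $fh, 'sort' );
}

# Returns a read handle that yields the output of $cmd run on $fh's records.
sub pipeit {
    my ( $fh, $cmd ) = @_;
    my $new_fh;
    return $new_fh if my $pid = open $new_fh, '-|';    # parent
    die "Fork failed: $!\n" unless defined $pid;
    # child: STDOUT feeds $new_fh, and $cmd inherits it as its own STDOUT
    open my $pipe, "|$cmd" or die "Pipe failed: $!\n";
    print $pipe $_ while <$fh>;
    close $pipe or die "'$cmd' failed: $! $?\n";       # waits for $cmd
    exit 0;
}
```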
Now, even for huge files, we can let sort(1) handle the problem of creating intermediate sorted fragments, merging them, etc. I'm sure there are better ways to implement this kind of thing, but you get the idea.
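To make the trade-off concrete, here are two hypothetical strategies sortit could use internally while keeping the same handle-in, handle-out interface. The first slurps and sorts in memory, as the first version did. The second merges two inputs that are already sorted, which is the step that lets an external sort combine sorted fragments while holding only one record per input in memory. Both function names are made up for illustration, they are not part of any module, and merge_sorted is not a description of how sort(1) is actually implemented.

```perl
use strict;
use warnings;

# Strategy 1 (hypothetical): hold every record in memory, as the first
# version did, but still hand back a read handle.
sub sortit_in_memory {
    my $fh = shift;
    my @records = <$fh>;
    # make sure the last record is newline-terminated like the others
    for (@records) { $_ .= "\n" unless /\n\z/ }
    # string order with newlines included, as in the first version
    my $sorted = join '', sort @records;
    # Perl 5.8+ can open a read handle on a scalar reference
    open my $new_fh, '<', \$sorted or die "In-memory open failed: $!\n";
    return $new_fh;
}

# Strategy 2 (hypothetical): merge two already-sorted handles, holding
# only the current record of each in memory.
sub merge_sorted {
    my ( $fh_x, $fh_y ) = @_;
    my $pid = open my $new_fh, '-|';
    die "Fork failed: $!\n" unless defined $pid;
    return $new_fh if $pid;            # parent: hand back the read end

    # child: STDOUT feeds $new_fh in the parent
    my $x = <$fh_x>;
    my $y = <$fh_y>;
    while ( defined $x && defined $y ) {
        if ( $x le $y ) { print $x; $x = <$fh_x>; }
        else            { print $y; $y = <$fh_y>; }
    }
    # one input is exhausted; copy the rest of the other
    while ( defined $x ) { print $x; $x = <$fh_x>; }
    while ( defined $y ) { print $y; $y = <$fh_y>; }
    exit 0;
}
```

Either one slots into the chain (`$in = sortit_in_memory( $in );`), and the caller cannot tell which strategy sits behind the handle. A full external sort would also write sorted chunks to temporary files and merge them, which is the bookkeeping sort(1) already does.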
Does anything like this already exist in CPAN? (The closest I've found is PerlIO layers, which I find pretty hard to use.)
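For comparison, here is roughly what a grep-like filter looks like as a PerlIO layer built with the core PerlIO::via module: a class whose PUSHED method returns the layer object and whose FILL method returns the next chunk of input, or undef at end of file. GrepLayer and $GrepLayer::Keep are hypothetical names. Because the parentheses in `:via(GrepLayer)` hold the class name, this sketch passes the pattern through a package variable, which hints at why layers feel awkward for parameterized filters like grepit.

```perl
use strict;
use warnings;
use PerlIO::via ();    # core module behind the :via(...) layer

# Hypothetical layer class, used as: open my $fh, '<:via(GrepLayer)', $file
package GrepLayer;

# The pattern lives in a package variable in this sketch, so every handle
# using the layer shares it.
our $Keep = qr/^/;     # default: keep every line

# Called when the layer is pushed onto a handle; returns the layer object.
sub PUSHED {
    my ( $class, $mode, $fh ) = @_;
    return bless {}, $class;
}

# Called when the buffer needs data: return the next matching line from
# the layer below, or undef at end of file.
sub FILL {
    my ( $self, $fh ) = @_;
    while ( defined( my $line = readline $fh ) ) {
        return $line if $line =~ $Keep;
    }
    return undef;
}

package main;
```

Usage would be along the lines of `local $GrepLayer::Keep = qr/^42\t/; open my $in, '<:via(GrepLayer)', 'foo.tsv' or die "$!\n";`. The filtering happens in-process with no fork, but the pattern applied is whatever $GrepLayer::Keep holds at the moment lines are read.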
PS: FWIW, here are implementations of grepit and cols:
```perl
sub grepit {
    my ( $fh, $keep ) = @_;
    my $new_fh;
    return $new_fh if my $pid = open $new_fh, '-|';    # parent
    die "Fork failed: $!\n" unless defined $pid;
    # child: STDOUT is the write end of the pipe behind $new_fh
    my $re = ref $keep ? $keep : qr/\Q$keep\E/;
    /$re/ && print STDOUT while <$fh>;
    exit 0;
}

sub cols {
    my ( $fh, $sep, @cols ) = @_;
    my $new_fh;
    return $new_fh if my $pid = open $new_fh, '-|';    # parent
    die "Fork failed: $!\n" unless defined $pid;
    while ( <$fh> ) {
        chomp;    # otherwise a selected last field keeps its newline
        # \Q...\E so that separators like '|' are matched literally
        print STDOUT join( $sep, ( split /\Q$sep\E/ )[ @cols ] ), "\n";
    }
    exit 0;
}
```
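The fork boilerplate is repeated in grepit, cols, and pipeit. One way to factor it out is a small helper that runs a code ref on each record in the forked child. The name filter_with is hypothetical, and the grepit and cols below are a sketch of how they could be rebuilt on top of it, assuming the same arguments as the versions above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a new read handle carrying whatever $code
# prints for each record of $fh (the record is in $_ when $code runs).
# The work happens in a forked child whose STDOUT feeds the new handle.
sub filter_with {
    my ( $fh, $code ) = @_;
    my $pid = open my $new_fh, '-|';
    die "Fork failed: $!\n" unless defined $pid;
    return $new_fh if $pid;        # parent: hand back the read end

    # child: STDOUT is the write end of the pipe behind $new_fh
    $code->() while <$fh>;
    exit 0;
}

# grepit and cols rebuilt on top of the helper (sketch)
sub grepit {
    my ( $fh, $keep ) = @_;
    my $re = ref $keep ? $keep : qr/\Q$keep\E/;
    return filter_with( $fh, sub { print if /$re/ } );
}

sub cols {
    my ( $fh, $sep, @cols ) = @_;
    return filter_with( $fh, sub {
        chomp;
        print join( $sep, ( split /\Q$sep\E/ )[@cols] ), "\n";
    } );
}
```

With the helper in place, a new filter is a one-liner, e.g. `filter_with( $fh, sub { s/\t/,/g; print } )` to turn tabs into commas.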
the lowliest monk
Re: Pipe dream
by tilly (Archbishop) on Sep 09, 2005 at 03:57 UTC

Re: Pipe dream
by jdporter (Paladin) on Sep 09, 2005 at 03:07 UTC

Re: Pipe dream
by Zaxo (Archbishop) on Sep 09, 2005 at 10:09 UTC

Re: Pipe dream
by chb (Deacon) on Sep 09, 2005 at 08:25 UTC
by tlm (Prior) on Sep 09, 2005 at 12:22 UTC
by Anonymous Monk on Sep 09, 2005 at 18:35 UTC
by Dominus (Parson) on Sep 12, 2005 at 13:33 UTC
by Anonymous Monk on Sep 12, 2005 at 15:14 UTC
by tlm (Prior) on Sep 10, 2005 at 12:31 UTC

Re: Pipe dream
by ambrus (Abbot) on Sep 09, 2005 at 11:07 UTC

Re: Pipe dream
by Roy Johnson (Monsignor) on Sep 09, 2005 at 14:32 UTC
by diotalevi (Canon) on Sep 09, 2005 at 18:03 UTC
by Roy Johnson (Monsignor) on Sep 09, 2005 at 18:09 UTC
by diotalevi (Canon) on Sep 09, 2005 at 18:10 UTC

Re: Pipe dream
by ruoso (Curate) on Sep 09, 2005 at 18:29 UTC