I posted a meditation a couple of weeks ago about using += in array context. The replies indicated that I was on the right track, in that there does not exist a simple way of saying
my @foo = qw/1 2 3/; my @bar = qw/4 6 8/; @foo += @bar; # foo is now equivalent to qw/5 8 11/
Somewhere along the line you're going to have a map or a foreach trundling over the array elements. I now have a different problem to solve, and suddenly... I was enlightened.
I have a large directory tree (several tens of thousands of files), and I suspect that all the files are owned and grouped to the same user account. A simple File::Find script will be able to sort that out for me very simply. All I have to do is stat each file and increment two hashes, keyed on the uid and the gid of the file. At the end, I print out the hashes and I'll know if my theory is true or not.
While I have yet to write a single line of code, I can already envision pretty well what the code is going to look like. And I pondered on how best to increment the hashes. I figured that I wanted to take a slice of the list returned from stat. (4 for the uid, 5 for the gid). And I got stuck on how to transfer that two element list over to incremente those two hash keys.
And I realised that I was being haunted once again by a different facet of my array context demon. So at least I know that the following code is about the best I can do (insofar as I can't get rid of the temporary lexicals):
my( $uid, $gid ) = (stat $file)[4,5]; ++$user{$uid}; ++$group{$gid};
I.e. you can't do that in one line. And then it suddenly hit me: what I am really trying to do is to point out to the compiler that what I want to do is parallelisable. I suppose this is because I've just finished an article on IA-64 (a.k.a. Itanium, a.k.a. P4) optimisation. This latest chip from Intel allows all sorts of parallel execution to take place, indeed, it's the only way to get the maximum performance out of it.
It turns out that I don't care what order the hash values are incremented. If the CPU decides to schedule both at once, I'll be thrilled to bits. Harking back to my array assignment, on a sufficiently CPU-rich machine, the operation should be able to be executed in more or less constant time. Now I haven't studied parallelism in computing since the Transputer and Occam, so I'm way out of date. I don't know what the state of the art is in terms of compilers being able to spot possibilities for parallel execution in Von Neuman-style procedural languages, but maybe with a bit of syntactic sugar in Perl 6 we'll be able to come closer to it.
update: A number of people have posted a number of interesting techniques to both of these meditations to help me out with the problem at hand, for which I am immensely grateful. But I'm taking the long view. It still begs the question, as to whether Perl is to be instructed in minute detail about what is to be done, or whether it is possible to detect parallel semantics directly from the source code (without either spelling it out in detail or using silly Oracle-style hints to tell the compiler what you want).
If this is possible, my feeling is that the source code is going to look short, sharp and sweet. If parallel processing is going to come to the masses, it's going to be via a mainstream language, be it Perl, Java, Python or C$, err, C#. With its rich linguistic background, Perl has a good chance of pulling it off.
Specifically, in response to DrZaius, I'm thinking of intrathread parallelism. Thread-level parallelism is already too explicit (read: lots of fussy code) for me to want to bother dealing with, and it's cracking nuts with a sledgehammer in this particular context. I only want to keep that CPU pipeline stuffed to the gills. The less explicit syntax I have to use, the better chance there is of pulling it off.
update 2: by the way, for those stumbled across this node via a search and are actually looking for some code to do what I just described, here is what I wound up writing:
#! /usr/bin/perl -w # scan-uid-gid copyright (c) David Landgren 2001 use strict; use File::Find; my $root = shift; $root ||= '.' unless defined $root; my( %user, %group ); find sub { return if $_ eq '.' or $_ eq '..'; my( $uid, $gid ) = (stat)[4,5]; $user{$uid}++; $group{$gid}++; }, $root; print "uids\n"; print "\t$_: $user{$_}\n" foreach( sort keys %user ); print "gids\n"; print "\t$_: $group{$_}\n" foreach( sort keys %group );
It turns out my hypothesis was correct :-), I received the following output:
# scan-uid-gid uids 2000: 18622 gids 200: 18622
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
(tye)Re: Revisiting array context
by tye (Sage) on Jun 18, 2001 at 10:22 UTC | |
|
Re: Revisiting array context
by bwana147 (Pilgrim) on Jun 18, 2001 at 16:07 UTC | |
|
Re: Revisiting array context
by John M. Dlugosz (Monsignor) on Jun 18, 2001 at 02:29 UTC | |
|
Re: Revisiting array context
by DrZaius (Monk) on Jun 18, 2001 at 04:33 UTC | |
|
Re: Revisiting array context
by mattr (Curate) on Jun 18, 2001 at 14:38 UTC |