in reply to Re^2: Crazy hash sorting issue.
in thread Crazy hash sorting issue.

The more cases you have, the less efficient it is to loop over the data set each time. That's where a sieve comes in handy. It short-circuits the loop over the filters, and loops through the data only once.

Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart

Replies are listed 'Best First'.
Re^4: Crazy hash sorting issue.
by Perl Mouse (Chaplain) on Nov 07, 2005 at 13:59 UTC
    The more cases you have, the less efficient it is to loop over the data set each time.
    That's why I said that in this case, I'd iterate over the hash twice - because there are only 2 'buckets'. Note also that I gave three solutions, two of them iterating over the data only once (and they only do one test per iteration, unlike your sieve). Realise also that if the number of sieves grows, the gain of using a 'sieve' or 'buckets' over sorting deminishes.

    Would you have had an unknown number of cases, all distinguished by their last 2 characters, I'd used:

    my %buckets; while (my ($k, $v) = each %tapes) { push @{$buckets{substr $v, -2}}, "$k $v\n"; } print @{$bucket{$_}} for sort keys %buckets;
    Which is a lot more efficient (and simpler) than your solution. Your solution isn't very efficient - you critize my "special 2-bucket solution" to not being very efficient because it loops over the data set each time.

    But let's analyze that, ok? Suppose your have a dataset of N elements, and you have M buckets to "sieve" them into. Generalizing my "special 2-bucket solution" to an M-bucket one, requires M loops over the data - but doing one test in each iteration, for a total of N * M tests. Your complicated sieving solution loops over the data only once, but you are (potentially) doing M tests per iteration. So you're still doing N * M tests. That's the same amount of tests. The difference is that my solution doesn't require any extra storage.

    On the other hand, my solution can quickly be adapted to deal with an unknown number of buckets. It requires more storage than the "special 2-bucket solution", but not more than your solution. But what's more interesting, it's doing one pass over the data, and doing just one test per iteration. Hence, N tests. It takes more work to adapt your solution to deal with an unknown number of buckets.

    Perl --((8:>*