jynx has asked for the wisdom of the Perl Monks concerning the following question:


While writing this (shameless plug), i came up with a (almost) one-line way of removing duplicates from an ever-expanding array and sorting the remaining values. Unfortunately i have two problems:

1) Me thinks this isn't the best way to do it (i really don't want to create two hashes)
2) i don't understand completely why it works.

So, if monks with better knowledge of grep could please help me out here, i would much appreciate it.

sub beautify { my (%old, %new); return sort { $a <=> $b } grep { $new{$_}++; $old{$_}++ unless exists $old{$_}; my $ret = $new{$_}==$old{$_} ? 1 : $new{$_}+1==$old{$_} ? 1 : 0; $old{$_} = $new{$_}; $ret; } @_; }
This gets as called as @array = beautify(@array, $new_value, $new2);

What i want to know is, why doesn't grep just return all values as being ok (since it looks like $old{$_} will always be within 1 of $new{$_})?

And what better ways are there to do this (in one line)?

Please help,
jynx

Replies are listed 'Best First'.
Re: quandery about a grep statement
by danger (Priest) on Mar 29, 2001 at 05:20 UTC

    Perhaps I fail to understand exactly what you are doing, but doesn't this accomplish your goal?

    sub beautify { my %seen; return sort {$a<=>$b} grep {!$seen{$_}++} @_; }

      Well, that's definitely a better way. The thing that annoys me is that i know that way, i've done it before. It didn't seem to work for some reason when i originally tried it (even though it does now; thanks Murphy ;-)

      thanks much,
      jynx

      Update:ok, after everyone's replies i finally realized what happened. In a coding frenzy i wanted to solve one problem and when i switched to a different technique i didn't drop some of the old artifacts (oops).

      Basically, i'm trying to keep track of whether or not a value is acceptable for grep without having to declare the arrays(this was for an obfuscation-only purpose). The fix was to declare the arrays, because of the reasons mentioned and because otherwise nothing would get passed through grep the first time through...

      My main problem is that i check for $new{$_}+1, i should be checking for $new{$_}-1.

      If anyone's interested, the resulting (admittedly horrible) code is:

      sub beautify { %old = %new; sort { $a <=> $b } grep { $new{$_}++; $old{$_}++ unless exists $old{$_}; ($new{$_} == $old{$_}) ? 1 : ($new{$_}-1 == $old{$_}) ? 1 : 0; } @_; }
      This still declares the arrays, just not the same way. Oh well, nothing's perfect, it has lumps in it ;-)

      As also noted, this is *not* the best way to do this, for normal projects of course one would use strict; and warnings.

      Thanks for the help,
      jynx

        Well, if you are doing an obfus then you may wanna blow off declaring hash arrays and use ones that are floating around and don't cause errors. See the three examples below...
        perl -we 'use strict; %_ = %ENV; print $_{TERM},$/' perl -we 'use strict; %@ = %ENV; print $@{USERNAME},$/' perl -we 'use strict; %% = %ENV; print $%{HOME},$/'

        Yes, I'm ashamed that I know about those. =) Also, just for fun, try this one on:

        perl -we 'use strict; %$ = %ENV; print ${$}{SHELL},$/' # note the extra {} on this one to keep the parser from # making the normally correct left turn of ${${SHELL}} and # crying that $SHELL isn't declared...

        --
        $you = new YOU;
        honk() if $you->love(perl)

Re: quandery about a grep statement
by chromatic (Archbishop) on Mar 29, 2001 at 05:22 UTC
    If you want to remove duplicates from an array and sort them, the shortest way I can come up with uses a temporary hash:
    my %hash; @hash{@values} = (); my @sorted_uniq = sort keys %hash;
    That could be compacted further, but I'm lazy and I don't want to overwhelm people (yeah, that's it).

    Depending on how well sorted @values is, this could be more expensive than a simple for loop without grep or map.

Re: quandery about a grep statement
by premchai21 (Curate) on Mar 29, 2001 at 05:23 UTC
    Assuming you mean sorting normally:
    sub beautify { my %stuff; @stuff{@_} = undef; return sort keys %stuff; }
    so that:
    print join ' ',beautify(qw(h e l l o w o r l d)); # prints "d e h l o +r w" print join ' ',beautify(qw(b u m b l e b e e)); # prints "b e l m u"
    If you want it sorted by frequency increasing:
    sub beautify { my %stuff; $stuff{$_}++ for(@_); return sort { $stuff{$a} <=> $stuff{$b} } keys %stuff; }
    so that:
    print join ' ',beautify(qw(h e l l o w o r l d)); # prints "e w r h d +o l" print join ' ',beautify(qw(b u m b l e b e e)); # prints "m u l e b"
    You can, of course, reverse the result if you want it highest-frequency-first.

    Hope this helps.

    Update: Rrgh -- someone else always posts first while I'm typing my reply...

Re: quandery about a grep statement
by mikfire (Deacon) on Mar 29, 2001 at 05:42 UTC
    That seems an awful amount of work. I haven't analyzed the code yet, but from the little information you gave, this seems to work as well and be more understandable:
    sub beautify { my %temp = map { $_ => 1 } @_; return sort { $a <=> $b } keys %temp; }

    Of course, if you have a large array, this may not be efficient enough - you are making several copies of the array and on very large arrays that could hurt.

    Your current code works ( I think ) because:

    1. If it is a new key $new{$_} and $old{$_} will both be 1 and grep will include that key in the returned array
    2. If it isn't an new key,
      1. $new{$_} will be incremented and $old{$_} will not. Thus, for example, $new{$_} == 2 and $old{$_} == 1.
      2. Your secondary test is $new{$_}+1 == $old{$_}. Since we have just established $new{$_} > $old{$_}, it is obviously true that $new{$_} + 1 > $old{$_} and the second tertiary test will return 0.
    You are working way too hard at this. Let the hash worry about enforcing unique keys.

    UPDATE: I just read this again and realized the tone is very wrong. I am not trying to be rude, but I suddenly slipped into writing a math proof. mikfire

      That map of yours is a lot more work than you may realize! (Or will be until Perl 5.6.1.)
Re: quandery about a grep statement
by premchai21 (Curate) on Mar 29, 2001 at 05:46 UTC

    (Abbreviations used: new for $new{$_}, and old likewise.)

    My take on the way in your node:

    Okay. grep is being called, which effectively loops over the entire array. For each element, new is being incremented, and old is being set to 1 if it doesn't exist already. If they are equal, or if new is one lesser, then 1 is returned, otherwise 0 is returned. old is set to new.

    Now suppose we have this array: @array = qw(h e l l o w o r l d); Stepping through your algorithm:

    $_ $new{$_} $old{$_} $ret ============================ h 1 1 1 e 1 1 1 l 1 1 1 l 2 1 0 o 1 1 1 w 1 1 1 o 2 1 0 r 1 1 1 l 3 2 0 d 1 1 1
    which gives you h e l o w r d, which is then sorted. I see why it's working, because new+1 always !=old, and new==old only the first time an element is hit, so that sub-sub there tags only the first of each element for retrieval by grep. You don't need the second condition, and, granted, it's true that there are better ways to do it (see other replies for examples)... but this is how this one works, to the best of my knowledge.

    You're right in saying that old will always be within 1 of new, but it will never be exactly 1 greater.

    Hope this helps.

    Update: Rrgh -- it happened again!