in reply to Better "uniq" idiom?

I have an idiom I use a lot to do quick searches through log files or data streams, or to do a "uniq" on a file without having to sort it.

Using a hash to keep track of what has already gone by is quite common. Even the name you used is seen everywhere: %seen. You can, however, make it a bit faster. You're now incrementing $seen{$1} every time, while you don't really care whether it's 1 or 2 or 3, as long as it's not 0 (undef).

perl -ne'$seen{$_}++, print unless $seen{$_}'
It's not great for your golf score, but it can be a lot faster!

Speaking of golf, you could try this one to improve your golf score:
#        123456789_12345
perl -pe'$_=""if$s{$_}++'
Or, using evil symbolic references:
#        123456789_12
perl -pe'$_=""if$$_++'
(Can break when the last line has no trailing linefeed, and equals the name of a special (scalar) variable ;))

U28geW91IGNhbiBhbGwgcm90MTMgY
W5kIHBhY2soKS4gQnV0IGRvIHlvdS
ByZWNvZ25pc2UgQmFzZTY0IHdoZW4
geW91IHNlZSBpdD8gIC0tIEp1ZXJk

Re: Re: Better "uniq" idiom?
by Sidhekin (Priest) on Mar 21, 2002 at 15:14 UTC

    I have been aware that you can make it faster that way, but I usually don't. All that extra typing hurts my fingers :-)

    No, not really, but I like to keep oneliners short -- they have a nasty habit of hitting the right margin once what I do gets complicated -- and if I ever care about the performance of a oneliner, chances are it is getting complicated.

    Having said that, it occurs to me that you might get the best of both worlds with a slightly different idiom. Compare:

    perl -ne'print unless $seen{$_}++'
    perl -ne'$seen{$_}++, print unless $seen{$_}'
    perl -ne'$seen{$_}||=(print,1)'

    Hey! It's even shorter than the original! We may be onto something here ... or maybe I have just been golfing too much lately ... :-)

    Update: Juerd writes "for oneliners, I think it's safe to assume print will print successfully". Well ... I guess it is a good thing you have not seen my one-liners then. But okay, safe or not, it certainly is reasonable. It's not your fault that I am neither :-)

    The Sidhekin
    print "Just another Perl ${\(trickster and hacker)},"

      perl -ne'$seen{$_}||=(print,1)'

      Assuming the print will not fail (and for oneliners, I think it's safe to assume print will print successfully):

      perl -ne'$seen{$_}||=print'
      See? print, like many other commands, returns true on success, which allows you to shorten the shortened shortening by another 4 characters!

      Implementing s/seen/s/:
      perl -ne'$s{$_}||=print'
      Using symbolic references, assuming no special variable names will be used:
      perl -ne'$$_||=print'
      I think Perl 6 should have an alias for print that is a single \W character ;) Would be fun for golfing, and could compensate for the needed whitespace with string concats :)

      U28geW91IGNhbiBhbGwgcm90MTMgY
      W5kIHBhY2soKS4gQnV0IGRvIHlvdS
      ByZWNvZ25pc2UgQmFzZTY0IHdoZW4
      geW91IHNlZSBpdD8gIC0tIEp1ZXJk
      

Re: Re: Better "uniq" idiom?
by RMGir (Prior) on Mar 20, 2002 at 17:21 UTC
    Ah, cool! Thanks!

    I was about to post "I don't get it" until I realized that the whole comma expression is subject to the unless clause. Nice!
    --
    Mike