PerlMonks  

== uniq

by northface (Initiate)
on Feb 13, 2001 at 07:21 UTC (id://58067)

northface has asked for the wisdom of the Perl Monks concerning the following question:

Good evening. I am looking for a function that will give me the same result as the shell command `uniq`, i.e. discard identical lines from input. TIA

Re (tilly) 1: == uniq
by tilly (Archbishop) on Feb 13, 2001 at 08:27 UTC
    You could modify these implementations.

    A related problem I once had fun with: process a file and print each line only once (unlike uniq, this makes no assumption that the input is sorted). That isn't quite what you wanted, but here was the fastest version I could come up with in Perl:

    perl -ne 'print if 1 == ++$s{$_}' < in > out
    (Yes, details like where to place the plusses mattered.)
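    For readers less used to one-liners: `-n` wraps the code in a `while (<>) { ... }` loop, so the same preincrement test can also be packaged as a sub. This is only a sketch. The name `first_occurrences` is hypothetical, and it collects lines into a list instead of printing them so the result is easy to check:

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the first occurrence of each line, in
# input order, using the same "1 == ++$count" test as the one-liner.
sub first_occurrences {
    my %count;
    # ++ on a missing hash entry yields 1 the first time, 2 the next, ...
    return grep { 1 == ++$count{$_} } @_;
}

my @out = first_occurrences("b\n", "a\n", "b\n", "c\n", "a\n");
print @out;    # prints b, a, c on separate lines
```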

      Here's the fastest and shortest way I could find to process an input file and print each distinct line once:

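      perl -ne '$s{$_} ||= print' < in > out

      (Single quotes keep a Unix shell from expanding $s and $_ itself; on Windows cmd, use double quotes instead.)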

      Explanation: the ||= operator short-circuits. If the left-hand side is true, the right-hand side is never evaluated and nothing is assigned. If the left-hand side is false, the right-hand side is evaluated and its value is assigned to the left-hand side.

      In this case, we are processing each line from a file called "in". The first time a given line is seen, $s{$_} is undefined and therefore false, so the right-hand side is evaluated: print outputs $_ and returns 1 on success, and that 1 is stored in $s{$_}.

      The next time the same line appears, $s{$_} is already true, so ||= never evaluates print again. Each line is therefore printed only the first time it is seen.
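      A small sketch, in case you want to see the short-circuit for yourself. The right-hand side below is a do-block that stands in for print: it counts how often it runs, pushes the line onto a list, and yields 1. The name `first_seen_with_count` is hypothetical and does not come from any module:

```perl
use strict;
use warnings;

# Hypothetical helper with the same shape as $s{$_} ||= print, except
# that the right-hand side records the line instead of printing it.
sub first_seen_with_count {
    my (%seen, @kept);
    my $rhs_runs = 0;
    for my $line (@_) {
        # The do-block only runs while $seen{$line} is false; its final
        # value (1) is what ||= stores, so later repeats skip it.
        $seen{$line} ||= do { $rhs_runs++; push @kept, $line; 1 };
    }
    return ($rhs_runs, @kept);
}

my ($runs, @kept) = first_seen_with_count("x\n", "y\n", "x\n", "x\n", "z\n");
print @kept;                                   # x, y, z, one per line
print "right-hand side ran $runs times\n";     # 3: once per distinct line
```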

      can't resist...
      perl -ne '!$s{$_}++ && print' < in > out
         MeowChow                                   
                     s aamecha.s a..a\u$&owag.print
        That executes more slowly. Postincrement has to return the old value, so perl wastes time copying it, while preincrement just returns the incremented variable itself.
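        To check the speed claims on your own perl, you can time the variants side by side with `cmpthese` from the core Benchmark module. This is a sketch with several assumptions. The input is synthetic (10,000 lines drawn from 500 distinct values). The variants are wrapped in subs that use grep or a loop rather than print, so they are not the exact one-liners. Results will vary with perl version and data, so no numbers are claimed here:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic test data (an assumption): 10_000 lines, 500 distinct values.
my @lines = map { "line " . ($_ % 500) . "\n" } 1 .. 10_000;

# Preincrement test, as in: print if 1 == ++$s{$_}
sub pre_inc  { my %s; return grep { 1 == ++$s{$_} } @lines }

# Postincrement test, as in: !$s{$_}++ && print
sub post_inc { my %s; return grep { !$s{$_}++ } @lines }

# ||= test, as in: $s{$_} ||= print (collecting instead of printing)
sub or_equal {
    my (%s, @out);
    $s{$_} ||= do { push @out, $_; 1 } for @lines;
    return @out;
}

# A negative count means "run each variant for at least that many CPU
# seconds"; cmpthese then prints a rate comparison table.
cmpthese(-1, {
    preincrement  => \&pre_inc,
    postincrement => \&post_inc,
    or_assign     => \&or_equal,
});
```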
Re: == uniq
by MeowChow (Vicar) on Feb 13, 2001 at 07:27 UTC
    Search for "unique" and you'll find plenty of references, as this question comes up fairly often.

    update: Well, most of those nodes discuss how to create a completely unique list (every value kept once), which is not exactly what you want, since uniq only collapses runs of adjacent identical lines. So here:

    update2: Fixed some yucky code... =)

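    my @in = (1,2,3,4,4,4,5,5,6,7,8,9,9,1,2,3,3);

    ## one way: seed the output with the first element, then append
    ## each element that differs from the last one kept
    my @out1 = $in[0];
    do { push @out1, $_ if $out1[$#out1] ne $_ } for @in;

    ## or another way: $t remembers the previous element; grep keeps an
    ## element only when it differs, updating $t as a side effect
    my $t = '';
    my @out2 = grep { $t eq $_ ? 0 : ($t = $_, 1) } @in;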
    Of course, there are many ways to slice this particular cat...
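    Both versions above compare each element only with the one kept before it, which is what uniq does. Here is a sketch of the same idea as a reusable sub. The name `uniq_adjacent` is hypothetical. It tracks the previous item with `defined`, so a leading empty string or `0` is kept (the `$t = ''` version would drop a leading empty string):

```perl
use strict;
use warnings;

# Hypothetical helper: collapse runs of adjacent identical items, like
# uniq(1). Repeats that are not adjacent are kept.
sub uniq_adjacent {
    my (@out, $prev);
    for my $item (@_) {
        # Skip only when there is a previous item and it is identical.
        next if defined $prev && $item eq $prev;
        push @out, $item;
        $prev = $item;
    }
    return @out;
}

my @in = (1,2,3,4,4,4,5,5,6,7,8,9,9,1,2,3,3);
print join(',', uniq_adjacent(@in)), "\n";    # 1,2,3,4,5,6,7,8,9,1,2,3
```

    The same logic works as a streaming filter on a file:

    perl -ne 'print if !defined $p || $_ ne $p; $p = $_' < in > out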
Re: == uniq
by JojoLinkyBob (Scribe) on Feb 13, 2001 at 21:21 UTC
    Have you considered a hash?
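    use strict;
    use warnings;

    my %myhash;
    while (<>)    # reads the files named on the command line, or STDIN
    {
    $myhash{$_} = 1;
    }
    # sorted keys: this behaves like `sort -u`, so input order is lost
    foreach my $k (sort keys %myhash)
    {
    print $k;
    }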

    DesertCoder
