PerlMonks  

a way to do 'sort|uniq'

by jch341277 (Sexton)
on Aug 02, 2005 at 15:27 UTC ( [id://480229]=perlquestion )

jch341277 has asked for the wisdom of the Perl Monks concerning the following question:

I've been thinking for a while that there should be a fairly simple way to do the same thing as the shell idiom:

$ cat file.txt|sort|uniq

Here's what I've come up with to sort a list of phone numbers:

my $l = 0;
map { defined $_ and print "$_\n" }
map { ($_->[1] ne $l)? $l=$_->[1] : undef }
sort { $a->[1] <=> $b->[1] }
map { [s/[\D\n]//g, $_] }
<>;

But you could use it to sort strings just as easily. You probably don't want to use this to sort any really big files.
I'd be interested in knowing if anyone has found other ways to do this...

Update: After reviewing all the other ways to do this, I have to admit that I've been playing with Schwartzian transforms lately, which is why mine took this overly complicated form.
The one-liners are great. However, even after doing a Super Search for %_, I still can't figure out why @_{@telephones}=(); initializes %_ from the array @telephones.

Replies are listed 'Best First'.
Re: a way to do 'sort|uniq'
by rnahi (Curate) on Aug 02, 2005 at 15:35 UTC

      Perl Idioms Explained - keys %{{map{$_=>1}@list}} does caveat that this may not be appropriate for large lists. If that's a concern with the input being processed, the same thing can be done in line-by-line fashion.

      perl -e '$seen{$_} = 1 while <>; print sort keys %seen' input.txt

      The article has some good commentary. For example, the approach above is the slowest. Using $seen{$_} = undef is faster, the grep approach is faster still, and the slice approach is apparently even faster.

      -xdg

      Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.
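For readers following along, here is a minimal sketch of the four de-duplication variants mentioned above, side by side. The sample list and the variable names are made up for illustration, and nothing here measures speed; it only shows that each form yields the same sorted unique list.

```perl
use strict;
use warnings;

my @list = qw(b c a c b a);    # hypothetical sample data

# 1. map into an anonymous hash, then take its keys.
#    The unary plus makes it unambiguous that +{ ... } is a hash constructor.
my @u_map = sort keys %{ +{ map { $_ => 1 } @list } };

# 2. explicit loop, storing undef instead of 1
my %seen_loop;
$seen_loop{$_} = undef for @list;
my @u_loop = sort keys %seen_loop;

# 3. grep with a "seen" hash: keeps only the first occurrence of each item
my %seen_grep;
my @u_grep = sort grep { !$seen_grep{$_}++ } @list;

# 4. hash slice: creates every key in a single assignment
my %seen_slice;
@seen_slice{@list} = ();
my @u_slice = sort keys %seen_slice;

print "@u_map\n";
```

All four arrays end up holding the same elements; which one is fastest for a given input is best checked with the Benchmark module rather than assumed.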

Re: a way to do 'sort|uniq'
by merlyn (Sage) on Aug 02, 2005 at 15:40 UTC
      That search doesn't result in anything particularly useful. The first match, the one which looks the most promising and the one which refers to you by name, is a broken link. A cached version is readable. It would be nice if you provided a link to your document instead of having us wade through broken links trying to figure out what you mean.

      ...the useless use of cat...

      What do we have here?! It sure looks like a useless use of "use of" [1]. What's wrong with "the useless cat"? :-)

      [1] Or, if I'm to practice what I preach, a useless "use of".

      the lowliest monk

      I was generalizing the construct - I usually do something like:

      $ tail +2 file.txt|tr -dc '[0-9\n]'|sort|uniq -c|sort -rnk 1

      I realize that the 'cat' would be useless in the simplified example that I gave, but the point was I wanted a way to do that same type of thing all in one step with perl.
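One possible way to do that longer pipeline in a single Perl step is sketched below. The sub name count_numbers and the sample input are hypothetical, and the tie-break between equal counts is an assumption, so the order of ties may differ from what sort -rnk 1 gives.

```perl
use strict;
use warnings;

# Drop the first line, strip everything but digits, count each distinct
# number, and return [count, number] pairs with the highest count first.
sub count_numbers {
    my @lines = @_;
    shift @lines;                               # like tail +2: skip line 1
    my %count;
    for my $line (@lines) {
        (my $digits = $line) =~ tr/0-9//cd;     # delete every non-digit, newline included
        $count{$digits}++;                      # like sort | uniq -c
    }
    # Assumption: ties between equal counts are broken by the number as a string.
    return map  { [ $count{$_}, $_ ] }
           sort { $count{$b} <=> $count{$a} or $a cmp $b }
           keys %count;
}

# Hypothetical input: a header line followed by phone numbers.
my @input = ("Phone\n", "555-1234\n", "(555) 1234\n", "555-9999\n");
printf "%7d %s\n", @$_ for count_numbers(@input);
```

Because the counts live in a hash, no initial sort is needed before counting; only the distinct numbers get sorted at the end.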

Re: a way to do 'sort|uniq' (efficient)
by tye (Sage) on Aug 02, 2005 at 18:26 UTC

    To be efficient, it is something that the sorting algorithm should support itself. Hence "sort -u" existing despite "sort | uniq" working (if you don't run out of resources). So this option would be a nice addition to sort.pm.

    - tye        

Re: a way to do 'sort|uniq'
by Roy Johnson (Monsignor) on Aug 02, 2005 at 17:56 UTC
    Another way:
    use warnings;
    use strict;
    my %seen;
    print sort grep !$seen{$_}++, <DATA>;
    __DATA__
    c
    c
    b
    c
    b
    b
    a
    c
    a
    b

    Caution: Contents may have been coded under pressure.
Re: a way to do 'sort|uniq'
by sh1tn (Priest) on Aug 02, 2005 at 16:43 UTC
    @_{@telephones}=(),print+join$/,sort{ $a <=> $b }keys%_;


      sh1tn - I don't understand how this works:

      @_{@telephones}=(),print+join$/,sort{ $a <=> $b }keys%_;

      I hope you're not disinclined to explain?

        @_{@telephones}=();        # %_ hash from @telephones
        print+                     # print the
        join $/,                   # joined with "\n" separator
        sort{ $a <=> $b }keys%_;   # list which comes from %_ keys
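To spell out the first line of that explanation: @_{...} is a hash slice of %_, not anything to do with the array @_. The same thing written with a named lexical hash might look like the sketch below; the sample numbers and the names %seen and %seen_loop are made up for illustration.

```perl
use strict;
use warnings;

my @telephones = (5551234, 5550000, 5551234);    # hypothetical sample data

# @name{LIST} is a hash slice: the leading @ means "several values" and the
# braces mean "from the hash %name". Assigning the empty list creates every
# key with an undef value, so duplicate numbers collapse into one key.
my %seen;
@seen{@telephones} = ();

# This loop has the same effect as the slice assignment above.
my %seen_loop;
$seen_loop{$_} = undef for @telephones;

# $/ is "\n" unless it has been changed.
print join($/, sort { $a <=> $b } keys %seen), "\n";
```

The original one-liner gets away without declaring anything because %_ is a punctuation variable, which is always global and exempt from use strict.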


Re: a way to do 'sort|uniq'
by Anonymous Monk on Aug 02, 2005 at 16:45 UTC
    If the Perl equivalent of a shell one-liner is 6 lines, I'd use system to shell out.

    However, your Perl version isn't equivalent: it removes anything that's not a digit. (The s/[\D\n]//g also leaves room for improvement; I'd write it as tr/0-9//cd or, if you insist on a substitution, s/[\D]+//g.)

    If I were to do it in Perl, I'd remove the duplicates first (using a hash), then sort. If there are a lot of duplicates, this ought to win (although in modern Perls, sorting with many duplicates is fast).
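A minimal sketch of that order of operations (clean with tr, remove duplicates with a hash, sort last) could look like this. The sub name unique_numbers and the sample input are hypothetical, and skipping lines that contain no digits is an assumption the original code does not make.

```perl
use strict;
use warnings;

# Strip non-digits, drop duplicates with a hash, then sort what is left.
sub unique_numbers {
    my %seen;
    for my $line (@_) {
        (my $digits = $line) =~ tr/0-9//cd;    # delete everything except 0-9
        next unless length $digits;            # assumption: ignore lines with no digits
        $seen{$digits} = undef;                # the key is all that matters
    }
    return sort { $a <=> $b } keys %seen;      # sort only the distinct values
}

# Hypothetical input: the first two lines are the same number written differently.
print "$_\n" for unique_numbers("555-1234\n", "5551234\n", "555-0000\n");
```

With many duplicates the sort sees far fewer elements than the input has lines, which is where removing duplicates first would be expected to pay off.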

Node Type: perlquestion [id://480229]
Approved by blazar