PerlMonks  

Re: Rosetta Code: Long List is Long :awk(1)+sort(1)

by parv (Parson)
on Dec 12, 2022 at 07:59 UTC ( [id://11148773] )


in reply to Rosetta Code: Long List is Long

Here is a version in awk(1) + sort(1) (shell environment) ...

#!/bin/sh
# Source: https://perlmonks.org/index.pl?node_id=11148773
#
# This is one shell implementation based on the problem specification ...
#
#   Rosetta Code: Long List is Long, 20221130,
#   by eyepopslikeamosquito
#   https://perlmonks.org/?node_id=11148465

case $# in
  0 )
    printf "Give a list of files to sort.\n" >&2
    exit 1
    ;;
esac

start=$( date '+%s' )

# Takes ~135 s.
# Sum the second field per first field, then sort by count (descending)
# and by name (ascending).
awk ' \
{ cat_count[ $1 ] += $2 } \
END \
{ for ( cat in cat_count ) \
  { printf "%s\t%s\n", cat, cat_count[ cat ] } \
} \
' "$@" \
| sort -k2,2rn -k1,1

end=$( date '+%s' )

printf "total time: %d s\n" $(( end - start )) >&2
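To see what the pipeline produces, here is a minimal sketch on a tiny made-up input. The data and the `tally` function name are hypothetical, and the input format (a name, a tab, and a count per line) is an assumption read off the awk fields `$1` and `$2` used above. With no file arguments, awk reads standard input, which keeps the sketch free of temporary files.

```shell
#!/bin/sh
# Hypothetical helper wrapping the same awk + sort pipeline.
tally() {
    # Sum counts per name, then order by count (descending), name (ascending).
    awk '{ cat_count[ $1 ] += $2 }
         END { for ( cat in cat_count ) printf "%s\t%s\n", cat, cat_count[ cat ] }' "$@" \
    | sort -k2,2rn -k1,1
}

# Made-up sample: "name<TAB>count" lines, with some names repeated.
printf 'pear\t2\napple\t3\nfig\t1\npear\t1\nfig\t4\nkiwi\t3\n' | tally
```

On this input, fig (total 5) comes first, followed by apple, kiwi and pear (3 each) in name order.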

Replies are listed 'Best First'.
Re^2: Rosetta Code: Long List is Long :awk(1)+sort(1)
by marioroy (Prior) on Dec 12, 2022 at 13:47 UTC

    I tried LANG=C and sorting individually (two sorts).

    Results from a Linux box:

    54 seconds  LANG=en_US.UTF-8
    33 seconds  LANG=C sort -k2,2rn -k1,1
    23 seconds  LANG=C sort -k1,1 | sort -k2,2rn
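The gap between the first two timings is consistent with how sort compares keys: in the C locale it compares raw bytes, while a locale such as en_US.UTF-8 applies language-aware collation rules, which cost more per comparison. A small sketch of the byte-order behaviour on made-up input follows. It uses `LC_ALL` rather than `LANG` (an assumption for the sketch, since `LC_ALL` overrides `LANG` and any other `LC_*` setting already in the environment).

```shell
#!/bin/sh
# In the C locale, sort orders by byte value, so every uppercase ASCII
# letter comes before every lowercase one.
c_sorted=$( printf 'b\nA\na\nB\n' | LC_ALL=C sort )
printf '%s\n' "$c_sorted"
```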

    Testing:

    #!/bin/sh
    # https://www.perlmonks.org/?node_id=11148773

    if [ $# -eq 0 ]; then
        printf "Give a list of files to sort.\n" >&2
        exit 1
    fi

    LANG=C

    awk '
    { cat_count[ $1 ] += $2 }
    END {
        for ( cat in cat_count )
            printf "%s\t%s\n", cat, cat_count[ cat ]
    }
    ' "$@" \
    | sort -k1,1 | sort -k2,2rn

    # $SECONDS is provided by bash and ksh; some sh implementations
    # (dash, for example) do not set it.
    printf "total time: %d s\n" $SECONDS >&2
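The two-pass form depends on the second sort leaving equal counts in the name order produced by the first. Here is a minimal sketch that checks both forms agree on a small, made-up input. The function names are hypothetical, and the `-s` (stable) flag in the second pass is an addition not present in the script above: it assumes a sort that supports it (GNU and BSD sort do) and makes the tie order explicit instead of relying on sort's last-resort whole-line comparison.

```shell
#!/bin/sh
# Made-up "name<TAB>count" lines; several names share the same count.
sample() {
    printf 'pear\t3\nfig\t5\napple\t3\nkiwi\t3\nlime\t5\n'
}

# One pass: count descending, then name ascending.
one_pass() { LC_ALL=C sort -k2,2rn -k1,1; }

# Two passes: order by name first, then stably reorder by count.
# -s (an addition here) keeps ties in their incoming name order.
two_pass() { LC_ALL=C sort -k1,1 | LC_ALL=C sort -s -k2,2rn; }

a=$( sample | one_pass )
b=$( sample | two_pass )

if [ "$a" = "$b" ]; then
    printf 'same order\n'
else
    printf 'orders differ\n'
fi
```

For this sample both pipelines give fig and lime (5 each) ahead of apple, kiwi and pear (3 each), so the sketch prints "same order".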

      Good point about using LANG=C. It makes for a fairer comparison, since in my Python version I had set ASCII encoding for parsing the input (but not for sorting 🤔).

      With that change, it takes ~99 s; with that plus the two sorts, ~60 s.
