PerlMonks  

Command line file sorting

by swkronenfeld (Hermit)
on Sep 05, 2007 at 17:10 UTC

swkronenfeld has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks, I just spent a while formulating the following code. I'm quite sure that it can be improved on, and I'm interested in knowing how. Here's the goal of the code:

I need to sort a list of directories, and select the second directory when sorted highest first. For example:
dir-100-1 dir-100-10 dir-100-2 dir-100-9 dir-100-99 dir-2-1 dir-28-1 dir-29-1 dir-29-2 dir-30-1 dir-3-1
I want to sort them to
dir-100-99 dir-100-10 dir-100-9 dir-100-2 dir-100-1 dir-30-1 dir-29-2 dir-29-1 dir-28-1 dir-3-1 dir-2-1
And I would select the second entry, dir-100-10.

Here's my code

ls -d ./dir-* | xargs perl -e'print ((sort {$b->[0] <=> $a->[0] || $b->[1] <=> $a->[1] } map { m|/dir-(\d+)-(\d+)|; [$1,$2,$_] } @ARGV)[1]->[2])'

A quick summary: it extracts the two numbers from each name, sorts on them, and prints the second entry of the sorted list. Note that I want to print the entire path, which will be an absolute path, unlike my contrived example here. There will always be two numbers separated by a dash, and they should always be sorted the way I am doing it (by the left number, with the right number as a tie breaker). I bet there is a much simpler way to do what I want. I am ultimately looking for readability, not compactness (but if people turn this into a golf challenge, I would probably learn things too :).
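For comparison, here is a minimal POSIX shell sketch of the same decorate, sort, undecorate idea that the map/sort pipeline uses: awk copies the last two dash-separated numbers to the front of each line, sort orders on them, and cut strips them off again. It assumes one path per line on stdin, names ending in -<number>-<number>, and no newlines in names; the function name second_highest is hypothetical.

```shell
#!/bin/sh
# Sketch only. second_highest is a hypothetical name.
# Assumes: one path per line on stdin, each ending in -<num>-<num>,
# and no newlines inside names.
second_highest() {
    # Decorate: prefix each line with its last two "-"-separated fields.
    awk -F- '{ print $(NF-1), $NF, $0 }' |
    # Sort numerically, descending: field 1 first, field 2 as tie breaker.
    sort -k1,1nr -k2,2nr |
    # Keep only line 2; prints nothing if there is no second line.
    sed -n 2p |
    # Undecorate: drop the two numeric prefix fields, keep the full path.
    cut -d' ' -f3-
}

printf '%s\n' dir-100-1 dir-100-10 dir-100-2 dir-100-9 dir-100-99 \
    dir-2-1 dir-28-1 dir-29-1 dir-29-2 dir-30-1 dir-3-1 | second_highest
```

Because the numbers come from the last two fields, dashes earlier in an absolute path do not affect the sort, and a single entry yields empty output.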

I don't need to use Perl for this, but I could not make this work with the command-line sort. If this can be done more easily in the shell, I would prefer that, but I need to stick to standard shell commands and flags (this will be used on systems without the GNU tools).

Thanks in advance

Re: Command line file sorting
by artist (Parson) on Sep 05, 2007 at 17:48 UTC
    *nix :
    dirdata:
    dir-100-1
    dir-100-10
    dir-100-2
    dir-100-9
    dir-100-99
    dir-2-1
    dir-28-1
    dir-29-1
    dir-29-2
    dir-30-1
    dir-3-1
    
    Sorting: sort -rnt - -k2 -k3 dirdata
    Selecting 2nd entry: sort -rnt - -k2 -k3 dirdata | head -2 | tail -1
      artist
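A possible refinement, sketched under the assumption that the input is bare names like those in dirdata: POSIX sort lets each -k key carry its own modifiers and an end field, so each key can be pinned to exactly one field. With -k2 alone the key runs from field 2 to the end of the line; that still works here because -n stops reading at the first '-'. The function name sort_dirs is hypothetical.

```shell
#!/bin/sh
# Sketch only. sort_dirs is a hypothetical name.
# -t -     : fields are separated by "-"   (dir | 100 | 10)
# -k2,2nr  : primary key is field 2 only, numeric, descending
# -k3,3nr  : tie breaker is field 3 only, numeric, descending
# Assumes bare names with no other "-" in the line.
sort_dirs() {
    sort -t - -k2,2nr -k3,3nr
}

printf '%s\n' dir-100-1 dir-100-10 dir-100-2 dir-100-9 dir-100-99 \
    dir-2-1 dir-28-1 dir-29-1 dir-29-2 dir-30-1 dir-3-1 | sort_dirs
```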

      Thanks. I was playing with sort earlier, and was unable to make this work properly. Your solution still has one problem that I need to address, something that I didn't mention above. If there is only one entry, the entry sometimes* gets printed, but I need that result to be empty. Is there an easy way to do this from the shell?

      * - I say sometimes because, on bash under Linux, it does not get printed due to an extra blank line that somehow gets generated in the output. On AIX and Mac OS X, the entry gets printed.
        The following should not print anything if there is no second line:
        cat data | awk NR==2
        --Artist
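A small sketch of the difference, using a made-up single entry: head passes a one-line input straight through and tail then prints that same line, whereas awk 'NR==2' and sed -n 2p only print when a second line actually exists.

```shell
#!/bin/sh
# Sketch only: one_entry is a made-up single directory name.
one_entry='dir-100-1'

# head -n 2 (POSIX spelling of head -2) passes the only line through,
# and tail -n 1 prints it: the unwanted result.
printf '%s\n' "$one_entry" | head -n 2 | tail -n 1

# awk prints a record only when its record number is 2: no output here.
printf '%s\n' "$one_entry" | awk 'NR==2'

# sed -n suppresses automatic printing; 2p prints line 2 only: no output.
printf '%s\n' "$one_entry" | sed -n 2p
```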
Re: Command line file sorting
by Anno (Deacon) on Sep 05, 2007 at 17:55 UTC
    I don't see much room for simplification. The map block could be replaced by a slightly simpler expression:
    map [ m|/dir-(\d+)-(\d+)$|, $_], @ARGV
    Note the anchor to the end of the string in the regex. It makes sure that earlier combinations of digits and "-" in the path don't mess up the capture.
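The same end-of-string anchoring can be sketched with sed, assuming one path per line: the trailing $ pins the two captured numbers to the end of the line, so a directory like dir-9-9-old higher up in the path is ignored, and -n with the p flag drops lines that don't match at all. The helper name decorate and the example paths are hypothetical.

```shell
#!/bin/sh
# Sketch only. decorate is a hypothetical helper: it reads one path per
# line on stdin and writes "left right path".
decorate() {
    # .*-      : everything up to the dash before the last two numbers
    # $        : both numbers must end the line (the anchor)
    # &        : the whole matched line, appended after the numbers
    # -n ... p : lines without a match produce no output at all
    sed -n 's/.*-\([0-9][0-9]*\)-\([0-9][0-9]*\)$/\1 \2 &/p'
}

# Hypothetical paths: the first has an earlier "dir-9-9" component,
# the second does not match and is dropped.
printf '%s\n' /home/u/dir-9-9-old/dir-100-10 /home/u/notes.txt | decorate
```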

    Also, xargs isn't the right tool to use here. If there are too many arguments, it splits them into batches and runs the command once per batch, but you need to sort the entire list in one go. So my suggestion would be

    perl -le'print( (sort {$b->[0] <=> $a->[0] || $b->[1] <=> $a->[1] } map [ m|/dir-(\d+)-(\d+)$|, $_], @ARGV)[1]->[2])' ./dir-*
    Anno
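To see why batching matters, here is a sketch that forces xargs to split early with -n 4, as a stand-in for a list long enough to exceed the system limit. Each invocation sorts only its own batch, so you get one "second entry" per batch. The sort keys inside the batch command are an assumption, chosen to give the ordering the question asks for.

```shell
#!/bin/sh
# Sketch only: -n 4 allows at most four names per invocation, so the
# eleven names below are handled in three separate batches.
printf '%s\n' dir-100-1 dir-100-10 dir-100-2 dir-100-9 dir-100-99 \
    dir-2-1 dir-28-1 dir-29-1 dir-29-2 dir-30-1 dir-3-1 |
xargs -n 4 sh -c 'printf "%s\n" "$@" | sort -t - -k2,2nr -k3,3nr | sed -n 2p' sh
# The trailing "sh" becomes $0 of the inner shell; the names become $1..$4.
```

With these eleven names it prints three lines (dir-100-9, dir-29-1 and dir-29-2), and none of them is dir-100-10.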
Re: Command line file sorting
by sgt (Deacon) on Sep 05, 2007 at 21:28 UTC

    Well, artist gave essentially a Un*x golf solution. I'll recast it and comment on a couple of tricks.

    First, 'echo */' (or print) is a good way of getting only the directories if you are using a POSIX shell.

    % echo dir*/ | xargs -n1 | sort -rnt- -k2,3 | perl -lne 'print if $. ==2'

    The 'echo */ | xargs -n1' idiom can be seen as a poor man's transpose: it turns one line of words into one word per line. It is only useful if you know your paths! Names with spaces don't work directly, and xargs interprets some quoting characters.
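A sketch of the space problem and a common alternative, using two hypothetical directory names in place of the expansion of dir*/: printf reuses its format string for each remaining argument, so printf '%s\n' prints one name per line without re-splitting on blanks.

```shell
#!/bin/sh
# Sketch only: the two names are hypothetical and stand in for the
# expansion of dir*/ (one of them contains a space).
set -- 'dir-1-1/' 'dir-2-1 copy/'

# echo joins its arguments with blanks and xargs -n1 re-splits on blanks,
# so "dir-2-1 copy/" comes out as two lines: 3 lines in total.
echo "$@" | xargs -n1

# printf reuses its format for every argument: one line per name,
# with the embedded space preserved: 2 lines in total.
printf '%s\n' "$@"
```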

    In the end it is safer to use:

    % NL='
    > '
    % (IFS="$NL"; for i in dir*/; do echo $i; done) |
    > sort -rnt- -k2,3 | perl -lne 'print if $. ==2'

    steph@ape (/home/stephan/t)
    % cat dirdata
    dir-100-1
    dir-100-10
    dir-100-2
    dir-100-9
    dir-100-99
    dir-2-1
    dir-28-1
    dir-29-1
    dir-29-2
    dir-30-1
    dir-3-1
    steph@ape (/home/stephan/t)
    % cat dirdata | sort -nt- -k2,3
    dir-2-1
    dir-3-1
    dir-28-1
    dir-29-1
    dir-29-2
    dir-30-1
    dir-100-1
    dir-100-10
    dir-100-2
    dir-100-9
    dir-100-99
    steph@ape (/home/stephan/t)
    % cat dirdata | sort -rnt- -k2,3 | perl -lne 'print if $. == 2'
    dir-100-9
    cheers --stephan
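One detail worth noting about the session above, sketched here on the same data: with -n, the single key -k2,3 (fields 2 through 3, e.g. "100-10") compares only as the number 100, because numeric parsing stops at the '-'. All dir-100-* lines then tie, and sort falls back to comparing whole lines as text (still reversed by -r), which is why the last command printed dir-100-9 rather than dir-100-10. Giving each field its own key, as in artist's -k2 -k3, restores the numeric tie break.

```shell
#!/bin/sh
# Sketch only: same eleven names as dirdata.
data='dir-100-1 dir-100-10 dir-100-2 dir-100-9 dir-100-99 dir-2-1 dir-28-1 dir-29-1 dir-29-2 dir-30-1 dir-3-1'

# $data is left unquoted on purpose so it splits into one word per name.

# One key spanning fields 2-3: ties among dir-100-* are broken as text,
# so the second line is dir-100-9.
printf '%s\n' $data | sort -rnt- -k2,3 | sed -n 2p

# Two single-field keys: ties are broken numerically on field 3,
# so the second line is dir-100-10.
printf '%s\n' $data | sort -rnt- -k2,2 -k3,3 | sed -n 2p
```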

Node Type: perlquestion [id://637213]
Approved by Anno