hv has asked for the wisdom of the Perl Monks concerning the following question:

I'm looking for some code that does the same sort of formatting done by the Unix 'ls' command, which looks something like this:

B Digest Filter MIME SDBM_File Time util ByteLoader DynaLoader GDBM_File NDBM_File Safe Unicode Cwd Encode I18N ODBM_File Socket XS DB_File Errno IO Opcode Storable attrs Data Fcntl IPC POSIX Sys re Devel File List PerlIO Thread threads

The salient features are a) to read entries in sorted order you read down one column before moving over to the next; b) columns are left-justified; c) columns are as wide as they need to be to fit the largest entry within them; d) as many columns as possible are used; e) all but the last column have the same number of entries, and the last column has no more than the other columns.

This is intended for the list of available commands in a help system roughly modelled on gdb's. I haven't written any code to do this yet, because I imagine someone has already done so.

I did find one implementation (thanks ysth) in the implementataion of ls in the Perl Power Tools project, but that uses a more simplistic approach that makes all columns as wide as the widest column, rather than making each column only as wide as it needs to be.

Any suggestions gratefully received.

Hugo

Replies are listed 'Best First'.
Re: 'ls -C' column style
by BrowserUk (Patriarch) on Nov 04, 2004 at 20:56 UTC

    I've put 2 spaces between the columns as that's how it looked in your example, but it wasn't specified.

    #! perl -sl use strict; use List::Util qw[ max sum ]; use Data::Dumper; our $WIDTH ||= 80; sub columnise { my( $width, @list ) = @_; my @lines; for my $rows ( 1 .. @list ) { my $cols = int( (@list + $rows -1) / $rows ); my @aoa = map{ [ @list[ $_ .. $_+$rows-1 ] ] } map{ $_ * $rows } 0 .. $cols-1; my @widths = map{ 2+ max map length( $_||'' ), @$_ } @aoa; next if sum( @widths ) > $WIDTH; for my $row ( 0 .. $rows-1 ) { push @lines, join '', map{ sprintf "%-$widths[ $_ ]s", $aoa[ $_ ][ $row ]||'' } 0 .. $#aoa; } return @lines; } } chomp( my @list = sort <DATA> ); close DATA; print for columnise $WIDTH, @list; __DATA__ <<SNIP>>

    Output at various widths


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon

      Thanks, this is just the sort of thing I was looking for.

      I notice that you treat the 2-char spacer as a "right-padding" on each column rather than as a "between padding" on each pair of columns; this becomes particularly obvious on the -WIDTH=20 example. But I think that would be fairly easy to change, at the cost of a bit of cleanness of the code.

      Hugo

        You've probably done it already, but it suddenly struck me that this fixed the right-padding problem without needing to mess with the code much.

        sub columnise { my( $width, @list ) = @_; my @lines; for my $rows ( 1 .. @list ) { my $cols = int( (@list + $rows -1) / $rows ); my @aoa = map{ [ @list[ $_ .. $_+$rows-1 ] ] } map{ $_ * $rows } 0 .. $cols-1; my @widths = map{ max map length( $_||'' ), @$_ } @aoa; next if sum( @widths, $#widths * 2 ) > $WIDTH; for my $row ( 0 .. $rows-1 ) { push @lines, join ' ', map{ sprintf "%-$widths[ $_ ]s", $aoa[ $_ ][ $row ]||'' } 0 .. $#aoa; } return @lines; } }

        I also played with your binary chop idea, but the list had to be quite long and the width quite narrow before it offset the extra code required.


        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "Think for yourself!" - Abigail
        "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
Re: 'ls -C' column style
by fglock (Vicar) on Nov 04, 2004 at 20:23 UTC

    This will loop a few times until it finds the best combination of lines/columns:

    use strict; my @a = sort qw( B Digest Filter MIME SDBM_File Time util ByteLoader DynaLoader GDBM_File NDBM_File Safe Unicode Cwd Encode I18N ODBM_File Socket XS DB_File Errno IO Opcode Storable attrs Data Fcntl IPC POSIX Sys re Devel File List PerlIO Thread threads ); sub try { my ( $n, @a ) = @_; my $lines = int ( 1 + ( @a + $n - 1 ) / $n ); my @result; my $width = 0; for ( 1 .. $n ) { my @b = splice( @a, 0, $lines - 1 ); my $colwidth = (sort( { $a <=> $b } map { 1+length $_ } @b ))[-1]; push @result, [ $colwidth, @b ]; $width += $colwidth; } return $width, @result; } my $SCREEN_WIDTH = 80; my $col = 1; my $last_width = 0; my @result; my $width; my @last_result; while(1) { ( $width, @result ) = try( $col, @a ); last if $width > $SCREEN_WIDTH || $#{$result[0]} <= 1; $col++; $last_width = $width; @last_result = @result; }; @result = @last_result if defined @last_result; my @lines; for my $col ( @result ) { my $width = shift @$col; $_ = sprintf( "%-${width}s", $_ ) for ( @$col ); } for my $line ( 0 .. $#{$result[0]} ) { print $result[$_][$line] for 0 .. $#result; print "\n"; }

    Output (using $SCREEN_WIDTH=50 so that if fits here):

    B Encode IPC SDBM_File XS ByteLoader Errno List Safe attrs Cwd Fcntl MIME Socket re DB_File File NDBM_File Storable threads Data Filter ODBM_File Sys util Devel GDBM_File Opcode Thread Digest I18N POSIX Time DynaLoader IO PerlIO Unicode

    update: changed one of the loop-end controls.

      Thanks, this is nice straightforward code that does what was asked. If I was basing my own use on this, I'd start by changing it somewhat to pass round arrayrefs rather than arrays (and therefore also to work through @a non-destructively with computed indices, rather than destructively with splice).

      Then I'd probably pull try() inline to make it an inner loop, and wrap the whole thing in a function; at that point I think I'd have code pretty similar to BrowserUk's code below.

      Hugo

Re: 'ls -C' column style
by castaway (Parson) on Nov 04, 2004 at 19:23 UTC
    Sorry about the confusion just now.. Try something like this:
    my @list = sort keys %Commands; $colwidth = length((sort {length($b) <=> length($a)} @list)[0] + ) +1; $ncol = int( $Users{$userid}{'screenwidth'}/$colwidth ); $nrow = int( scalar(@list) / $ncol ); $nrow++ if ( scalar(@list) % $ncol ); $fmt = "%-${colwidth}s"; foreach my $r ( 0 .. $nrow-1 ) { my $ind = @help; $help[$ind] = ''; foreach my $c ( 0 .. $ncol-1 ) { my $i = $c * $nrow + $r; $help[$ind] .= sprintf( $fmt, $list[$i] ) if($list[$i] +); } }
    (Shamelessly cut&pasted from my own code, stolen from somewhere else, but I've forgotten where.. )

    C.

      Thanks for that. This code is similar to the chunk in the Perl Power Tools 'ls' I mentioned in the OP - all columns are forced to be the same width as the widest, which can waste a lot of space if (for example) you have lots of narrow entries and one wide one.

      Hugo

Re: 'ls -C' column style (sideways)
by tye (Sage) on Nov 04, 2004 at 22:33 UTC

    This one loops over the list of items once (at most). It computes the widths for all possible numbers of columns as it goes, eliminating cases of too-many columns when it determines that they would be too wide.

    Fully tested and now it does not waste time on column counts over about @items/2 and (update 3) chooses single-row output correctly.

    #!/usr/bin/perl -w use strict; my( $maxWid )= ( @ARGV, 79 ); my @items = qw( B ByteLoader Cwd DB_File Data Devel Digest DynaLoader Encode Errno Fcntl File Filter GDBM_File I18N IO IPC List MIME NDBM_File ODBM_File Opcode POSIX PerlIO SDBM_File Safe Socket Storable Sys Thread Time Unicode XS attrs re threads util ); print '='x$maxWid, $/; print Columns( $maxWid, \@items ); exit( 0 ); sub Columns { my( $maxWid, $avItems )= @_; my $maxCols= 1 + $#$avItems/2; my @height= ( 0, map 1+int($#$avItems/$_), 1..$maxCols ); my @total= ( $#$avItems, 0 .. ($maxCols-1) ); my @width; for my $i ( 0..$#$avItems ) { my $len = length( $avItems->[$i] ); $total[0] += $len; for my $cols ( 1 .. $maxCols ) { for( $width[ $cols ][ $i/$height[$cols] ] ) { $_ ||= 0; if( $_ <= $len ) { $total[$cols] += $len - $_; if( $maxWid < $total[$cols] ) { $maxCols= $cols - 1; } $_= $len; } } last if $maxCols < $cols; } last if $maxCols < 2; } $maxCols ||= 1; my $height= $height[$maxCols]; @width= @{ $width[$maxCols] }; if( $total[0] <= $maxWid ) { $maxCols= @$avItems; $height= 1; @width= (0) x $maxCols; } my $text= ''; for my $l ( 1 .. $height ) { my $i= $l - 1; my $c= 0; while( $i < @$avItems ) { my $item= $avItems->[$i]; $i += $height; if( $i < @$avItems ) { $text .= sprintf "%-$width[$c++]s ", $item; } else { $text .= $item . $/; } } } return $text; }

    - tye        

      This one loops over the list of items once (at most).

      Intriguing. Both your original version and the updated one do much better than mine at small widths (25), but then get progressively worse, where mine gets better.

      I haven't read through your algorithm properly yet, but we do get the same results (except I used 2 spaces not 1 between the columns).

      [22:49:46.54] P:\test>405274 >nul 100 trials of Buk:25 ( 190.051ms total), 1.901ms/trial 100 trials of Buk:50 ( 113.853ms total), 1.139ms/trial 100 trials of Buk:75 ( 88.451ms total), 884us/trial 100 trials of Buk:100 ( 70.376ms total), 703us/trial 100 trials of Buk:125 ( 38.077ms total), 380us/trial 100 trials of Buk:150 ( 46.875ms total), 468us/trial 100 trials of Buk:175 ( 46.875ms total), 468us/trial 100 trials of Buk:200 ( 46.875ms total), 468us/trial [22:50:02.71] P:\test>405274-tye1 >nul 100 trials of Tye1:25 ( 93.742ms total), 937us/trial 100 trials of Tye1:50 ( 190.402ms total), 1.904ms/trial 100 trials of Tye1:75 ( 265.551ms total), 2.656ms/trial 100 trials of Tye1:100 ( 343.750ms total), 3.438ms/tria +l 100 trials of Tye1:125 ( 421.875ms total), 4.219ms/tria +l 100 trials of Tye1:150 ( 453.125ms total), 4.531ms/tria +l 100 trials of Tye1:175 ( 468.750ms total), 4.688ms/tria +l 100 trials of Tye1:200 ( 484.375ms total), 4.844ms/tria +l [22:50:08.43] P:\test>405274-tye2 >nul 100 trials of Tye2:25 ( 107.743ms total), 1.077ms/trial 100 trials of Tye2:50 ( 154.991ms total), 1.550ms/trial 100 trials of Tye2:75 ( 200.225ms total), 2.002ms/trial 100 trials of Tye2:100 ( 227.510ms total), 2.275ms/tria +l 100 trials of Tye2:125 ( 250ms total), 2.500ms/tria +l 100 trials of Tye2:150 ( 234.375ms total), 2.344ms/tria +l 100 trials of Tye2:175 ( 234.375ms total), 2.344ms/tria +l 100 trials of Tye2:200 ( 250ms total), 2.500ms/tria +l

      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail
      "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon

        I didn't write it this way for performance reasons. But this technique should have good performance when sorting items across instead of down.

        You can also make it faster with a bit of complication by teaching it to consider about 2N^.5 column counts rather than about N/2 of them.

        - tye        

        I suspect the most efficient way would be to do a binary chop on the number of rows, with an initial min/max of 1, @items.

        Hugo

      This is quite nice, if a bit opaque; I fiddled with it to get a 2-char spacer - as well as the sprintf, I modified the initialisation of @total so:

      my @total= ( $#$avItems, map $_ * 2, 0 .. ($maxCols-1) );
      which seemed to do the right thing, but there may be edge cases I missed.

      Hugo

Re: 'ls -C' column style
by gaal (Parson) on Nov 04, 2004 at 19:27 UTC
    GNU has, albeit in a different language.
Re: 'ls -C' column style
by TedPride (Priest) on Nov 05, 2004 at 06:04 UTC
    And here's mine, which again cycles to pick the max number of columns possible. Note that both width and spacer width can be specified.
    use strict; my $width = 80; my $spacer = 2; my @files = sort(split(/\s+/, join('', <DATA>))); my ($pl, $c, $i, $j, $max, $total, @max, @sorted); while () { $pl = ($#files + 1) / ++$c; $pl = int $pl + 1 if $pl != int $pl; $total = 0; for ($i = 0; $i < $c; $i++) { $max = 0; for ($j = 0; $j < $pl; $j++) { $max = length($files[$i*$pl+$j]) if $max < length($files[$i*$pl+$j]); } $total += $max + $spacer; } last if ($width < $total - $spacer); } $pl = ($#files + 1) / --$c; $pl = int $pl + 1 if $pl != int $pl; for ($i = 0; $i < $c; $i++) { $max = 0; for ($j = 0; $j < $pl; $j++) { $sorted[$j][$i] = $files[$i*$pl+$j]; $max = length($files[$i*$pl+$j]) if $max < length($files[$i*$pl+$j]); } $max[$i] = $max + $spacer; } $max[$#max] -= $spacer; for (@sorted) { $c = 0; print sprintf('%-'.$max[$c++].'s', $_) for (@$_); print "\n"; } __DATA__ B Digest Filter MIME SDBM_File Time util ByteLoader DynaLoader GDBM_File NDBM_File Safe Unicode Cwd Encode I18N ODBM_File Socket XS DB_File Errno IO Opcode Storable attrs Data Fcntl IPC POSIX Sys re Devel File List PerlIO Thread threads

      Thanks for this; given the number of ways of doing:

      $pl = int $pl + 1 if $pl != int $pl;
      I'm looking forward (perhaps unwisely) to being able to define an infix ⌈...⌉ operator for ceiling().

      In general that line isn't a safe way of getting a ceiling, since it relies on equality testing of a floating point number; in this case though it would be dangerous only if the list of items were so large as to be impractical to display so it isn't really a problem. For generality I prefer the approach taken in some of the other examples above, effectively:

      sub ceildiv { my($num, $div) = @_; return int(($num + $div - 1) / $div); }
      .. and I suspect you avoided this only because you were using ++$c within the expression - I'd have preferred:
      ++$c; $pl = int((@files + $c - 1)/$c);

      Note that your approach of varying only the number of columns does miss some optimal solutions; for example, with $width = 20, $spacer = 2 you output the results in a single column even though the last 12 items could fit into a second column ('SDBM_File' being the last item that can't fit).

      Hugo

Re: 'ls -C' column style
by ihb (Deacon) on Nov 05, 2004 at 15:20 UTC

    I used a different approach than most other replyers in the thread. Instead of so much matrix stuff I've abstracted away the ugly parts. Also, there's a clear separation between data structure creation and output formatting and output. This is less efficient, but it reads well IMHO and this is the style in which I prefer to code. It's easier for me to have refactored code like this. I, again, used my utility module ihb::List::Util and imported &group and &transpose, and produced the following code. (&transpose should be fairly obvious. &group is explained here).

    use ihb::List::Util qw/ group transpose /; use List::Util qw/ max sum /; my @files = sort qw/ B Digest Filter MIME SDBM_File Time +util ByteLoader DynaLoader GDBM_File NDBM_File Safe Unicode Cwd Encode I18N ODBM_File Socket XS DB_File Errno IO Opcode Storable attrs Data Fcntl IPC POSIX Sys re Devel File List PerlIO Thread threads /; my $width = 55; my $padding = 2; # Columnize my (@cols, @widths); for (my $rows = 1; 1; $rows++) { @cols = group($rows, @files); @{$cols[-1]} = grep defined, @{$cols[-1]}; @widths = map max(map length, @$_), @cols; my $tw = sum(@widths) + $padding * (@widths - 1); last if $tw < $width or $rows == @files; } # Format columns for (0 .. $#cols) { my $format = "%-$widths[$_]s"; for (@{$cols[$_]}) { $_ = sprintf $format, $_; } } # Output rows { local $" = ' ' x $padding; print "@$_\n" for transpose(@cols); }

    In the OP it's not clear how newlines should be handled if the outputted string is equal to the length, also, it's not clear how filenames longer than the width should be handled, so a newline is always appended.

    ihb

    See perltoc if you don't know which perldoc to read!
    Read argumentation in its context!