Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks! I have two columns of folder sizes, and I would like to know their difference, expressed in the respective unit. The columns look like this:
    324K 324K
    440K 533K
    23T 224G
    42G 42G
    1.9T 709G
    294K 294K
    684K 684K
    492K 492K
    62M 64M
    48K 41M
    34M 433K
    317K 812K
So some pairs are KB -> KB, some are KB -> GB, etc. I was wondering if there is a fast way to create a third column with their difference, in the respective unit. The only way I can think of is writing if statements to check which unit I have and then compare. But maybe it can be done faster somehow? Thanks!

Re: More elegant way than multiple "if"?
by tobyink (Canon) on Nov 28, 2019 at 14:53 UTC

    Something like this?

```perl
use strict;
use warnings;

my @data = split /\n/, <<'DATA';
324K 324K
440K 533K
23T 224G
42G 42G
1.9T 709G
294K 294K
684K 684K
492K 492K
62M 64M
48K 41M
34M 433K
317K 812K
DATA

# Binary multipliers (1 K = 1024 bytes)
my %units = (
    K => 1024**1,
    M => 1024**2,
    G => 1024**3,
    T => 1024**4,
);

my $unit_re = join '|', sort keys %units;
$unit_re = qr/$unit_re/;
my $number_re = qr/[0-9]+(?:\.[0-9]+)?/;    # integer with optional decimal part

foreach (@data) {
    # Warn about and skip any line that is not "<number><unit> <number><unit>"
    unless (/^\s*($number_re)($unit_re)\s+($number_re)($unit_re)\s*$/) {
        warn "Line seems malformed: $_\n";
        next;
    }
    # Difference in bytes: second column minus first column
    my $diff = $3 * $units{$4} - $1 * $units{$2};
    print "Diff: $diff\n";
}
```
Re: More elegant way than multiple "if"?
by GrandFather (Saint) on Nov 28, 2019 at 20:53 UTC

    Your instinct to avoid a cascade of if statements is correct: that leads to hard-to-maintain code. There are some subtleties to a general solution, however, and in a production context you would want to write a test suite before you start implementing one, not least so that you have a good idea of where all the edge cases are.
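    A minimal Test::More sketch of what such a suite might pin down before the real implementation exists. The converter `to_bytes` is hypothetical (it is not from this thread) and is included only so the tests run. It assumes decimal multipliers (1 K = 1000), accepts both K and k, and returns undef for malformed input; writing the tests first is what forces those choices to be made explicitly.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical converter under test (assumption, not from the thread):
# decimal multipliers, upper- or lower-case k, undef for malformed input.
sub to_bytes {
    my ($str) = @_;
    my %mul = (k => 1e3, K => 1e3, M => 1e6, G => 1e9, T => 1e12);
    my ($num, $unit) = $str =~ /^\s*(-?\d+(?:\.\d+)?)\s*([kKMGT]?)\s*$/
        or return undef;
    return $unit eq '' ? $num : $num * $mul{$unit};
}

# Edge cases worth deciding on up front
is( to_bytes('324K'), 324_000, 'plain kilo' );
is( to_bytes('324k'), 324_000, 'lower-case k accepted' );
is( to_bytes('42G'),  42e9,    'giga' );
ok( abs(to_bytes('1.9T') - 1.9e12) < 1, 'fractional value with unit' );
is( to_bytes('5M'),   5e6,     'single-digit number' );
is( to_bytes('512'),  512,     'no unit means bytes' );
is( to_bytes('12X'),  undef,   'unknown unit is rejected' );
is( to_bytes(''),     undef,   'empty string is rejected' );

done_testing();
```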

    The following code uses a couple of functions to change to and from float values:

```perl
use strict;
use warnings;

while (<DATA>) {
    my ($first, $second) = split;
    next if !$second;
    my $diff = ToUnit(ToFloat($first) - ToFloat($second));
    print "$first - $second = $diff\n";
}

sub ToUnit {
    my ($value) = @_;
    my $unitIndex = 0;
    my @units = ('', qw( k M G T));
    my $neg = $value < 0;

    $value = -$value if $neg;

    while ($value >= 1000 && $unitIndex < $#units) {
        $value /= 1000;
        ++$unitIndex;
    }

    $value = -$value if $neg;
    return "$value$units[$unitIndex]";
}

sub ToFloat {
    my ($str) = @_;
    my ($value, $unit) = $str =~ /([\d.+-]+)\s*([kMGT]?)/;
    my %mul = (k => 1e3, M => 1e6, G => 1e9, T => 1e12);
    my $neg = $value < 0;

    $value = -$value if $neg;
    $value *= $mul{$unit} if $unit && exists $mul{$unit};
    $value = -$value if $neg;
    return $value;
}

__DATA__
324k 324k
440k 533k
23T 224G
42G 42G
1.9T 709G
294k 294k
684k 684k
492k 492k
62M 64M
48k 41M
34M 433k
317k 812k
```

    Prints:

    324k - 324k = 0
    440k - 533k = -93k
    23T - 224G = 22.776T
    42G - 42G = 0
    1.9T - 709G = 1.191T
    294k - 294k = 0
    684k - 684k = 0
    492k - 492k = 0
    62M - 64M = -2M
    48k - 41M = -40.952M
    34M - 433k = 33.567M
    317k - 812k = -495k

    Note that K got changed to k (the SI symbol for kilo). This code would be fairly easy to extend to handle the standard SI prefixes, but that is left as an exercise for the learner.
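    One possible way to do that exercise, sketched under stated assumptions: only the large SI prefixes (k through Y) are handled, upper-case K is accepted as an alias for k on input, and ToFloat dies on an unknown prefix instead of silently ignoring it. A single ordered prefix list drives both directions, so supporting another prefix means adding one entry.

```perl
use strict;
use warnings;

# Index N in this list means a multiplier of 1000**N.
# Assumption: large SI prefixes only (no m, u, n ...).
my @prefixes = ('', qw(k M G T P E Z Y));
my %exponent;
$exponent{ $prefixes[$_] } = $_ for 0 .. $#prefixes;
$exponent{K} = $exponent{k};    # assumption: accept K as an alias for k

sub ToFloat {
    my ($str) = @_;
    my ($value, $unit) = $str =~ /^\s*([-+]?[\d.]+)\s*([A-Za-z]?)\s*$/
        or die "Malformed size: '$str'\n";
    die "Unknown prefix '$unit' in '$str'\n" unless exists $exponent{$unit};
    return $value * 1000 ** $exponent{$unit};
}

sub ToUnit {
    my ($value) = @_;
    my $index = 0;
    my $neg   = $value < 0;
    $value = -$value if $neg;
    # Scale down by 1000 until the value is below 1000 or we run out of prefixes
    while ($value >= 1000 && $index < $#prefixes) {
        $value /= 1000;
        ++$index;
    }
    $value = -$value if $neg;
    return "$value$prefixes[$index]";
}

for my $pair (['2P', '300T'], ['1.5E', '1P'], ['48K', '41M']) {
    my ($first, $second) = @$pair;
    printf "%s - %s = %s\n", $first, $second,
        ToUnit(ToFloat($first) - ToFloat($second));
}
```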

    Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
Re: More elegant way than multiple "if"?
by syphilis (Archbishop) on Nov 29, 2019 at 02:31 UTC
    This post originally appeared as a reply to a post of GrandFather's (that was my mistake).
    It has now been re-parented to its intended position.
    Thank you, GrandFather.

    I always like to make use of perl's capacity to ignore garbage when numifying strings, where possible.
    I also like to avoid using a regex.
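    A quick illustration of the numification behaviour described above, assuming (as the code below also does) that every value ends in exactly one unit letter: adding 0 to "1.9T" yields 1.9 because Perl reads the leading numeric part and drops the rest, and substr picks off the unit letter.

```perl
use strict;
use warnings;
no warnings 'numeric';   # silence "Argument isn't numeric" for strings like "1.9T"

# Numification reads the leading number and ignores the trailing unit letter.
for my $size (qw(324K 1.9T 62M 48K)) {
    my $number = $size + 0;             # e.g. "1.9T" -> 1.9
    my $unit   = substr($size, -1, 1);  # last character is the unit letter
    print "$size: number=$number unit=$unit\n";
}
```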
    The following assumes that it's not necessary to run any checks on the DATA:
```perl
use strict;
use warnings;
no warnings 'numeric';

my %suffix = (K => 1, M => 1e3, G => 1e6, T => 1e9);
# or alternative values:
# my %suffix = (K => 1, M => 1024, G => 1024 ** 2, T => 1024 ** 3);

while (<DATA>) {
    my ($col1, $col2) = split;
    print "$col1 - $col2 = ";
    my $suff1 = substr($col1, -1, 1);
    my $suff2 = substr($col2, -1, 1);

    if ($suffix{$suff1} < $suffix{$suff2}) {
        # Alter $col2 and $suff2
        $col2 *= $suffix{$suff2} / $suffix{$suff1};
        $suff2 = $suff1;
    }
    elsif ($suffix{$suff1} > $suffix{$suff2}) {
        # Alter $col1 - no need to alter $suff1
        $col1 *= $suffix{$suff1} / $suffix{$suff2};
    }

    $col1 -= $col2;
    print "${col1}$suff2\n";
}

__DATA__
324K 324K
440K 533K
23T 224G
42G 42G
1.9T 709G
294K 294K
684K 684K
492K 492K
62M 64M
48K 41M
34M 433K
317K 812K
```
    Cheers,
    Rob
Re: More elegant way than multiple "if"?
by tybalt89 (Monsignor) on Nov 28, 2019 at 15:53 UTC
```perl
#!/usr/bin/perl

use strict; # https://perlmonks.org/?node_id=11109362
use warnings;

my %suffix = qw( K 3 M 6 G 9 T 12 );

while( <DATA> )
  {
  print s/(\S+)([TGMK])\s+(\S+)([TGMK])\K/ ' ' . ("$1e$suffix{$2}" - "$3e$suffix{$4}") /er
    =~ s/((000){1,4})$/ +{reverse %suffix}->{length $1} /er;
  }

__DATA__
324K 324K
440K 533K
23T 224G
42G 42G
1.9T 709G
294K 294K
684K 684K
492K 492K
62M 64M
48K 41M
34M 433K
317K 812K
```

    Outputs:

    324K 324K 0
    440K 533K -93K
    23T 224G 22776G
    42G 42G 0
    1.9T 709G 1191G
    294K 294K 0
    684K 684K 0
    492K 492K 0
    62M 64M -2M
    48K 41M -40952K
    34M 433K 33567K
    317K 812K -495K
Re: More elegant way than multiple "if"?
by LanX (Saint) on Nov 28, 2019 at 15:14 UTC
    > with their difference, in the respective unit

    What is the respective unit? The bigger one?

    The straightforward way is to use two functions:

    One, to_float(), that converts a size string to a float representing bytes.

    Another, to_unit(), that converts bytes back to a string with a unit.

    Then $diff = to_unit( to_float($left) - to_float($right) )

    If you don't want negative differences, use abs().

    Four units are not too many for ifs or nested ternary operators.

    A more scalable approach is a look up table in a hash.

    my %exponent = ( B => 0, K => 3, M => 6, G => 9, T => 12 ); #*
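    A sketch of the to_float()/to_unit() pair built on that lookup table. It follows the decimal exponents above (1 K = 1000 B) and reports the difference in the largest unit that keeps the magnitude at or above 1. The parsing regex, the `@by_size` helper array and the die-on-bad-input handling are illustrative choices, not part of the reply.

```perl
use strict;
use warnings;

# Lookup table from the reply: unit letter -> power of ten
my %exponent = ( B => 0, K => 3, M => 6, G => 9, T => 12 );

# Units ordered from largest to smallest, used by to_unit()
my @by_size = sort { $exponent{$b} <=> $exponent{$a} } keys %exponent;

sub to_float {
    my ($str) = @_;
    my ($num, $unit) = $str =~ /^([\d.]+)([BKMGT])$/
        or die "Cannot parse '$str'\n";
    return $num * 10 ** $exponent{$unit};
}

sub to_unit {
    my ($bytes) = @_;
    # Pick the largest unit whose scale does not exceed the magnitude
    for my $unit (@by_size) {
        my $scale = 10 ** $exponent{$unit};
        return ($bytes / $scale) . $unit if abs($bytes) >= $scale;
    }
    return $bytes . 'B';    # zero (or fractional bytes) ends up here
}

for my $pair ([qw(23T 224G)], [qw(48K 41M)], [qw(42G 42G)]) {
    my ($left, $right) = @$pair;
    my $diff = to_unit( to_float($left) - to_float($right) );
    print "$left - $right = $diff\n";
}
```

    If only the size of the difference matters, wrap the subtraction in abs(), e.g. to_unit( abs( to_float($left) - to_float($right) ) ).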

    Personally I'd go for the ternaries.
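    For comparison, the same multiplier lookup written as chained ternaries instead of an if/elsif cascade. This is a sketch assuming decimal multipliers and upper-case unit letters; the sub names `multiplier` and `to_float` are illustrative.

```perl
use strict;
use warnings;

# Chained ternary: unit letter -> decimal multiplier.
# Assumption: decimal units; use 1024 ** N instead for binary sizes.
sub multiplier {
    my ($unit) = @_;
    return $unit eq 'K' ? 1e3
         : $unit eq 'M' ? 1e6
         : $unit eq 'G' ? 1e9
         : $unit eq 'T' ? 1e12
         : die "Unknown unit '$unit'\n";
}

sub to_float {
    my ($str) = @_;
    my ($num, $unit) = $str =~ /^([\d.]+)([A-Z])$/
        or die "Cannot parse '$str'\n";
    return $num * multiplier($unit);
}

printf "%s - %s = %d bytes\n", '440K', '533K', to_float('440K') - to_float('533K');
```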

    Cheers Rolf
    (addicted to the Perl Programming Language :)

    update

    *) fixed

Re: More elegant way than multiple "if"?
by Anonymous Monk on Nov 29, 2019 at 01:58 UTC
    Write it using ifs first. Then write it using a ternary. Then write it using something else. Then another way. Open two editors side by side and compare them against your idea of elegance. If you can't write it but one way, then that is what you do. It's as basic as it gets, so significant speed differences between the versions are highly unlikely.
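    One way to make that comparison concrete is Perl's core Benchmark module. This sketch times two illustrative versions (an if/elsif cascade and a hash lookup; both sub names are hypothetical) over the sizes from the question, after first checking that they agree. Absolute timings vary by machine, so no output is shown.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Two hypothetical versions of the same conversion (decimal multipliers assumed)
my %mul = (K => 1e3, M => 1e6, G => 1e9, T => 1e12);

sub with_ifs {
    my ($num, $unit) = $_[0] =~ /^([\d.]+)([KMGT])$/ or die "bad '$_[0]'\n";
    if    ($unit eq 'K') { return $num * 1e3 }
    elsif ($unit eq 'M') { return $num * 1e6 }
    elsif ($unit eq 'G') { return $num * 1e9 }
    else                 { return $num * 1e12 }    # regex guarantees 'T' here
}

sub with_hash {
    my ($num, $unit) = $_[0] =~ /^([\d.]+)([KMGT])$/ or die "bad '$_[0]'\n";
    return $num * $mul{$unit};
}

my @sizes = qw(324K 440K 23T 224G 42G 1.9T 709G 62M 64M 48K 41M 34M);

# Make sure both versions agree before timing them
for my $s (@sizes) {
    die "mismatch on $s\n" unless with_ifs($s) == with_hash($s);
}

# Run each version 20_000 times over the sample and print a comparison table
cmpthese(20_000, {
    ifs  => sub { with_ifs($_)  for @sizes },
    hash => sub { with_hash($_) for @sizes },
});
```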