qq has asked for the wisdom of the Perl Monks concerning the following question:

The code below works fine, but I'm not thrilled about the loop. The task is to take two intergers and elide them so that any redundant information in the higher number is dropped. So 123,134 becomes 113-34, and 123,123 becomes 123 and 34,123 becomes 34-123. Like a book index. Whats the slick way to do this?
#!/usr/bin/perl use strict; use warnings; sub elide { my ($start,$end) = @_; my $len = length( $end ); for ( reverse 0..$len ) { my $start_start = substr( $start, 0, $_ ); last if $end =~ s/^$start_start//; } return ( $end ) ? $start . '-' . $end : $start; } while ( <DATA> ) { print elide( split ' ', $_ ), "\n"; } __DATA__ 1 32 4 19 28 39 34 123 321 321 324 329 325 349 340 509

Which gives:

1-32 4-19 28-39 34-123 321 324-9 325-49 340-509

Replies are listed 'Best First'.
Re: eliding number ranges
by Abigail-II (Bishop) on Nov 13, 2003 at 14:53 UTC
    #!/usr/bin/perl use strict; use warnings; while (<DATA>) { chomp; my ($f, $s) = split; unless (length ($f) == length ($s)) { print "$f-$s\n"; next; } if ($f == $s) { print $f, "\n"; next; } print "$f-", substr ($s => length +(("$f" ^ "$s") =~ /^(\x00*)/) [ +0]), "\n"; } __DATA__ 1 32 4 19 28 39 34 123 321 321 324 329 325 349 340 509

    Abigail

      print"$f-",substr($s,length+(("$f"^"$s")=~/(\0*)/)[0]),"\n";
      can be reduced to
      print"$f-",substr($s,/\0*/g&&pos),"\n"for("$f"^"$s");
      Can you please explain
      print "$f-", substr ($s => length +(("$f" ^ "$s") =~ /^(\x00*)/) [0]), + "\n";

        Earlier in the subroutine, values of $f and $s where the length differed were removed. Also, if $f and $s are the same value, they are removed as well. Then comes the line:

        print "$f-", substr ($s => length +(("$f" ^ "$s") =~ /^(\x00*)/) [0]), + "\n";

        What's happening here is that an exclusive or is being applied to both terms. As you might remember from your basic discrete math class, exclusive or returns 1 if the two bits are different and 0 if they are the same. This is why the first and second conditionals are important as we can now guarantee that you won't get 28-976 when $f = 28 and $s = 28976 and we won't get 28- when $f = $s = 28. Now comes the magic part. length (...)[0] is the same as length(...)[0]. Since length only returns a scalar, length(...)[0] makes no sense. Thus the reason for a plus (+) followed by parentheses to give the compiler a hint as to what we actually mean. The parentheses give the match list context with the [0] indicating we want the first captured element of the match (note, if we didn't capture, the regex would only return 1 or boolean true). The parentheses around the "$f" ^ "$s" are there only because =~ has higher precedence than ^. The regex matches the null character (\x00) which in binary looks like 00000000. Since our exclusive or tells us where the string matches with a 0 and where it doesn't match with a 1, our null characters mean we match at those locations. Because our regex capture returns the actual match and since our match is greedy, we will get a string of \x00 returned with a length the same as the beginning matching areas of $f and $s. Then length takes over to return the length. We then have substr($s,length) which returns the substr of $s from position length to the end. Thus we end up printing out $f-(the non-matching portion of $s).

        Sorry if I messed up anywhere. Hope this helps.

        Thanks to PodMaster for pointing out that I put exclusive or as 1 if they are different and 1 if they are the same. Doh!

        antirice    
        The first rule of Perl club is - use Perl
        The
        ith rule of Perl club is - follow rule i - 1 for i > 1

        Taking the xor (^) of two strings will yield nulls (\x00) in every character position where the strings have the same character.

        The regex matches the leading nulls, and length counts them. substr skips that many chars in $s and prints the rest.

        I wrote this to help me understand Abigail's code. I'd never seen the xor-string trick before.

        #!perl use strict; use warnings; while (my $line=<DATA>) { chomp $line; my ($f, $s) = split /\s+/,$line; unless (length ($f) == length ($s)) { print "\"$line\" -> $f-$s\n"; next; } if ($f == $s) { print "\"$line\" -> $f\n"; next; } my $first = join ' ',map {sprintf("%08b",ord)} ("$f" =~ /./g); print "$f\t= $first\n"; my $second = join ' ',map {sprintf("%08b",ord)} ("$s" =~ /./g); print "$s\t= $second\n"; my $xor = join ' ',map {sprintf("%08b",ord)} (("$f" ^ "$s") =~ /./ +g); print "XOR\t= $xor\n"; my $index_of_first_diff = 0; $index_of_first_diff++ while($xor =~ /00000000 /g); print "Index\t= " , ' 'x9x$index_of_first_diff, "$index_of_first_d +iff\n"; print "Result\t= $f-", substr ($s, $index_of_first_diff), "\n\n"; } __DATA__ 324 329 325 349 340 509

        Output -

        d:\>test.pl 324 = 00110011 00110010 00110100 329 = 00110011 00110010 00111001 XOR = 00000000 00000000 00001101 Index = 2 Result = 324-9 325 = 00110011 00110010 00110101 349 = 00110011 00110100 00111001 XOR = 00000000 00000110 00001100 Index = 1 Result = 325-49 340 = 00110011 00110100 00110000 509 = 00110101 00110000 00111001 XOR = 00000110 00000100 00001001 Index = 0 Result = 340-509
Re: eliding number ranges
by pg (Canon) on Nov 13, 2003 at 15:34 UTC

    A math way: (added two test cases for possible boundary problem)

    use strict; use warnings; sub elide { my ($start,$end) = @_; if ((my $diff1 = length($end) - length($start)) > 0) { return "$start-$end"; } elsif ($diff1 == 0) { if ($start == $end) { return $start; } else { if ((my $diff2 = length($end) - length($end - $start)) > 0 +) { $end %= 10 ** (length($end) - $diff2); } return "$start-$end"; } } } while ( <DATA> ) { print elide( split ' ', $_ ), "\n"; } __DATA__ 1 32 4 19 28 39 34 123 321 321 324 329 325 349 340 509 999 1000 1000 1001

    I tried to do it more math by using log10 to determine length, but that failed the second test case I added, because of rounding issue.

      Have you tried it with
      __DATA__ 28 31 199 201
      ? To get it to really work, I had to add a loop:
      sub elide { my ($start,$end) = @_; if ($end < $start) { ($start, $end) = ($end, $star +t) } elsif ($start == $end) { return $start } if (length($end) > length($start)) { return "$start-$end" } else { my $pow = 1; $pow *= 10 while int($end/$pow) > int($start/$pow); $end %= $pow; } return "$start-$end"; }
      I think the math approach is my favorite. The one suggestion I want to make, though, is more explicitly handling the situation where $start is larger than $end. It's not likely, but I've seen so much unlikely data in my time..
Re: eliding number ranges
by japhy (Canon) on Nov 13, 2003 at 15:19 UTC
    Update: stay away from my erroneous regexes.

    I'd solve it using a regex:

    sub elide_japhy { my ($s, $e) = @_; return $s == $e ? $s : ("$s $e" =~ /^(\d*)\d* \1(\d*)$/ and "$s-$2"); }
    The way it works is this: if the lower and upper bounds are equal, just return the number; otherwise (assuming the upper bound is actually greater than the lower bound), we try to match a prefix in the lower bound that is also found in the upper bound (which could just be zero characters), and return the lower bound, a dash, and the remainder of the upper bound.

    As an example, for 325 through 349, the prefix is '3', and the remainder is '49'; for 324 through 329, the prefix is '32', and the remainder is '9'.

    Finally, we could micro-size it. It's not entirely golfed, but I get rid of intermediate variables.

    sub elide_japhy { $_[0] == $_[1] ? shift : ("@_" =~ /^(\d*)\d* \1(\d*)$/ and "$_[0]-$2"); }
    Oh, and the logic could be embedded in the regex itself:
    sub elide_japhy { "@_" =~ /^(\d*)(\d*) \1(?!\2)(\d*)$/ ? "$_[0]-$3" : shift; }

    _____________________________________________________
    Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
    s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

      Your elide_japhy function is wrong. On the input 2, 21, it returns "2-1".

      Abigail

        Ooh. Well, since your solution far surpasses mine in efficiency, I wouldn't suggest using mine anyway.

        _____________________________________________________
        Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
        s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

OT: Standard page-range style
by BorgCopyeditor (Friar) on Nov 13, 2003 at 19:15 UTC

    Not strictly (or even loosely) a point about perl, but about style. You said you wanted your output to be "like a book index." Some style manuals demand a behavior that your description of your desired output doesn't handle. The number of "significant digits" should never be less than two if both the page numbers have more than two digits. Example: 204, 207 should yield 204-07. (1, 17, however, should yield 1-17).

    Hope this helps. (And yes, I actually have worked as a copyeditor.)

    BCE
    --Your punctuation skills are insufficient!

Re: eliding number ranges
by Roger (Parson) on Nov 14, 2003 at 06:05 UTC
    You can do this with straight length and substr.

    #!/usr/local/bin/perl -w use strict; while (<DATA>) { chomp; my ($from, $to) = split /\s+/; my $diff = $to - $from; # get the difference ($from, $to) = ($to, $from) if $diff < 0; # error checking if (!$diff) { print "$from\n"; } else { $to = substr($to, length($to) - length($diff)) if length($diff) < length($from) and substr($from, 0, 1) eq substr($to, 0, 1); print "$from-$to\n"; } } __DATA__ 1 32 4 19 28 39 34 123 321 321 324 329 325 349 340 509 51 1
    And the output -
    1-32 4-19 28-39 34-123 321 324-9 325-49 340-509 1-51
    Update: Thanks Abigail-II I have fixed the bug in the code.

      This line smelled very fishy to me:
      $to = substr($to, length($to) - length($diff))
      And indeed, on the input 199 200, it prints 199-0.

      Abigail

      I knew that I was maknig it more complicated than needed. This seems like the cleanest approach so far - although the xor trick is pretty nice.

      Thanks everyone for the as-ever illuminating responses.

      qq

Re: eliding number ranges
by EdwardG (Vicar) on Nov 14, 2003 at 13:30 UTC

    Interesting to notice how the collection of test cases is growing as monks review other monk suggestions. Reminds me of XP's Test-Driven Development except in reverse.

    To consolidate -

    __DATA__ 1 32 2 20 2 21 4 19 28 31 28 39 34 123 51 1 199 200 199 201 321 321 324 329 325 349 340 509 999 1000 1000 1001 1999 2000

    PS: The unusual case of  51   1  doesn't work with Abigail's code but could be fixed easily, perhaps with  ($f,$s) = ($s,$f) if $s < $f;

     

      Indeed - its not always easy to tell which tests you'll need in advance. But once you do add them, they're there for good.

      And my original "solution" fails on a couple of these as well.

      qq

Re: eliding number ranges
by atcroft (Abbot) on Nov 13, 2003 at 22:15 UTC
Re: eliding number ranges
by pizza_milkshake (Monk) on Nov 13, 2003 at 16:09 UTC
    $ cat index_DATA 1 32 4 19 28 39 34 123 321 321 324 329 325 349 340 509 $ perl -nle's/(\d*)(\d*)\s+\1(\d*)/$1$2-$3/;s/-$//;print' index_DATA 1-32 4-19 28-39 34-123 321 324-9 325-49 340-509 $

    perl -e"map print(chr(hex(( q{6f634070617a6d692e7273650a}=~/../g)[hex]))), (q{375542349abb99098106c}=~/./g)"

      Same principle as japhy, same mistake:
      $ echo "2 21" | perl -nle 's/(\d*)(\d*)\s+\1(\d*)/$1$2-$3/; s/-$// +; print' 2-1

      Abigail

        aw christ, that still won't work on 2-20. i suck. apologies to everyone. delete my posts if possible.

        perl -e"map print(chr(hex(( q{6f634070617a6d692e7273650a}=~/../g)[hex]))), (q{375542349abb99098106c}=~/./g)"