eliding number ranges

qq has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: eliding number ranges by Abigail-II (Bishop) on Nov 13, 2003 at 14:53 UTC
`#!/usr/bin/perl use strict; use warnings; while (<DATA>) { chomp; my ($f, $s) = split; unless (length ($f) == length ($s)) { print "$f-$s\n"; next; } if ($f == $s) { print $f, "\n"; next; } print "$f-", substr ($s => length +(("$f" ^ "$s") =~ /^(\x00*)/) [ +0]), "\n"; } __DATA__ 1 32 4 19 28 39 34 123 321 321 324 329 325 349 340 509` [download] Abigail	[reply] [d/l]
Golf: eliding number ranges by Roy Johnson (Monsignor) on Nov 14, 2003 at 16:28 UTC
`print"$f-",substr($s,length+(("$f"^"$s")=~/(\0)/)[0]),"\n";` [download] can be reduced to `print"$f-",substr($s,/\0/g&&pos),"\n"for("$f"^"$s");` [download]	[reply] [d/l] [select]
Re: Re: eliding number ranges by Anonymous Monk on Nov 13, 2003 at 15:41 UTC
Can you please explain `print "$f-", substr ($s => length +(("$f" ^ "$s") =~ /^(\x00*)/) [0]), + "\n";` [download]	[reply] [d/l]
Re: Re: Re: eliding number ranges by antirice (Priest) on Nov 13, 2003 at 17:14 UTC
Earlier in the subroutine, values of $f and $s where the length differed were removed. Also, if $f and $s are the same value, they are removed as well. Then comes the line: `print "$f-", substr ($s => length +(("$f" ^ "$s") =~ /^(\x00*)/) [0]), + "\n";` [download] What's happening here is that an exclusive or is being applied to both terms. As you might remember from your basic discrete math class, exclusive or returns 1 if the two bits are different and 0 if they are the same. This is why the first and second conditionals are important as we can now guarantee that you won't get 28-976 when $f = 28 and $s = 28976 and we won't get 28- when $f = $s = 28. Now comes the magic part. `length (...)[0]` is the same as `length(...)[0]`. Since length only returns a scalar, `length(...)[0]` makes no sense. Thus the reason for a plus (+) followed by parentheses to give the compiler a hint as to what we actually mean. The parentheses give the match list context with the [0] indicating we want the first captured element of the match (note, if we didn't capture, the regex would only return 1 or boolean true). The parentheses around the "$f" ^ "$s" are there only because =~ has higher precedence than ^. The regex matches the null character (\x00) which in binary looks like 00000000. Since our exclusive or tells us where the string matches with a 0 and where it doesn't match with a 1, our null characters mean we match at those locations. Because our regex capture returns the actual match and since our match is greedy, we will get a string of \x00 returned with a length the same as the beginning matching areas of $f and $s. Then length takes over to return the length. We then have substr($s,length) which returns the substr of $s from position length to the end. Thus we end up printing out $f-(the non-matching portion of $s). Sorry if I messed up anywhere. Hope this helps. Thanks to PodMaster for pointing out that I put exclusive or as 1 if they are different and 1 if they are the same. Doh! antirice The first rule of Perl club is - use Perl The ith rule of Perl club is - follow rule i - 1 for i > 1	[reply] [d/l] [select]
Re:* eliding number ranges by Roy Johnson (Monsignor) on Nov 13, 2003 at 17:08 UTC
Taking the xor (`^`) of two strings will yield nulls (`\x00`) in every character position where the strings have the same character. The regex matches the leading nulls, and length counts them. substr skips that many chars in `$s` and prints the rest.	[reply]
Re: Re: Re: eliding number ranges by EdwardG (Vicar) on Nov 13, 2003 at 18:24 UTC
I wrote this to help me understand Abigail's code. I'd never seen the xor-string trick before. #!perl use strict; use warnings; while (my $line=<DATA>) { chomp $line; my ($f, $s) = split /\s+/,$line; unless (length ($f) == length ($s)) { print "\"$line\" -> $f-$s\n"; next; } if ($f == $s) { print "\"$line\" -> $f\n"; next; } my $first = join ' ',map {sprintf("%08b",ord)} ("$f" =~ /./g); print "$f\t= $first\n"; my $second = join ' ',map {sprintf("%08b",ord)} ("$s" =~ /./g); print "$s\t= $second\n"; my $xor = join ' ',map {sprintf("%08b",ord)} (("$f" ^ "$s") =~ /./ +g); print "XOR\t= $xor\n"; my $index_of_first_diff = 0; $index_of_first_diff++ while($xor =~ /00000000 /g); print "Index\t= " , ' 'x9x$index_of_first_diff, "$index_of_first_d +iff\n"; print "Result\t= $f-", substr ($s, $index_of_first_diff), "\n\n"; } __DATA__ 324 329 325 349 340 509 [download] Output - `d:\>test.pl 324 = 00110011 00110010 00110100 329 = 00110011 00110010 00111001 XOR = 00000000 00000000 00001101 Index = 2 Result = 324-9 325 = 00110011 00110010 00110101 349 = 00110011 00110100 00111001 XOR = 00000000 00000110 00001100 Index = 1 Result = 325-49 340 = 00110011 00110100 00110000 509 = 00110101 00110000 00111001 XOR = 00000110 00000100 00001001 Index = 0 Result = 340-509` [download]	[reply] [d/l] [select]
Re: eliding number ranges by pg (Canon) on Nov 13, 2003 at 15:34 UTC
A math way: (added two test cases for possible boundary problem) `use strict; use warnings; sub elide { my ($start,$end) = @_; if ((my $diff1 = length($end) - length($start)) > 0) { return "$start-$end"; } elsif ($diff1 == 0) { if ($start == $end) { return $start; } else { if ((my $diff2 = length($end) - length($end - $start)) > 0 +) { $end %= 10 ** (length($end) - $diff2); } return "$start-$end"; } } } while ( <DATA> ) { print elide( split ' ', $_ ), "\n"; } __DATA__ 1 32 4 19 28 39 34 123 321 321 324 329 325 349 340 509 999 1000 1000 1001` [download] I tried to do it more math by using log10 to determine length, but that failed the second test case I added, because of rounding issue.	[reply] [d/l]
Re: Re: eliding number ranges by Roy Johnson (Monsignor) on Nov 13, 2003 at 19:32 UTC
Have you tried it with `__DATA__ 28 31 199 201` [download] ? To get it to really work, I had to add a loop: `sub elide { my ($start,$end) = @_; if ($end < $start) { ($start, $end) = ($end, $star +t) } elsif ($start == $end) { return $start } if (length($end) > length($start)) { return "$start-$end" } else { my $pow = 1; $pow *= 10 while int($end/$pow) > int($start/$pow); $end %= $pow; } return "$start-$end"; }` [download]	[reply] [d/l] [select]
Re: Re: eliding number ranges by melora (Scribe) on Nov 13, 2003 at 18:47 UTC
I think the math approach is my favorite. The one suggestion I want to make, though, is more explicitly handling the situation where $start is larger than $end. It's not likely, but I've seen so much unlikely data in my time..	[reply]
Re: eliding number ranges by japhy (Canon) on Nov 13, 2003 at 15:19 UTC
Update: stay away from my erroneous regexes. I'd solve it using a regex: `sub elide_japhy { my ($s, $e) = @_; return $s == $e ? $s : ("$s $e" =~ /^(\d)\d \1(\d)$/ and "$s-$2"); }` [download] The way it works is this: if the lower and upper bounds are equal, just return the number; otherwise (assuming the upper bound is actually greater than the lower bound), we try to match a prefix in the lower bound that is also found in the upper bound (which could just be zero characters), and return the lower bound, a dash, and the remainder of the upper bound. As an example, for 325 through 349, the prefix is '3', and the remainder is '49'; for 324 through 329, the prefix is '32', and the remainder is '9'. Finally, we could micro-size it. It's not entirely golfed, but I get rid of intermediate variables. `sub elide_japhy { $_[0] == $_[1] ? shift : ("@_" =~ /^(\d)\d* \1(\d)$/ and "$_[0]-$2"); }` [download] Oh, and the logic could be embedded in the regex itself: `sub elide_japhy { "@_" =~ /^(\d)(\d) \1(?!\2)(\d)$/ ? "$_[0]-$3" : shift; }` [download] _____________________________________________________ Jeff`[japhy]`Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area) `s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;`	[reply] [d/l] [select]
Re: eliding number ranges by Abigail-II (Bishop) on Nov 13, 2003 at 15:41 UTC
Your elide_japhy function is wrong. On the input `2, 21`, it returns `"2-1"`. Abigail	[reply]
Re: Re: eliding number ranges by japhy (Canon) on Nov 13, 2003 at 15:45 UTC
Ooh. Well, since your solution far surpasses mine in efficiency, I wouldn't suggest using mine anyway. _____________________________________________________ Jeff`[japhy]`Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area) `s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;`	[reply]
OT: Standard page-range style by BorgCopyeditor (Friar) on Nov 13, 2003 at 19:15 UTC
Not strictly (or even loosely) a point about perl, but about style. You said you wanted your output to be "like a book index." Some style manuals demand a behavior that your description of your desired output doesn't handle. The number of "significant digits" should never be less than two if both the page numbers have more than two digits. Example: 204, 207 should yield 204-07. (1, 17, however, should yield 1-17). Hope this helps. (And yes, I actually have worked as a copyeditor.) BCE --Your punctuation skills are insufficient!	[reply]
Re: eliding number ranges by Roger (Parson) on Nov 14, 2003 at 06:05 UTC
You can do this with straight length and substr. `#!/usr/local/bin/perl -w use strict; while (<DATA>) { chomp; my ($from, $to) = split /\s+/; my $diff = $to - $from; # get the difference ($from, $to) = ($to, $from) if $diff < 0; # error checking if (!$diff) { print "$from\n"; } else { $to = substr($to, length($to) - length($diff)) if length($diff) < length($from) and substr($from, 0, 1) eq substr($to, 0, 1); print "$from-$to\n"; } } __DATA__ 1 32 4 19 28 39 34 123 321 321 324 329 325 349 340 509 51 1` [download] And the output - `1-32 4-19 28-39 34-123 321 324-9 325-49 340-509 1-51` [download] Update: Thanks Abigail-II I have fixed the bug in the code.	[reply] [d/l] [select]
Re: eliding number ranges by Abigail-II (Bishop) on Nov 14, 2003 at 11:34 UTC
This line smelled very fishy to me: `$to = substr($to, length($to) - length($diff))` [download] And indeed, on the input `199 200`, it prints `199-0`. Abigail	[reply] [d/l] [select]
Re: Re: eliding number ranges by qq (Hermit) on Nov 14, 2003 at 11:13 UTC
I knew that I was maknig it more complicated than needed. This seems like the cleanest approach so far - although the xor trick is pretty nice. Thanks everyone for the as-ever illuminating responses. qq	[reply]
Re: eliding number ranges by EdwardG (Vicar) on Nov 14, 2003 at 13:30 UTC
Interesting to notice how the collection of test cases is growing as monks review other monk suggestions. Reminds me of XP's Test-Driven Development except in reverse. To consolidate - `__DATA__ 1 32 2 20 2 21 4 19 28 31 28 39 34 123 51 1 199 200 199 201 321 321 324 329 325 349 340 509 999 1000 1000 1001 1999 2000` [download] PS: The unusual case of 51 1 doesn't work with Abigail's code but could be fixed easily, perhaps with `($f,$s) = ($s,$f) if $s < $f;`	[reply] [d/l] [select]
Re: Re: eliding number ranges by qq (Hermit) on Nov 15, 2003 at 13:01 UTC
Indeed - its not always easy to tell which tests you'll need in advance. But once you do add them, they're there for good. And my original "solution" fails on a couple of these as well. qq	[reply]
Re: eliding number ranges by atcroft (Abbot) on Nov 13, 2003 at 22:15 UTC
You may also wish to look at Expand Ranges in Lists of Numbers and Unsorted IP Addr list to sorted IP list with ranges, for some additional discussions related to converting numbers to ranges and expanding said ranges.	[reply]
Re: eliding number ranges by pizza_milkshake (Monk) on Nov 13, 2003 at 16:09 UTC
`$ cat index_DATA 1 32 4 19 28 39 34 123 321 321 324 329 325 349 340 509 $ perl -nle's/(\d)(\d)\s+\1(\d*)/$1$2-$3/;s/-$//;print' index_DATA 1-32 4-19 28-39 34-123 321 324-9 325-49 340-509 $` [download] perl -e"map print(chr(hex(( q{6f634070617a6d692e7273650a}=~/../g)[hex]))), (q{375542349abb99098106c}=~/./g)"	[reply] [d/l]
Re: eliding number ranges by Abigail-II (Bishop) on Nov 13, 2003 at 16:16 UTC
Same principle as japhy, same mistake: `$ echo "2 21" \| perl -nle 's/(\d)(\d)\s+\1(\d*)/$1$2-$3/; s/-$// +; print' 2-1` [download] Abigail	[reply] [d/l]
Re: Re: eliding number ranges by pizza_milkshake (Monk) on Nov 13, 2003 at 16:33 UTC
aw christ, that still won't work on 2-20. i suck. apologies to everyone. delete my posts if possible. perl -e"map print(chr(hex(( q{6f634070617a6d692e7273650a}=~/../g)[hex]))), (q{375542349abb99098106c}=~/./g)"	[reply]