Re: eliding number ranges
by Abigail-II (Bishop) on Nov 13, 2003 at 14:53 UTC
|
#!/usr/bin/perl
use strict;
use warnings;
while (<DATA>) {
chomp;
my ($f, $s) = split;
unless (length ($f) == length ($s)) {
print "$f-$s\n";
next;
}
if ($f == $s) {
print $f, "\n";
next;
}
print "$f-", substr ($s => length +(("$f" ^ "$s") =~ /^(\x00*)/) [
+0]), "\n";
}
__DATA__
1 32
4 19
28 39
34 123
321 321
324 329
325 349
340 509
Abigail | [reply] [d/l] |
|
|
print"$f-",substr($s,length+(("$f"^"$s")=~/(\0*)/)[0]),"\n";
can be reduced to
print"$f-",substr($s,/\0*/g&&pos),"\n"for("$f"^"$s");
| [reply] [d/l] [select] |
|
|
print "$f-", substr ($s => length +(("$f" ^ "$s") =~ /^(\x00*)/) [0]),
+ "\n";
| [reply] [d/l] |
|
|
Earlier in the subroutine, values of $f and $s where the length differed were removed. Also, if $f and $s are the same value, they are removed as well. Then comes the line:
print "$f-", substr ($s => length +(("$f" ^ "$s") =~ /^(\x00*)/) [0]),
+ "\n";
What's happening here is that an exclusive or is being applied to both terms. As you might remember from your basic discrete math class, exclusive or returns 1 if the two bits are different and 0 if they are the same. This is why the first and second conditionals are important as we can now guarantee that you won't get 28-976 when $f = 28 and $s = 28976 and we won't get 28- when $f = $s = 28. Now comes the magic part. length (...)[0] is the same as length(...)[0]. Since length only returns a scalar, length(...)[0] makes no sense. Thus the reason for a plus (+) followed by parentheses to give the compiler a hint as to what we actually mean. The parentheses give the match list context with the [0] indicating we want the first captured element of the match (note, if we didn't capture, the regex would only return 1 or boolean true). The parentheses around the "$f" ^ "$s" are there only because =~ has higher precedence than ^. The regex matches the null character (\x00) which in binary looks like 00000000. Since our exclusive or tells us where the string matches with a 0 and where it doesn't match with a 1, our null characters mean we match at those locations. Because our regex capture returns the actual match and since our match is greedy, we will get a string of \x00 returned with a length the same as the beginning matching areas of $f and $s. Then length takes over to return the length. We then have substr($s,length) which returns the substr of $s from position length to the end. Thus we end up printing out $f-(the non-matching portion of $s).
Sorry if I messed up anywhere. Hope this helps.
Thanks to PodMaster for pointing out that I put exclusive or as 1 if they are different and 1 if they are the same. Doh!
antirice The first rule of Perl club is - use Perl The ith rule of Perl club is - follow rule i - 1 for i > 1
| [reply] [d/l] [select] |
|
|
| [reply] |
|
|
#!perl
use strict;
use warnings;
while (my $line=<DATA>) {
chomp $line;
my ($f, $s) = split /\s+/,$line;
unless (length ($f) == length ($s)) {
print "\"$line\" -> $f-$s\n";
next;
}
if ($f == $s) {
print "\"$line\" -> $f\n";
next;
}
my $first = join ' ',map {sprintf("%08b",ord)} ("$f" =~ /./g);
print "$f\t= $first\n";
my $second = join ' ',map {sprintf("%08b",ord)} ("$s" =~ /./g);
print "$s\t= $second\n";
my $xor = join ' ',map {sprintf("%08b",ord)} (("$f" ^ "$s") =~ /./
+g);
print "XOR\t= $xor\n";
my $index_of_first_diff = 0;
$index_of_first_diff++ while($xor =~ /00000000 /g);
print "Index\t= " , ' 'x9x$index_of_first_diff, "$index_of_first_d
+iff\n";
print "Result\t= $f-", substr ($s, $index_of_first_diff), "\n\n";
}
__DATA__
324 329
325 349
340 509
Output -
d:\>test.pl
324 = 00110011 00110010 00110100
329 = 00110011 00110010 00111001
XOR = 00000000 00000000 00001101
Index = 2
Result = 324-9
325 = 00110011 00110010 00110101
349 = 00110011 00110100 00111001
XOR = 00000000 00000110 00001100
Index = 1
Result = 325-49
340 = 00110011 00110100 00110000
509 = 00110101 00110000 00111001
XOR = 00000110 00000100 00001001
Index = 0
Result = 340-509
| [reply] [d/l] [select] |
Re: eliding number ranges
by pg (Canon) on Nov 13, 2003 at 15:34 UTC
|
use strict;
use warnings;
sub elide {
my ($start,$end) = @_;
if ((my $diff1 = length($end) - length($start)) > 0) {
return "$start-$end";
} elsif ($diff1 == 0) {
if ($start == $end) {
return $start;
} else {
if ((my $diff2 = length($end) - length($end - $start)) > 0
+) {
$end %= 10 ** (length($end) - $diff2);
}
return "$start-$end";
}
}
}
while ( <DATA> ) {
print elide( split ' ', $_ ), "\n";
}
__DATA__
1 32
4 19
28 39
34 123
321 321
324 329
325 349
340 509
999 1000
1000 1001
I tried to do it more math by using log10 to determine length, but that failed the second test case I added, because of rounding issue. | [reply] [d/l] |
|
|
__DATA__
28 31
199 201
?
To get it to really work, I had to add a loop:
sub elide {
my ($start,$end) = @_;
if ($end < $start) { ($start, $end) = ($end, $star
+t) }
elsif ($start == $end) { return $start }
if (length($end) > length($start)) { return "$start-$end" }
else {
my $pow = 1;
$pow *= 10 while int($end/$pow) > int($start/$pow);
$end %= $pow;
}
return "$start-$end";
}
| [reply] [d/l] [select] |
|
|
I think the math approach is my favorite. The one suggestion I want to make, though, is more explicitly handling the situation where $start is larger than $end. It's not likely, but I've seen so much unlikely data in my time..
| [reply] |
Re: eliding number ranges
by japhy (Canon) on Nov 13, 2003 at 15:19 UTC
|
Update: stay away from my erroneous regexes.
I'd solve it using a regex:
sub elide_japhy {
my ($s, $e) = @_;
return $s == $e ?
$s :
("$s $e" =~ /^(\d*)\d* \1(\d*)$/ and "$s-$2");
}
The way it works is this: if the lower and upper bounds are equal, just return the number; otherwise (assuming the upper bound is actually greater than the lower bound), we try to match a prefix in the lower bound that is also found in the upper bound (which could just be zero characters), and return the lower bound, a dash, and the remainder of the upper bound.
As an example, for 325 through 349, the prefix is '3', and the remainder is '49'; for 324 through 329, the prefix is '32', and the remainder is '9'.
Finally, we could micro-size it. It's not entirely golfed, but I get rid of intermediate variables.
sub elide_japhy {
$_[0] == $_[1] ?
shift :
("@_" =~ /^(\d*)\d* \1(\d*)$/ and "$_[0]-$2");
}
Oh, and the logic could be embedded in the regex itself:
sub elide_japhy {
"@_" =~ /^(\d*)(\d*) \1(?!\2)(\d*)$/ ? "$_[0]-$3" : shift;
}
_____________________________________________________
Jeff[japhy]Pinyan:
Perl,
regex,
and perl
hacker, who'd like a job (NYC-area)
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;
| [reply] [d/l] [select] |
|
|
Your elide_japhy function is wrong. On the input 2, 21, it returns "2-1".
Abigail
| [reply] |
|
|
Ooh. Well, since your solution far surpasses mine in efficiency, I wouldn't suggest using mine anyway.
_____________________________________________________
Jeff[japhy]Pinyan:
Perl,
regex,
and perl
hacker, who'd like a job (NYC-area)
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;
| [reply] |
OT: Standard page-range style
by BorgCopyeditor (Friar) on Nov 13, 2003 at 19:15 UTC
|
Not strictly (or even loosely) a point about perl, but about style. You said you wanted your output to be "like a book index." Some style manuals demand a behavior that your description of your desired output doesn't handle. The number of "significant digits" should never be less than two if both the page numbers have more than two digits. Example: 204, 207 should yield 204-07. (1, 17, however, should yield 1-17).
Hope this helps. (And yes, I actually have worked as a copyeditor.)
BCE --Your punctuation skills are insufficient!
| [reply] |
Re: eliding number ranges
by Roger (Parson) on Nov 14, 2003 at 06:05 UTC
|
You can do this with straight length and substr.
#!/usr/local/bin/perl -w
use strict;
while (<DATA>) {
chomp;
my ($from, $to) = split /\s+/;
my $diff = $to - $from; # get the difference
($from, $to) = ($to, $from) if $diff < 0; # error checking
if (!$diff) {
print "$from\n";
} else {
$to = substr($to, length($to) - length($diff))
if length($diff) < length($from) and
substr($from, 0, 1) eq substr($to, 0, 1);
print "$from-$to\n";
}
}
__DATA__
1 32
4 19
28 39
34 123
321 321
324 329
325 349
340 509
51 1
And the output -
1-32
4-19
28-39
34-123
321
324-9
325-49
340-509
1-51
Update: Thanks Abigail-II I have fixed the bug in the code.
| [reply] [d/l] [select] |
|
|
| [reply] [d/l] [select] |
|
|
| [reply] |
Re: eliding number ranges
by EdwardG (Vicar) on Nov 14, 2003 at 13:30 UTC
|
Interesting to notice how the collection of test cases is growing as monks review other monk suggestions. Reminds me of XP's Test-Driven Development except in reverse.
To consolidate -
__DATA__
1 32
2 20
2 21
4 19
28 31
28 39
34 123
51 1
199 200
199 201
321 321
324 329
325 349
340 509
999 1000
1000 1001
1999 2000
PS: The unusual case of 51 1 doesn't work with Abigail's code but could be fixed easily, perhaps with ($f,$s) = ($s,$f) if $s < $f;
| [reply] [d/l] [select] |
|
|
Indeed - its not always easy to tell which tests you'll need in advance. But once you do add them, they're there for good.
And my original "solution" fails on a couple of these as well.
qq
| [reply] |
Re: eliding number ranges
by atcroft (Abbot) on Nov 13, 2003 at 22:15 UTC
|
| [reply] |
Re: eliding number ranges
by pizza_milkshake (Monk) on Nov 13, 2003 at 16:09 UTC
|
$ cat index_DATA
1 32
4 19
28 39
34 123
321 321
324 329
325 349
340 509
$ perl -nle's/(\d*)(\d*)\s+\1(\d*)/$1$2-$3/;s/-$//;print' index_DATA
1-32
4-19
28-39
34-123
321
324-9
325-49
340-509
$
perl -e"map print(chr(hex((
q{6f634070617a6d692e7273650a}=~/../g)[hex]))),
(q{375542349abb99098106c}=~/./g)"
| [reply] [d/l] |
|
|
Same principle as japhy, same mistake:
$ echo "2 21" | perl -nle 's/(\d*)(\d*)\s+\1(\d*)/$1$2-$3/; s/-$//
+; print'
2-1
Abigail
| [reply] [d/l] |
|
|
aw christ, that still won't work on 2-20. i suck. apologies to everyone. delete my posts if possible.
perl -e"map print(chr(hex((
q{6f634070617a6d692e7273650a}=~/../g)[hex]))),
(q{375542349abb99098106c}=~/./g)"
| [reply] |