impose 'use bytes' on another package

saintmike has asked for the wisdom of the Perl Monks concerning the following question:

You're familiar with the use bytes pragma, right? Without it, perl operates with unicode characters, as in

    # prints '1'
    print length("\x{03c5}"), "\n";

while with use bytes, it falls back to byte semantics:

    # prints '2'
    use bytes;
    print length("\x{03c5}"), "\n";

Now what if I have a module Foo.pm that does a simple calculation:

package Foo;

sub len {
    return length("\x{03c5}");
}

1;

and I want to impose use bytes semantics on it without modifying its code? Things like

BEGIN {
    package Foo;
    use bytes;
}
use Foo;

package main;
print Foo::len(), "\n";

won't work because use bytes modifies the behaviour in its lexical scope. Ideas, anyone?
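To make the lexical scoping concrete, here is a minimal sketch using only core Perl. The names `len_chars` and `len_bytes` are made up for illustration. It shows that the pragma changes `length` only in code compiled inside its scope, and that calling a sub from a `use bytes` scope does not change how that sub behaves:

```perl
use strict;
use warnings;

my $upsilon = "\x{03c5}";    # one character, stored internally as two bytes

# Compiled without the pragma: length() counts characters.
sub len_chars {
    return length($upsilon);
}

# Compiled with the pragma: length() counts bytes, but only inside this sub.
sub len_bytes {
    use bytes;
    return length($upsilon);
}

my $called_from_bytes_scope;
{
    use bytes;
    # The caller's pragma does not reach into len_chars().
    $called_from_bytes_scope = len_chars();
}

print "len_chars: ", len_chars(), "\n";
print "len_bytes: ", len_bytes(), "\n";
print "len_chars called under use bytes: $called_from_bytes_scope\n";
```

The BEGIN block in the question is the same situation: its `use bytes` ends at the closing brace, before Foo.pm is ever compiled.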

Replies are listed 'Best First'.
Re: impose 'use bytes' on another package by ikegami (Patriarch) on Apr 06, 2006 at 06:16 UTC
In a manner of thinking, Perl has two kinds of strings: strings of characters and strings of bytes. It seems your `len` function expects to be working on strings of bytes, yet you have a string of characters (since 0x03C5 is outside the range of bytes). Why don't you convert your string of characters into a string of bytes?

Converting from characters to bytes is known as "encoding", and Encode is the module to do it. The question you have to answer is: which encoding do you wish to use? You could, for example, encode using utf8:

    $octets = encode("utf8", $string);

In context, we get:

    use Encode qw( encode );

    sub string_to_literal {
        local $_ = @_ ? $_[0] : $_;
        s/(.)/
            my $o = ord($1);
            if    ($1 eq '"' )              { '\\"' }
            elsif ($1 eq '\\' )             { '\\\\' }
            elsif ($o < 0x20 || $o >= 0x7F) { sprintf('\\x{%X}', $o) }
            else                            { $1 }
        /eg;
        return qq{"$_"};
    }

    sub octet_dump {
        return join ' ',
            map { sprintf('%02X', ord($_)) }
            map /(.)/g,
            @_ ? $_[0] : $_;
    }

    $string = "\x{03c5}";
    print("\$string is ", length($string), " chars long: ");
    print(string_to_literal($string), "\n");

    $octets = encode("utf8", $string);
    print("\$octets is ", length($octets), " bytes long: ");
    print(octet_dump($octets), "\n");

outputs

    $string is 1 chars long: "\x{3C5}"
    $octets is 2 bytes long: CF 85

Both `$string` and `$octets` contain "υ", except the character is in Perl's internal character format in `$string` and in utf8 in `$octets`.
Re^2: impose 'use bytes' on another package by saintmike (Vicar) on Apr 06, 2006 at 14:29 UTC
"It seems your len function expects to be working on strings of bytes ..."

Actually, it's the other way around, but I wasn't interested in dynamically converting bytes to characters or vice versa. I was thinking it should be possible (without resorting to dirty tricks like eval-ing the code) to switch between unicode string and byte string interpretation at run time (or at least at compile time) in a separate module, without modifying the module code.
Re: impose 'use bytes' on another package by codeacrobat (Chaplain) on Apr 05, 2006 at 23:01 UTC
Let's try a simpler problem first: an evaluation of the main code of Foo.pm.

    $ perl -e 'use bytes; eval q(package Foo; sub len{ length "\x{03c5}"});print Foo::len()'
    2

I always thought do "Foo.pm" was the same as eval `cat Foo.pm`. But

    $ perl -e 'use bytes; do "Foo.pm";print Foo::len()'
    1
Re^2: impose 'use bytes' on another package by codeacrobat (Chaplain) on Apr 05, 2006 at 23:12 UTC
Oops, forgot to post the workaround. All you have to do is get rid of the 1; in Foo.pm and eval the remaining content of the (no longer a) module.

    $code = `cat Foo.pm`;
    $code =~ s/\n1;//s;
    eval $code;

Use it if a quick'n'dirty solution is right for you. Otherwise I hope other monks come up with a cleaner solution.
Re^3: impose 'use bytes' on another package by diotalevi (Canon) on Apr 06, 2006 at 00:42 UTC
You don't have to remove the trailing true value, and it would have been nicer to avoid calling the shell and cat when there are perfectly good perl functions for such a thing. This is also a quick and dirty solution, but it isn't as craptacular as yours.

    local @ARGV = "Foo.pm"; # TODO: make this search @INC
    local $/;
    eval "#line 1 Foo.pm\nuse bytes;" . <>;
    die $@ if $@;

⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊
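The TODO about searching @INC could be filled in roughly as follows. This is only a sketch under some assumptions: `load_with_bytes` is a hypothetical helper name, it handles only plain directory entries in @INC (code-ref hooks are skipped), and it only makes sense for pure-Perl modules that have not been loaded yet.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not from the thread): compile a pure-Perl module
# with "use bytes" in effect, locating the file through @INC.
sub load_with_bytes {
    my ($module) = @_;

    # Foo::Bar -> Foo/Bar.pm, the key format used by %INC
    (my $rel = "$module.pm") =~ s{::}{/}g;
    return $INC{$rel} if $INC{$rel};    # already compiled, too late to change

    for my $dir (grep { !ref } @INC) {  # skip code-ref hooks in @INC
        my $path = File::Spec->catfile($dir, $rel);
        next unless -f $path;

        open my $fh, '<', $path or die "Cannot open $path: $!";
        my $source = do { local $/; <$fh> };
        close $fh;

        # A string eval inherits the pragmas in effect here, so turn
        # strict off again rather than impose it on the module too.
        no strict;
        eval qq{use bytes;\n#line 1 "$path"\n} . $source;
        die $@ if $@;

        $INC{$rel} = $path;             # make a later "use Foo" a no-op
        return $path;
    }
    die "Cannot find $rel in \@INC\n";
}
```

It would be called as `BEGIN { load_with_bytes('Foo') }` ahead of any `use Foo;`. The pragma only covers the source text of that one file. Modules that Foo itself loads are compiled in their own scope, and the helper's own lexical variables are visible to the evaluated code, so this remains a quick and dirty approach.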
Re^4: impose 'use bytes' on another package by codeacrobat (Chaplain) on Apr 06, 2006 at 06:32 UTC
Re^5: impose 'use bytes' on another package by diotalevi (Canon) on Apr 06, 2006 at 14:37 UTC
Re^3: impose 'use bytes' on another package by CountZero (Bishop) on Apr 06, 2006 at 05:40 UTC
Hardly practical if the module is a few thousand lines long, contains XS code and calls in tons of other modules.

CountZero
"If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law