pack on unpack with same template

by jjmoka (Beadle)
on Jul 27, 2021 at 22:33 UTC

jjmoka has asked for the wisdom of the Perl Monks concerning the following question:

I've found this code
$$xmlScalar_r = pack('U0C*', unpack('U0C*', SGMLencode_data($$xmlScalar_r)||(defined($$xmlScalar_r) ? $$xmlScalar_r : '') ));
Whatever the argument evaluates to, in the end it boils down to something like:
$string = ...
$x = pack('U0C*', unpack('U0C*', $string));
I see nothing here that couldn't be done with just
$x = $string #or without even $string, just $x = ...
so this looks like a useless use of pack and unpack, since the template is the same in both calls.

Are there special cases of any sort (e.g. different environments, architectures)
where the pack/unpack achieves anything different?
Thanks

Replies are listed 'Best First'.
Re: pack on unpack with same template
by haukex (Archbishop) on Jul 28, 2021 at 09:07 UTC

    AFAICT, the effect of pack('U0C*', unpack('U0C*', ...)) is the same as utf8::upgrade. See also "What does utf8::upgrade actually do".

    use warnings;
    use strict;
    use Devel::Peek;

    my @tests = (
        "Hello",
        "Hell\xF6",
        "\N{U+20AC}",
        "\xE2\x82\xAC",
    );

    for my $in (@tests) {
        my $out = pack('U0C*', unpack('U0C*', $in));
        print STDERR "##### ##### ##### pack/unpack ##### ##### #####\n";
        Dump($in);
        Dump($out);
        print STDERR "##### upgrade #####\n";
        Dump($in);
        utf8::upgrade($in);
        Dump($in);
    }

      This is what I suspected it was doing when I saw the question. I did not have a chance to verify it.

      Seeking work! You can reach me at ikegami@adaelis.com

        Yup, confirmed that changing to the "upgraded" internal storage format is the only effect other than making a copy.
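
The same observation can be sketched without Devel::Peek by reading the internal flag with the core function utf8::is_utf8. This is only an illustration, not code from the thread; the helper name roundtrip and the sample string are made up for the example, and it assumes a perl recent enough (5.10 or later) for the U0 semantics discussed here.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): the pack/unpack round trip
# under discussion.
sub roundtrip {
    my ($s) = @_;
    return pack('U0C*', unpack('U0C*', $s));
}

# There is no "use utf8" here, so this literal starts out in the
# non-upgraded (byte) internal storage format.
my $in  = "Hell\xF6";
my $out = roundtrip($in);

# Report the internal flag before and after, and whether the strings
# still compare equal as sequences of characters.
printf "before: is_utf8=%d\n", utf8::is_utf8($in)  ? 1 : 0;
printf "after:  is_utf8=%d\n", utf8::is_utf8($out) ? 1 : 0;
printf "eq:     %s\n", $in eq $out ? 'same characters' : 'different characters';
```

Because eq compares characters rather than the internal representation, a test based on eq alone would not notice the change in storage format.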

        Seeking work! You can reach me at ikegami@adaelis.com

Re: pack on unpack with same template
by kcott (Archbishop) on Jul 28, 2021 at 01:30 UTC

    G'day jjmoka,

    Your assumption seems reasonable. Here's a test script:

    #!/usr/bin/env perl
    
    use strict;
    use warnings;
    use utf8;
    
    use Test::More;
    
    my @test_strings = qw{ abc абв αβγ 🌌🪐🚀  🏻🏼🏽 𓀀𓀁𓀂 ◌◌◌̂ };
    
    plan tests => 0+@test_strings;
    
    ok $_ eq pack('U0C*', unpack('U0C*', $_)) for @test_strings;
    

    Output:

    1..7
    ok 1
    ok 2
    ok 3
    ok 4
    ok 5
    ok 6
    ok 7

    As you suggest, there may be special cases. I couldn't think of any off the top of my head; other monks may know of some. You can just add more strings to @test_strings to test them (without needing to change anything else in the test script).

    You just said "I've found this code". Knowing something about the source and context of that code may provide scope for better answers.

    — Ken

      Thank you. I did some testing myself (though less elegantly). I can fill in the context, but unfortunately it doesn't help. This really is just code I found: Git points me to a seven-year-old commit from when the repo was moved from svn to git. The real author can only be guessed at, and in any case all of those people have since gone on to code somewhere else. For whoever is left with it, there is always, at least, this monastery to act like a lighthouse in the darkness.
Re: pack on unpack with same template
by syphilis (Archbishop) on Jul 28, 2021 at 03:54 UTC
    Are there special cases of any sort (e.g. different environments, architectures) where the pack/unpack achieves anything different?

    I don't know of any such examples wrt the U0C* template.
    But I'm pretty unknowledgeable when it comes to Unicode, so I likely wouldn't know anyway ;-)

    In general, there are some instances where pack($template, $scalar) will lose information that unpack() has no hope of restoring.
    For example, on a perl whose $Config{nvsize} is greater than $Config{doublesize}:
    >perl -wle "print 'ok' if 2.3 != unpack 'd', pack 'd', 2.3;"
    ok
    A similar thing could happen with the I or i templates on a perl whose $Config{ivsize} is greater than $Config{intsize}.
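
The nvsize condition can be checked directly on whatever perl is at hand. A minimal sketch using the core Config module; the variable names are made up for the illustration, and what it prints depends on how that perl was built.

```perl
use strict;
use warnings;
use Config;

# Same value as in the one-liner; 2.3 has no exact binary representation,
# so narrowing it to a C double can change it.
my $nv        = 2.3;
my $roundtrip = unpack 'd', pack 'd', $nv;

# 1 when this perl's NV is wider than a C double (e.g. a long double build).
my $wider = $Config{nvsize} > $Config{doublesize} ? 1 : 0;

# 1 when the value survived the pack/unpack round trip unchanged.
my $same = $roundtrip == $nv ? 1 : 0;

printf "nvsize=%d doublesize=%d: round trip %s\n",
    $Config{nvsize}, $Config{doublesize},
    $same ? 'preserves 2.3' : 'loses precision';
```

On a build where nvsize equals doublesize the value should survive; on a wider-NV build the comparison is expected to fail, as described above.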

    Cheers,
    Rob

      Just as a confirmation (changed != to == and swapped the quotes around for *nix command line):

      $ perl -wle 'print "ok" if 2.3 == unpack "d", pack "d", 2.3;'
      ok

      — Ken

        Just as a confirmation (changed != to == ....

        Yes, if nvsize <= doublesize (not that nvsize is ever less than doublesize) then the equality holds.
        Otherwise the equality doesn't hold.

        ... and swapped the quotes around for *nix command line

        That was actually unneeded. (No big deal ... just FYI.)
        The rendition that I posted works fine on both Linux and Windows.
        Perl one-liners inside double-quotes generally aren't a problem on Linux unless there's a scalar variable in the code - and I deliberately crafted my one-liner without inclusion of a scalar variable in order to achieve that portability ;-)

        Cheers,
        Rob
      Nice, thanks.
