toonski has asked for the wisdom of the Perl Monks concerning the following question:

okay, I'd like to convert every character in a large (64k) string to its ASCII value ('A' -> '65'). now I could do this:

while ($x =~ /(.)/) { $temp = ord($1); $x =~ s/./$temp/; }

but then it would do around 64k regexes on a single string, which wouldn't be cool. Is there a way to get around this using a single regex or without resorting to pack() or am I better off just running through each character in a for loop with substr and converting it that way?

Re: Converting ascii to numbers (unpack)
by tye (Sage) on Feb 15, 2004 at 05:42 UTC
      tye has got it. Though I haven't benchmarked it, unpack is a very efficient method. Here's a dissection:

      $a = join ' ', unpack 'C*', $a;

      That is roughly the same thing as:

      @temparray = unpack 'C*', $a;
      $a = join ' ', @temparray;

      The first part, the unpack, uses the template 'C*', which reads like this: 'C' takes one byte and converts it to its unsigned char value (a number from 0 to 255, which prints in base 10). Note that, per perldoc -f pack, 'C' only works with byte-width characters. For Unicode you would probably use U, but that doesn't appear to be an issue in your case.

      The asterisk in the unpack template means: repeat the 'C' template for as long as there are bytes left to unpack. So the result is a list of unsigned char values (which happen to be the ASCII values) corresponding to the characters (the bytes) in the original string.
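      To make the role of the asterisk concrete, here is a small sketch (the sample string "ABCDE" and the variable names are only for illustration) comparing a fixed repeat count with '*':

```perl
use strict;
use warnings;

# 'C3' applies the C template exactly three times;
# 'C*' keeps applying it until the input runs out.
my @first_three = unpack 'C3', "ABCDE";
my @all         = unpack 'C*', "ABCDE";

print "@first_three\n";    # 65 66 67
print "@all\n";            # 65 66 67 68 69
```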

      The next line, the join, concatenates the list of unsigned char values into one long string, with each value separated from the next by a single space (presumably so you have some prayer of knowing where one value ends and the next one starts in the string).

      In tye's example, the @temparray is avoided by just allowing unpack to spill its list of unsigned char values into the parameter list of join.
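      The dissection above covers only the encoding direction. Since the values are space-separated, the same template can also rebuild the string with pack. This is a minimal sketch assuming byte-width input, as in the question; the sample string is made up:

```perl
use strict;
use warnings;

my $text = "Hi!\n";

# One unpack call turns every byte into its numeric value,
# and join puts a single space between the numbers.
my $codes = join ' ', unpack 'C*', $text;
print "$codes\n";    # 72 105 33 10

# The separators make this reversible: split ' ' recovers the
# numbers and pack 'C*' turns each one back into a byte.
my $back = pack 'C*', split ' ', $codes;
print $back eq $text ? "round trip ok\n" : "mismatch\n";
```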


      Dave

        There are a number of assumptions involved in benchmarking this, but my try shows s/// as twice as fast:
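        use Benchmark 'cmpthese';
        use strict;
        use warnings;

        # 65536-character test string: byte values 0..255, repeated 256 times.
        my $big;
        $big .= join '', map chr, 0..255 for 0..255;
        print length($big), " characters.\n";

        # s///e version: each character becomes its ord value plus a space.
        sub subst { my $tmp; ($tmp = $big) =~ s/(.)/ord($1).' '/seg; $tmp }

        # unpack version: space-separated, no trailing space.
        sub unpac { my $tmp; $tmp = join ' ', unpack 'C*', $big; $tmp }

        print length(subst()), " characters in ascii numbers.\n";
        # Sanity check, allowing for the trailing space that subst() leaves.
        print "whoops!\n" if subst() ne unpac().' ';

        cmpthese(-10, { subst => \&subst, unpac => \&unpac });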
Re: Converting ascii to numbers
by blokhead (Monsignor) on Feb 15, 2004 at 05:21 UTC
    There are probably a zillion ways to do this. The simplest to me seems like a s///e substitution:
    $x =~ s/(.)/ord $1/egs;
    If you ever want to get the data back, though, this is a bad encoding. For instance, do you decode "64" as chr(6).chr(4) or just chr(64)? Maybe you should pad out the ASCII values to 3 digits (though this won't work for wide Unicode characters, whose values can need more than 3 digits):
    $x =~ s/(.)/sprintf "%03d", ord $1/egs;
    Then to get the characters back:
    $x =~ s/(\d{3})/chr $1/eg;
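    Putting the two substitutions together gives a full round trip. A minimal sketch assuming every character's value fits in three digits; the sample string is invented, and note that the decoding step needs /e, otherwise "chr $1" is inserted as literal text:

```perl
use strict;
use warnings;

my $original = "A\n7";

# Encode: every character becomes exactly three digits.
# /s lets . match the newline; /e evaluates the sprintf call.
(my $encoded = $original) =~ s/(.)/sprintf "%03d", ord $1/egs;
print "$encoded\n";    # 065010055

# Decode: take the digits three at a time and turn each group
# back into a character.
(my $decoded = $encoded) =~ s/(\d{3})/chr $1/eg;
print $decoded eq $original ? "round trip ok\n" : "mismatch\n";
```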

    blokhead

Re: Converting ascii to numbers
by diotalevi (Canon) on Feb 15, 2004 at 05:19 UTC

    Your original code has a bug: . doesn't match newlines without /s. Also, are you sure you want a straight numeric translation? How would you know where one character ends and the next begins? I used a different sprintf format so you can see where characters end.

    s((.))(sprintf "0x%02x ", ord $1)gse
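    Because every value ends in a space, this hex format can be decoded again with hex() and chr(). A sketch, assuming byte-width characters so that two hex digits always suffice; the sample string and variable names are illustrative:

```perl
use strict;
use warnings;

my $original = "Hi\n";

# Encode each character as "0xNN " (two lowercase hex digits and a space).
(my $hex = $original) =~ s/(.)/sprintf "0x%02x ", ord $1/egs;
print "$hex\n";    # 0x48 0x69 0x0a (with a trailing space)

# Decode: hex() turns the two digits into a number, chr() into a character.
(my $decoded = $hex) =~ s/0x([0-9a-f]{2}) /chr hex $1/eg;
print $decoded eq $original ? "round trip ok\n" : "mismatch\n";
```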
Re: Converting ascii to numbers
by Skeeve (Parson) on Feb 15, 2004 at 11:46 UTC
    In the sense of TMTOWTDI:
    $x = join ' ', map ord, split //, $x;
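    For byte-width input this gives the same result as the unpack version. A quick check on a made-up string; one difference worth knowing (not raised in the thread) is that ord returns the full code point, so this form also handles characters above 255, where 'C' does not:

```perl
use strict;
use warnings;

my $x = "Perl\n";

# split // yields single characters (the newline included);
# map ord converts each one to its numeric value.
my $via_map    = join ' ', map ord, split //, $x;
my $via_unpack = join ' ', unpack 'C*', $x;

print "$via_map\n";    # 80 101 114 108 10
print $via_map eq $via_unpack ? "same result\n" : "different\n";
```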
Re: Converting ascii to numbers
by Abigail-II (Bishop) on Feb 15, 2004 at 16:42 UTC
    Is there a way to get around this using a single regex or without resorting to pack() or am I better off just running through each character in a for loop with substr and converting it that way?
    pack() is by far the fastest method of doing so, and a regex is most likely to be the second fastest method. But you want to dismiss both, and it's not clear to me why. substr() might be a "best of the rest", but does it really matter?
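    For comparison, the substr() loop the question asks about might look like the sketch below (variable names are hypothetical). It makes a single pass over the string, so it avoids the repeated-regex problem; no timing claims are made here:

```perl
use strict;
use warnings;

my $x = "Hey";

# Visit each position once, take the one-character substring,
# and collect its ord value; join adds the separators at the end.
my @codes;
for my $i (0 .. length($x) - 1) {
    push @codes, ord substr($x, $i, 1);
}
my $result = join ' ', @codes;
print "$result\n";    # 72 101 121
```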

    Abigail