zzzeno has asked for the wisdom of the Perl Monks concerning the following question:

To whom it may concern: Using plain Perl and no other libraries, I am trying to
read in a binary file and Base64-encode it, and I am getting a conversion
error. So I sent the same file (a Word doc) through
Outlook Express to myself, looked over the source
of the email, and noticed that OE (in the multipart
section) had encoded some characters of the file
differently. As a result, the Word doc encoded by my routine comes out corrupt.


Any ideas why my routine gives me different results
than Outlook Express? I've tried debugging it char by
char, but that proved time-consuming and useless.
(Unfortunately, my host doesn't give me access to the MIME library, so I
must do this in simple Perl. The encode_base64
function below is one I found on the web.)


MY CODE:
    while (<$filename>) {
        print encode_base64($_);
    }

    sub encode_base64 ($) {
        my $res = "";
        my $eol = "\n";
        pos($_[0]) = 0;
        while ($_[0] =~ /(.{1,45})/gs) {
            $res .= substr(pack("u", $1), 1);
            chop($res);
        }
        $res =~ tr|` -_|AA-Za-z0-9+/|;
        my $padding = (3 - length($_[0]) % 3) % 3;
        $res =~ s/.{$padding}$/"=" x $padding/e if $padding;
        if (length $eol) {
            $res =~ s/(.{1,76})/$1$eol/g;
        }
        $res;
    }

Ever feel inches away from your destination and every
inch feels like a new beginning? Help. Peter.

Replies are listed 'Best First'.
Re: Question about Binary files
by perlplexer (Hermit) on Apr 19, 2002 at 12:32 UTC
    I hate to bother the horse again but... you need to binmode() your filehandle when dealing with binary files.
    open FH, "<binary.dat" or die "Can't open: $!\n";
    binmode FH;
    while (<FH>) {
        # code here
    }
    close FH;
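Even with binmode() in place, a `while (<FH>)` loop hands the encoder one "line" at a time, split wherever a newline byte happens to occur in the binary data. If each piece is encoded separately, any piece whose length is not a multiple of 3 gets its own "=" padding in the middle of the output, which would explain a result that differs from what a mail client produces. A minimal sketch of one way around that, reading fixed 57-byte chunks (57 bytes encode to exactly 76 characters); the names `encode_chunk` and `encode_file` are made up for illustration, and the chunk encoder reuses the pack("u") approach of the posted routine:

```perl
use strict;
use warnings;

# Base64-encode one chunk of bytes, using the same pack("u") trick
# as the routine in the question. No line breaks are added here.
sub encode_chunk {
    my ($data) = @_;
    my $res = '';
    while ($data =~ /(.{1,45})/gs) {
        $res .= substr(pack('u', $1), 1);   # drop the uuencode length character
        chop $res;                          # drop the trailing newline
    }
    $res =~ tr|` -_|AA-Za-z0-9+/|;          # uuencode alphabet -> Base64 alphabet
    my $padding = (3 - length($data) % 3) % 3;
    $res =~ s/.{$padding}$/'=' x $padding/e if $padding;
    return $res;
}

# Read a file in binary mode, 57 bytes at a time, and return the
# Base64 text as 76-character lines. Only the final chunk can be
# shorter than 57 bytes, so only the final line can carry padding.
sub encode_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!\n";
    binmode($fh);
    my $out = '';
    my $buf;
    while (read($fh, $buf, 57)) {
        $out .= encode_chunk($buf) . "\n";
    }
    close($fh);
    return $out;
}

# Example call (hypothetical file name):
# print encode_file('binary.dat');
```

The chunk size is the only design decision that matters here: any multiple of 3 avoids mid-stream padding, and 57 happens to give the 76-character line length that MIME uses.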

    --perlplexer
      That horse was already bothered and I had already used the binary FH call and got the same results. But I think I have pinpointed the exact location of the problem: it's in the encode_base64 code itself - on the last few lines:

      sub encode_base64 ($) {
          my $res = "";
          my $eol = "\n";
          pos($_[0]) = 0;
          while ($_[0] =~ /(.{1,45})/gs) {
              $res .= substr(pack("u", $1), 1);
              chop($res);
          }
          $res =~ tr|` -_|AA-Za-z0-9+/|;
          #!!! Here's where the different characters are created!!!!
          # my $padding = (3 - length($_[0]) % 3) % 3;
          # $res =~ s/.{$padding}$/"=" x $padding/e if $padding;
          # if (length $eol) {
          #     $res =~ s/(.{1,76})/$1$eol/g;
          # }
          $res;
      }

      I have borrowed this code and don't know what it's trying to do - is it a standard? Since I have no MIME access on my host, and I'm using UNIX - is it possible another encoding function exists, something like uu?encode? Thanks again. Peter.
        I have borrowed this code and don't know what it's trying to do - is it a standard?

        Don't know if that code is standard (probably not, since it doesn't seem to work). But base64 is certainly a standard, which is documented here -- start at page 23.
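For anyone wondering what the standard actually asks for: Base64 takes the input three bytes (24 bits) at a time, cuts each group into four 6-bit values, and looks each value up in a 64-character alphabet; a final group of one or two bytes is zero-filled and marked with "==" or "=". Here is a rough sketch of that idea in plain Perl. The name `base64_by_hand` is invented for this example, it favors readability over speed, and it does not insert the line breaks that MIME expects every 76 characters:

```perl
use strict;
use warnings;

# The Base64 alphabet: value 0 is 'A', value 63 is '/'.
my @ALPHABET = ('A'..'Z', 'a'..'z', '0'..'9', '+', '/');

sub base64_by_hand {
    my ($data) = @_;
    my $out = '';
    # Walk the input in groups of up to 3 bytes.
    for (my $i = 0; $i < length($data); $i += 3) {
        my $group = substr($data, $i, 3);
        my $bits  = unpack('B*', $group);               # 8, 16 or 24 bits as a 0/1 string
        $bits .= '0' x ((6 - length($bits) % 6) % 6);   # zero-fill to a 6-bit boundary
        # Each 6-bit slice is an index into the alphabet.
        $out .= $ALPHABET[ oct("0b$_") ] for $bits =~ /(.{6})/g;
        # A short final group is padded: 1 byte -> "==", 2 bytes -> "=".
        $out .= '=' x (3 - length($group));
    }
    return $out;
}

print base64_by_hand('Man'), "\n";   # prints TWFu
print base64_by_hand('Ma'),  "\n";   # prints TWE=
print base64_by_hand('M'),   "\n";   # prints TQ==
```

Note that padding can only appear at the very end of the whole input, which is why the input has to be encoded as one stream (or in chunks whose length is a multiple of 3) rather than piece by piece.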

        Also, follow up on perlplexer's advice, if you can. Got a home directory, some disk space, and a shell prompt? Download the MIME module for Perl yourself, make a subdirectory in your home directory called "perl_modules", install the download there, then add "-I/home/you/perl_modules" to the shebang line of the Perl script that uses the module. Perl will find it.

        www.cpan.org will have some info and methods that will make this easy.

        Well, it wasn't obvious from your post that you used binmode().

        I am not that familiar with the Base64 encoding algorithm and I don't know if the code that you posted actually works. What I do know is that it's slow: it makes heavy use of regexes, and even has one s///e where a simple substr() would probably suffice.

        Do you have shell access? If so, download and install MIME::Base64. You do not need any special privileges if you install it into your home directory.
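Once the module sits in a private directory, the script only needs to be told where to look. A small sketch, assuming the module ended up under /home/you/perl_modules (a placeholder path, adjust to your own account); `encode_file_with_module` is just an illustrative name. `use lib` has the same effect as putting -I on the shebang line:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Placeholder path: the directory the module was installed into.
use lib '/home/you/perl_modules';
use MIME::Base64 qw(encode_base64);

# Read the whole file in binary mode and encode it in one go,
# so padding can only appear at the very end of the output.
sub encode_file_with_module {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!\n";
    binmode($fh);
    local $/;                  # slurp mode: no line splitting at all
    my $data = <$fh>;
    close($fh);
    return encode_base64($data);   # 76-character lines, "\n" terminated
}

# Example call (hypothetical file name):
# print encode_file_with_module('binary.dat');
```

Slurping is fine for a Word document of ordinary size; for very large files, reading in chunks whose length is a multiple of 57 bytes and encoding each chunk gives the same output with less memory.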

        --perlplexer
Re: Question about Binary files
by Anonymous Monk on Apr 20, 2002 at 15:02 UTC
     Thanks for the info on the install. I didn't know that I could do that. That seems the best path, because the MIME functions are quicker and better than what I've done with the basic CGI library. I also managed to get a different version of the encode_base64 Perl code, which I'll try soon if the install doesn't work. Thanks again. Peter.