GregWL has asked for the wisdom of the Perl Monks concerning the following question:

First a bit of background so that you all will understand what I'm trying to accomplish. I'm working on an embedded system that uses an i18n system based on po2c and .po files. The system is written in C and is running out of space for the 16 languages that we need to support. I've decided that if I can compress the various language translations I should be able to make everything fit.

I can get an embedded version of zlib to run on the embedded system, so if I can compress some version of the translation data I can then decompress it before use. I was thinking that if I could compress the string lengths and then the translated strings, I could include a hex dump of the compressed data in a C source file and use that as the input to the embedded decompressor.
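As a rough sketch of the "hex dump in a C source file" idea, the following Perl compresses a byte string and prints it as a C array. The helper name `c_array`, the array name `lang_de_z`, and the sample strings are made up for illustration; `compress()` and `uncompress()` come from Compress::Zlib and produce and read zlib-format data.

```perl
use strict;
use warnings;
use Compress::Zlib;    # provides compress() and uncompress()

# Hypothetical helper: render a byte string as a C array definition,
# twelve bytes per row.
sub c_array {
    my ($name, $bytes) = @_;
    my @hex = map { sprintf '0x%02x', $_ } unpack 'C*', $bytes;
    my @rows;
    push @rows, '    ' . join(', ', splice(@hex, 0, 12)) while @hex;
    return sprintf "static const unsigned char %s[] = {\n%s\n};\n",
        $name, join(",\n", @rows);
}

# Sample data only: two NUL-terminated strings back to back.
my $packed = compress("hello world\0goodbye\0");
print c_array('lang_de_z', $packed);
```

On the C side, data in this format should be readable with zlib's uncompress(), provided the embedded build includes it and the uncompressed size is known or stored alongside.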

I've started playing with the Compress::Raw::Zlib functions but I've run into a couple of problems. The first is that the deflate() function always seems to interpret the input data as a string. So if I pass in a number (say the length of a string), compress it, decompress it, and then dump the result as hex, I get the ASCII characters of the starting number, not the hex value of the number.

    $num_data = 7;
    $dflt_status = $deflate->deflate($num_data, $dflt_data);
    $dflt_status = $deflate->flush($dflt_data);
    $iflt_status = $inflate->inflate($dflt_data, $iflt_data);
    print hexdump( $iflt_data, { output_format => ' 0x%C', suppress_warnings => 1 } );

    result: 0x37

The second issue is that the terminating null character of the strings can't be included in the compressed data; this is so the decompressed data can be easily handled in C.

I've searched the archives for a thread that mentions compressing numeric values but couldn't find anything, and I haven't found any documentation on Perl functions that indicates that there is a way to compress binary data. I'm hoping that someone here will have some insight into something that will help me figure out a way to do this.

Update

First, sorry everyone, I guess that I simplified my description a bit too much, so here is a bit more.

The i18n system uses a package called po2c that parses a .po file with a Perl script and produces a C source file with a structure with the un-translated strings preceded by an index number. It then puts out a structure for each language with the index number of the translated string and then the string for that index. If there is no translation for an index, it is left out of the translated structure.

The system that we are building is running on a PIC32 processor with just 512K of Flash, and we just don't have enough room for all of the translations. What I was going to do is compress the translated string data, put that into the C source file, and decompress the translation that is needed at run time. To do this I need to build something that C can understand once it's decompressed. So I was thinking of a structure with the index number, the length of the string, and then the string, padded to integer alignment.
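One possible way to build such a record in Perl is sketched below. The layout is an assumption, not something po2c defines: a 32-bit little-endian index, a 32-bit little-endian length, the string bytes, one NUL terminator, and NUL padding up to a 4-byte boundary. The function name `pack_record` is hypothetical, and the string is assumed to already be a byte string (for example, encoded UTF-8).

```perl
use strict;
use warnings;

# Hypothetical record layout (an assumption for illustration):
#   uint32 index, uint32 length, string bytes, NUL, padding to 4 bytes.
sub pack_record {
    my ($index, $string) = @_;
    my $body = $string . "\0";                 # C-style terminator
    $body .= "\0" x (-length($body) % 4);      # pad to a multiple of 4
    return pack('V V', $index, length $string) . $body;
}

my $rec = pack_record(3, "hello");
printf "%d bytes: %s\n", length $rec, unpack('H*', $rec);
```

Several such records can be concatenated and compressed as one block, so the C code can walk them with a pointer after decompression.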

I'm a Perl novice, so I'm trying to understand how all this works, and deflate()'s handling of everything as (essentially) ASCII threw me for a loop.

The part that I had missed was the pack() function. Thanks to mbethke, bulk88, and especially pmqs (you nailed it) for pointing it out. That is what I need so that I can get all of the pieces in the proper places so that after everything is decompressed on the C side I'll be able to handle it.

One of the problems of trying to understand just enough to accomplish what you need to do is that you sometimes don't know enough to know what you are missing. I had started out reading two of the books that are available on perl.org, but neither of them mentions pack(). Guess I'll have to do some more reading so I understand enough to know what I don't know.

A big Thanks to all of the Perl Monks for their wonderful help.

Re: zlib compression of numeric values
by bulk88 (Priest) on Jul 17, 2012 at 05:47 UTC
    Null is
    my $justanull = "\x00";
    a little endian 4 byte integer is
    $not_ascii = pack('V', 1);
    # or
    $not_ascii = pack('V', "1");
    # $not_ascii should not be printed to the console; it looks like gibberish as ASCII
    Perl has absolutely no problems in dealing with binary data as long as you don't turn on utf8 mode for that scalar. With numeric operands, & ^ and | work on integer values and give you an ordinary integer that prints as "ascii" digits, not packed bytes or bits. After using &/^/|, you need to use pack to get the result into binary. Perl historically is excellent in dealing with binary.

    If you really want, you can even write machine code/shellcode in Perl, and run it from Perl as well. This guy did it: Pure Perl module (246 lines, Linux/Win32) that calls external libraries - no XS file.
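The difference between a number's printable form and its packed form can be seen in a short sketch like this one. The variable names and the flag values are illustrative only.

```perl
use strict;
use warnings;

my $nul   = "\x00";              # a single NUL byte
my $flags = 0x12 | 0x01;         # numeric OR: the integer 19
my $text  = "$flags";            # the two ASCII characters "19"
my $bin   = pack('V', $flags);   # four bytes, little-endian

printf "nul:    %d byte\n", length $nul;
printf "text:   %s\n", unpack('H*', $text);   # hex of the ASCII digits
printf "binary: %s\n", unpack('H*', $bin);    # hex of the packed integer
```

The text form holds the bytes 0x31 0x39, while the packed form holds 0x13 0x00 0x00 0x00, which is what a C program reading a 32-bit little-endian integer would expect.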
Re: zlib compression of numeric values
by mbethke (Hermit) on Jul 17, 2012 at 05:30 UTC

    A couple of questions so I understand the problem:

    • Are you running out of disk/flash space or RAM? If it's the former, you could compress the entire files and, when the required language has been determined, decompress one of them for use (what about a compressing file system?). But it sounds more like it's the latter, so you want to compress individual strings.
    • Are you compiling the .po files into .mo to use them?
    • Why are you trying to compress single numbers? They won't compress anyway. If you really have to shove everything through zlib for some reason, you could use pack('I', $num) to turn it into a string, but don't do that. Maybe it's Pascal strings of length+string instead of string+'\0' that you want to compress as a whole?
    • Why is there a terminating NUL that you can't include? C usually wants that, maybe the i18n library doesn't?

    Edit: fixed Pascal string description
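Two of the points above can be checked with a small sketch: a lone packed number grows when pushed through zlib, and pack() can produce either a length-prefixed (Pascal-style) string or a NUL-terminated one. The choice of a 32-bit `V` length prefix is an assumption; a smaller prefix such as `C` or `v` may suit short strings better.

```perl
use strict;
use warnings;
use Compress::Zlib;    # provides compress()

# A single packed number: zlib's header and checksum outweigh the data.
my $number = pack('V', 7);
printf "packed number: %d bytes, compressed: %d bytes\n",
    length $number, length compress($number);

# Length-prefixed string versus NUL-terminated string.
my $pascal = pack('V/a*', 'hello');    # 32-bit length, then the bytes
my $cstr   = pack('Z*',   'hello');    # the bytes, then a NUL
printf "pascal: %s\nc:      %s\n",
    unpack('H*', $pascal), unpack('H*', $cstr);
```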

Re: zlib compression of numeric values
by xiaoyafeng (Deacon) on Jul 17, 2012 at 05:33 UTC
    I ran the code you posted and it threw an error:
    No data given to hexdump. at C:/Perl/site/lib/Data/Hexdumper.pm line 288.
    I've never used Compress::Raw::Zlib, but I think a compression library doesn't know what a string or a number is; it just treats everything as binary data. So have you tried putting the number into a file, calling binmode on it, and testing again? Or just:
    $num_data = 7;
    open(my $fh, '<', \$num_data);
    binmode($fh);
    .......
    .......
    HIH.




    I am trying to improve my English skills, if you see a mistake please feel free to reply or /msg me a correction

Re: zlib compression of numeric values
by pmqs (Friar) on Jul 17, 2012 at 12:38 UTC

    If you want a number output in binary format you need to pack it first, like this (assuming you want it stored as a 32-bit little-endian value):

    $num_data = pack("V", 7);
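Applied to the Compress::Raw::Zlib calls from the original question, that could look roughly like the sketch below. The `-AppendOutput` option is set here as an assumption about how the original objects were (or should be) created, so that flush() adds to the buffer rather than replacing what deflate() wrote; the hex output uses plain sprintf instead of a hexdump module.

```perl
use strict;
use warnings;
use Compress::Raw::Zlib;

my $num_data = pack('V', 7);    # four bytes: 07 00 00 00

# AppendOutput makes flush() append to the buffer instead of overwriting it.
my $deflate = Compress::Raw::Zlib::Deflate->new(-AppendOutput => 1)
    or die "cannot create deflate object";
my $dflt_data = '';
$deflate->deflate($num_data, $dflt_data) == Z_OK or die "deflate failed";
$deflate->flush($dflt_data)              == Z_OK or die "flush failed";

my $inflate = Compress::Raw::Zlib::Inflate->new()
    or die "cannot create inflate object";
my $iflt_data = '';
my $status = $inflate->inflate($dflt_data, $iflt_data);
$status == Z_STREAM_END or die "inflate failed: $status";

# Dump the decompressed bytes as hex.
print join(' ', map { sprintf '0x%02X', $_ } unpack 'C*', $iflt_data), "\n";
```

With the packed input the decompressed data should be the four bytes 0x07 0x00 0x00 0x00 rather than the single ASCII character 0x37.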

    Here is a quick proof of concept that compresses a series of strings, each prefixed by a length, then uncompresses them. No error handling is included. The compressed data is stored in a string ($outBuffer) in this example, but it can also work with files with a small modification to the code.

    use strict;
    use warnings;
    use IO::Compress::Deflate   qw(:all);
    use IO::Uncompress::Inflate qw(:all);

    # Write one record: 32-bit little-endian length, then the string bytes.
    sub put {
        my ($handle, $string) = @_;
        print $handle pack("V", length $string), $string;
    }

    # Read one record; returns undef at end of data.
    sub get {
        my $handle = shift;
        my $buf;
        read($handle, $buf, 4) == 4
            or return undef;
        my $len = unpack("V", $buf);
        read($handle, $buf, $len);
        return $buf;
    }

    my $outBuffer;
    my $out = IO::Compress::Deflate->new(\$outBuffer);
    put($out, "hello world");
    put($out, "goodbye");
    $out->close;

    my $in = IO::Uncompress::Inflate->new(\$outBuffer);
    while (defined(my $got = get($in))) {
        print "$got\n";
    }
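The "small modification" for files is presumably to pass a file name instead of a reference to a scalar. A minimal sketch under that assumption follows; the file name `strings.z` is made up for illustration, and basic error checks are added.

```perl
use strict;
use warnings;
use IO::Compress::Deflate   qw($DeflateError);
use IO::Uncompress::Inflate qw($InflateError);

my $file = 'strings.z';    # hypothetical output file name

# Write length-prefixed strings straight into a compressed file.
my $out = IO::Compress::Deflate->new($file)
    or die "cannot write $file: $DeflateError";
for my $string ("hello world", "goodbye") {
    print {$out} pack('V', length $string), $string;
}
$out->close;

# Read them back: 4-byte length, then that many bytes.
my $in = IO::Uncompress::Inflate->new($file)
    or die "cannot read $file: $InflateError";
my @strings;
while (read($in, my $buf, 4) == 4) {
    my $len = unpack 'V', $buf;
    read($in, $buf, $len) == $len or die "truncated record";
    push @strings, $buf;
}
$in->close;
print "$_\n" for @strings;
```

The resulting file holds a zlib-format stream, so its bytes could then be turned into a C array for the embedded build.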