
Most of the comments I've seen have assumed this is data. But your phrasing of "run the PM's thru pack..." implies that this isn't data, per se, but an actual Perl module (a .pm that you're loading via use Some::Module). If so, I am not sure that rolling your own is the best choice. You might want to clarify that point: is the file you're trying to read, which you called "the PM's", pure data, data in Perl format, data plus other Perl code (such as functions, for loops, etc.), or something else?

I don't specifically know of a CPAN module that allows loading a compressed module, but it would surprise me if there weren't one (a quick search for "perl compress module" finds Perl modules that compress something else, not modules that let you compress your source code). Or perhaps something in the style of Acme::Buffy, which modifies the source code. I just don't know what that module would be, but maybe my phrasing will spark something in a more experienced monk.

I hesitate to recommend Module::Crypt, because it doesn't really do what the name implies: never rely on Module::Crypt to protect your source code from prying eyes; it will not keep it secret! But I mention it nonetheless because the XS output from Module::Crypt might be smaller than your 10MB++ Perl module. I don't know whether it would be, but it might be worth a try.

Re^4: Reducing application footprint: large text files
by salva (Canon) on Mar 01, 2018 at 10:09 UTC
    You can use Filter::gunzip to load a compressed module.

    Just compress the module using gzip and then prepend the header telling perl to use the filter:

    gzip -9 Module.pm
    echo "use Filter::gunzip;" >Module.pm
    cat Module.pm.gz >>Module.pm
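    If you'd rather not depend on shell tools (on Windows, say), the same file can probably be built from Perl with the core IO::Compress::Gzip module. This is only a sketch: the sub name make_gzipped_module and the script name in the comment are made up, and unlike gzip -9 it leaves the original file in place.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);

# Hypothetical helper: write $dst as the line "use Filter::gunzip;"
# followed by the gzipped bytes of $src (same layout as the shell
# commands above), without deleting $src.
sub make_gzipped_module {
    my ($src, $dst) = @_;
    my $gz;
    gzip($src => \$gz, Level => 9)    # level 9, like gzip -9
        or die "gzip $src failed: $GzipError";
    open my $out, '>:raw', $dst or die "open $dst: $!";
    print {$out} "use Filter::gunzip;\n", $gz;
    close $out or die "close $dst: $!";
    return;
}

# e.g.  perl make_gz_module.pl Module.pm.orig Module.pm   (hypothetical names)
make_gzipped_module(@ARGV) if @ARGV == 2;
```

    This only builds the file; loading it still requires Filter::gunzip to be installed wherever the module is used.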
Re^4: Reducing application footprint: large text files
by LanX (Saint) on Feb 28, 2018 at 23:30 UTC
Re^4: Reducing application footprint: large text files
by Anonymous Monk on Mar 01, 2018 at 00:44 UTC
    Yes, pryrt, you are correct; the 'data' is all contained in a *massive* PM that is computer-generated.

    The PM only contains many stanzas of 'data' in the format I described in the OP. That is, the module contains no code that operates on it. The 'operational' code is contained in other modules that pull in this *massive* PM.

    Regarding compression, that is another option. I was hoping to avoid it, because of the overhead associated with decompressing, storing the result, and then operating on it.

    I will certainly be keeping your suggestions in mind as I move forward with this. They are much appreciated.

    Matt.

      I would like to revisit the SQLite idea.

      Can you explain why that is "not an option"? What is "wrong" with that idea?
      The idea of tens of MB of data within a Perl PM module seems really wrong to me.
      That there aren't any accessor methods for this data contained within this .PM module sounds doubly wrong to me.
      It sounds like this should be a binary data file or an SQLite DB instead of a PM module.
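      A "binary data file" could be as simple as fixed-width records written with pack and read back with seek/read/unpack, so a lookup touches only the bytes it needs. A minimal sketch, assuming each stanza can be reduced to a name plus one 32-bit ADDRESS (the real field list is in the OP and not reproduced here); the record layout and the sub names write_records and read_record are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical fixed-width record: a NUL-padded name (at most 31 bytes)
# plus an unsigned 32-bit big-endian ADDRESS. Adjust to the real fields.
my $REC     = 'Z32 N';
my $REC_LEN = length pack($REC, '', 0);    # 32 + 4 = 36 bytes

# Build step: write every record once, in a stable (sorted) order.
sub write_records {
    my ($path, $records) = @_;             # $records: { NAME => { ADDRESS => n } }
    open my $fh, '>:raw', $path or die "open $path: $!";
    print {$fh} pack($REC, $_, $records->{$_}{ADDRESS}) for sort keys %$records;
    close $fh or die "close $path: $!";
    return;
}

# Run time: fetch record number $index directly, without loading the file.
sub read_record {
    my ($fh, $index) = @_;
    seek($fh, $index * $REC_LEN, 0) or die "seek: $!";
    read($fh, my $buf, $REC_LEN) == $REC_LEN or return;   # past the end
    return unpack($REC, $buf);             # (name, address)
}
```

      Z32 silently truncates longer names, and finding a record by name still needs an index (or a sorted file plus a binary search), which is part of what SQLite gives you for free.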

      At least on the Perl distribution that I use, there are no extra modules to install in order to write SQLite code; use DBI; is sufficient. SQLite can import roughly 50K+ records per second. In this situation, however, it sounds like there is nothing to import: an SQLite DB can serve the purpose of this humongous computer-generated .PM file. Why not have Perl generate the SQLite DB as part of the build process? You can then write accessor methods in a new Perl module that is essentially the interface to this SQLite DB.
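      A rough sketch of that build-step-plus-accessor idea, assuming the DBD::SQLite driver is available (DBI loads it from the dbi:SQLite DSN). The table name, its two columns, and the subs build_db and address_of are hypothetical stand-ins for whatever the real stanzas contain.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;    # the DBD::SQLite driver is loaded via the DSN

# Build step (run once, at build time): load the stanzas into SQLite
# instead of generating a huge .pm. Hypothetical schema: one row per stanza.
sub build_db {
    my ($dsn, $records) = @_;    # $records: { NAME => { ADDRESS => n } }
    my $dbh = DBI->connect($dsn, '', '', { RaiseError => 1, AutoCommit => 1 });
    $dbh->do('CREATE TABLE IF NOT EXISTS stanza (
                  name    TEXT PRIMARY KEY,
                  address INTEGER NOT NULL)');
    $dbh->begin_work;            # one transaction: bulk inserts are much faster
    my $sth = $dbh->prepare('INSERT OR REPLACE INTO stanza (name, address) VALUES (?, ?)');
    $sth->execute($_, $records->{$_}{ADDRESS}) for sort keys %$records;
    $dbh->commit;
    return $dbh;
}

# Accessor for the "operational" modules: one indexed lookup per call,
# instead of compiling the whole data set into memory.
sub address_of {
    my ($dbh, $name) = @_;
    my ($address) = $dbh->selectrow_array(
        'SELECT address FROM stanza WHERE name = ?', undef, $name);
    return $address;             # undef if there is no such stanza
}
```

      At build time you would call something like build_db('dbi:SQLite:dbname=stanzas.db', \%stanzas) and ship the .db file; at run time the accessor module just connects to that file and calls address_of.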

      I will point out that it is possible to dynamically expand or contract the amount of memory that SQLite uses for its work. I doubt that you will need to do that, but be aware that it is possible. When running complex indexing operations on "large" DBs, I have used this feature to speed things up, but that is an unusual and "weird" thing to do.
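      In SQLite the knob for this is the cache_size pragma; a negative value is taken as a size in KiB rather than a page count. A small sketch, assuming DBD::SQLite; the helper with_cache_kib is hypothetical, and it restores the previous setting even if the job dies.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Hypothetical helper: run $work with SQLite's page cache set to about
# $kib KiB, then put the previous cache_size back.
sub with_cache_kib {
    my ($dbh, $kib, $work) = @_;
    my ($old) = $dbh->selectrow_array('PRAGMA cache_size');
    my $size = 0 - abs(int($kib));           # negative => interpreted as KiB
    $dbh->do("PRAGMA cache_size = $size");
    my @result = eval { $work->() };
    my $err = $@;
    $dbh->do("PRAGMA cache_size = $old");    # restore even if $work died
    die $err if $err;
    return @result;
}
```

      Usage would look like with_cache_kib($dbh, 200_000, sub { $dbh->do('CREATE INDEX idx_addr ON stanza(address)') }), where the index and table names are again only illustrative.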

      As an update: when you have a Perl statement like ADDRESS => '0x400000', the quoted "numeric" value is stored as text until you use it in a numeric context (and a hex string like that numifies to 0 unless you convert it with hex()). An unquoted literal, ADDRESS => 0x400000, is compiled to a number. How Perl stores things can get very complicated; perlguts might open your eyes. I think you have short-changed the idea of SQLite. Keep in mind that every smartphone on the planet has an SQLite DB.

      Simple Demo:

      #!/usr/bin/perl
      use strict;
      use warnings;

      my $zero = "00000000000000000";
      print "$zero\n";    # 00000000000000000 (exact text)
      $zero = $zero + 0;
      print "$zero\n";    # 0
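      The same distinction matters for address-like values: whether the generated .pm writes a value quoted or unquoted decides what Perl stores. A small companion demo; the hash names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %unquoted = (ADDRESS => 0x400000);      # literal: compiled to the number 4194304
my %quoted   = (ADDRESS => '0x400000');    # a string that merely looks numeric

print "$unquoted{ADDRESS}\n";              # 4194304
print hex($quoted{ADDRESS}), "\n";         # 4194304 (hex() accepts the 0x prefix)
{
    no warnings 'numeric';                 # silence "isn't numeric" for the demo
    print 0 + $quoted{ADDRESS}, "\n";      # 0: numification stops at the "x"
}
```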