Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

Accessing SQLite databases within ZIP files

by AppleFritter (Vicar)
on Oct 01, 2017 at 11:22 UTC ( [id://1200461]=perlquestion: print w/replies, xml ) Need Help??

AppleFritter has asked for the wisdom of the Perl Monks concerning the following question:

Dearest life forms lurking in the Monastery!

I'm trying to process resource files produced by a third-party application. These resource files are actually ZIP files containing, among other things:

  1. an SQLite database;
  2. a bunch of binary blobs (stored as file entries in the ZIP archive, rather than as BLOBs in the SQLite DB); and
  3. a JSON file mapping resource identifiers used in the DB to the binaries' filenames.

I'd like to access all this data. I'd also like to do this in the easiest, DWIMiest, most natural manner possible.

The most straightforward way is of course to extract the ZIP file, and then use DBI, JSON::XS and whatever modules are appropriate to handle the binaries (images, sounds, videos etc). But I'd like to avoid this, if possible; I want to be able to point my script at the ZIP file without having to worry about disk space, clean-up, and all that.

There's a variety of modules on CPAN for transparently handling ZIP archives (in fact, IO::Uncompress::Unzip is in core). What I have not found is a way of accessing a database without extracting it to disk first. More precisely, what I'd like to do is either:

  • have DBD::SQLite read the DB directly from the ZIP file, using some kind of transparent intermediary layer; or
  • extract the DB into memory (i.e. a Perl scalar), and then have DBD::SQLite read that.

I only need to read the DB, BTW, not modify it, so any complications to do with putting modifications back into the ZIP can safely be ignored.

So, my question is: is this possible, using only existing CPAN modules? A cursory search didn't reveal anything useful.

  • Comment on Accessing SQLite databases within ZIP files

Replies are listed 'Best First'.
Re: Accessing SQLite databases within ZIP files
by haukex (Archbishop) on Oct 01, 2017 at 15:10 UTC
    have DBD::SQLite read the DB directly from the ZIP file, using some kind of transparent intermediary layer

    IO::Uncompress::Unzip does support a filehandle interface, but since SQLite is implemented in C, I doubt it will be able to make use of that. Since I assume that SQLite will be doing a lot of random access on the file, I doubt that any "transparent" layer would be particularly performant (unless it happens to decide to cache the entire file in memory). So I assume that it would instead be easier and probably better to just extract the database from the archive - whether to a file or to memory would be the next point.

    extract the DB into memory (i.e. a Perl scalar), and then have DBD::SQLite read that

    Unfortunately I don't see an API method available in DBD::SQLite to do that, although I may have missed something. See also Putting an SQLite DB file into memory, although that is about reading a file on disk into an in-memory database, not using existing memory - the SQLite API would need to support a way to pass a pointer to that memory to it. However, I also don't see any mention of that in the SQLite docs on that topic. Again, I may be missing something (it is Sunday after all ;-) ), but I think you'll have to extract the file to disk.

    worry about disk space, clean-up, and all that

    If those are your main concerns, I think you will find File::Temp's my $tmpdir = tempdir(CLEANUP=>1); very useful, I use it a lot and have hardly ever had problems with it not cleaning up after itself (Perl usually has to crash hard for that). Personally, I would extract the SQLite database to disk, and probably only extract the binary blobs on demand, or perhaps, if the tools you are using to process them can make use of it, transparently using IO::Uncompress::Unzip. The JSON file you could obviously extract into memory and decode via e.g. JSON::MaybeXS.

      but I think you'll have to extract the file to disk.

      Looks like I will, going by the answers so far. Oh well, I suppose there's worse! Thanks for the pointer re: File::Temp and tempdir's's CLEANUP flag, that sounds mighty useful.

      (FWIW, performance wouldn't have mattered much. Famous last words, I know, but I really just need to run one SELECT query against this DB in order to extract the information I need.)

        tempdir's CLEANUP flag

        Since elsewhere in this thread you wrote this needs to work on Windows, let me quote "DBD::SQLite And File::Temp" from the DBD::SQLite documentation:

        CLEANUP may not work unless a database is disconnected

        When you set CLEANUP option to true when you create a temporary directory with tempdir() or newdir(), you may have to disconnect databases explicitly before the temporary directory is gone (notably under MS Windows).

Re: Accessing SQLite databases within ZIP files
by roboticus (Chancellor) on Oct 01, 2017 at 14:57 UTC

    AppleFritter:

    Not a CPAN solution, but perhaps you can use fuse-zip to mount the ZIP file as a filesytem?

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

      That's an excellent idea! I'd do this if I were on Linux, rather than a rather less Unix-y OS that shall remain nameless... back to the drawing board for now, but I'll keep in mind for the future, I think this really would be the most elegant solution.
Re: Accessing SQLite databases within ZIP files
by wjw (Priest) on Oct 01, 2017 at 15:24 UTC

    If I understand this, you want to extract only the SQLite file from a zip file which may contain any number of other file types, operate on the SQLite DB, and then discard the SQLite DB you operated on (but not from the zip file). One question I would ask is whether or not the SQLite files conform to some sort of naming convention such as a common file name extension. If so, then using something as simple as Archive::Zip to extract the SQLite DB, as it has a simple interface to grab hold of the names of the members and even some searching using membersMatching( { regex => $regex } ).

    I think that if disk space is an issue, then memory would be and issue also. Are the SQLite files that big? Being that you only have to extract the SQLite DB file and have complete control over where it goes, it should be pretty easy to identify, extract, operate on and then delete that single file. (But maybe I am missing the point?)

    ...the majority is always wrong, and always the last to know about it...

    A solution is nothing more than a clearly stated problem...

      I think that if disk space is an issue, then memory would be and issue also.

      You'd think so! But I actually usually have less disk space available than I have RAM. The old adage that there isn't a HD you can fill to the brim with random bric-a-brac in three weeks' time is still true.

      That said — the SQLite DBs I've encountered so far aren't that big, no; on the order of (tens of) megabytes, which wouldn't be an issue even for me. OTOH there's no a priori reason they couldn't be (much) larger.

      Being that you only have to extract the SQLite DB file and have complete control over where it goes, it should be pretty easy to identify, extract, operate on and then delete that single file. (But maybe I am missing the point?)

      No, you got it nailed. I was really only asking because I'm too lazy to put in even that much work!

      I think that if disk space is an issue, then memory would be and issue also.
      Very good point you found there. Can I have it?


      holli

      You can lead your users to water, but alas, you cannot drown them.
Re: Accessing SQLite databases within ZIP files
by Laurent_R (Canon) on Oct 01, 2017 at 15:36 UTC
    Hi AppleFritter,

    perhaps you could extract the SQLite database (and possibly the other files) to a ramdisk and use DBD::SQLite on it.

      That's a good idea, thanks. Unfortunately this'll also have to work on Windows, which I don't think has built-in support for that sort of thing — but yes, otherwise this would be a pretty nifty solution.
        have to work on Windows, which I don't think has built-in support for that sort of thing

        There are RAMDisks for Windows, even free ones. I use an old version of SoftPerfect RAM Disk (v3.4.7) that is freeware, no questions asked. It seens v4.x has become payware, but v3.4.x is still available on several sites.

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
        I am surely not a specialist on the possibilities on Windows. My workstation runs on Windows, but our servers and databases and so on are all *nix and VMS (and even AS/400). But I am fairly sure there are some ramdisk possibilities on Windows. It even existed 30 years ago at the time of DOS.
Re: Accessing SQLite databases within ZIP files
by Corion (Patriarch) on Oct 02, 2017 at 12:49 UTC

    Reading/writing compressed pages should be possible if you write an SQLite filesystem plugin that de/compresses each page that SQLite requests.

    A premade solution, if you have USD 2000 to spare would be CEROD by the makers of SQLite, which allows for compression and encryption.

    I suspect that the performance of a SQLite file compressed by a standard tool will be fairly bad as I assume that SQLite relies on seeking within the file while most de/compressors assume that a file will be read sequentially.

      Just $2000? That's spare change! (Though looking at their licensing terms etc., it's not a bad offer at all, if one has a need for this sort of thing.)

      As for my original problem, it's true that performance would likely suffer, though not if the DB was decompressed into memory and accessed there, then. That said I've just bitten the bullet and extracted the DB to a temporary file, which (all things considered) is the easiest and most portable solution.

Re: Accessing SQLite databases within ZIP files
by Corion (Patriarch) on Jan 24, 2018 at 08:40 UTC

    It seems that since SQLite 3.22 (just released), there is a zipfile extension that allows reading from ZIP-compressed SQLite databases reading and writing ZIP files as if they were tables.

    To use it you will either need to manually compile it and load it as a dynamic extension, or modify your build of DBD::SQLite so that the file is directly included and the extension is registered.

    Update: This does not treat a compressed SQLite-ZIP file identical to a plain SQLite file, unfortunately.

A reply falls below the community's threshold of quality. You may see it by logging in.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1200461]
Approved by Athanasius
Front-paged by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others contemplating the Monastery: (4)
As of 2024-04-19 18:52 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found