Beefy Boxes and Bandwidth Generously Provided by pair Networks
Just another Perl shrine
 
PerlMonks  

Re: Bit-vector compression

by Fletch (Bishop)
on Mar 01, 2022 at 23:46 UTC ( [id://11141742]=note: print w/replies, xml ) Need Help??


in reply to Bit-vector compression

Question vague so just random thoughts but . . .

  • It's possible to use (say) Compress::Bzip2 to compress and uncompress strings in memory. You could certainly store a file of compressed data "lines" and read from that but the problem becomes knowing what's a "line" to read and then you're starting to need to implement your own file format with an index and . . . bleh. But not too bad (maybe inspired by the zip format have a TOC/index block at the end of the file which you read which provides a mapping of file offset (that you could pass to seek) and the number of bytes for that particular "line").
  • Problem with that approach though is that by splitting things into separate lines compressed separately you're going to "lose" some of the redundancy that gzip was able to bite on.
  • You haven't said why you need all these bits but something made me think of Bloom Filters; that may or may not be anywhere relevant. If you go a bit more into what you're trying to do with all of these you might get some more useful prior art pointers.

Update: Actually after thinking it over you might just could use Archive::Zip and let it store each of your "lines" as a separate file in the archive. That'd probably get you most of the way there without needing to reimplement any wheels. If you're truly worried about runtime space implement some sort of LRU cache on top of that and only keep the last n actively accessed "lines" live in memory.

Another thought depending on the sparseness of your bits using Set::IntSpan and a textual representation could have possibilities. But still . . . much handwavium here.

The cake is a lie.
The cake is a lie.
The cake is a lie.

Replies are listed 'Best First'.
Re^2: Bit-vector compression
by eyepopslikeamosquito (Archbishop) on Mar 02, 2022 at 00:29 UTC

    I'd add Judy arrays to your excellent list of fun/desperate things to try. :)

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11141742]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others studying the Monastery: (3)
As of 2024-04-20 14:52 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found