in reply to Compact data classes

creeble:

I can think of a few possibilities, but I don't know whether any of them would be good, because I don't know how you intend to access the values, how fast you need to look them up, etc.

For example, you could concatenate all your strings into one long one, storing each field as a byte containing its length followed by its text. (You did mention that each object totals about 150 bytes of data, if I understand correctly, so a single length byte per field should suffice.) Then your objects would store only the offset of the first field. Each access function would simply need to skip the appropriate number of fields.

$ cat t.pl
#!/usr/bin/perl
use strict;
use warnings;

my $joe = kablooie::new('Joe', 'Blow');
my $jane = kablooie::new('Jane', 'Smith');

my $name = $joe->fname();
print "joe's first name: $name\n";
$name = $jane->lname();
print "jane's last name: $name\n";

package kablooie;

my $kablooie_storage;
BEGIN {
    $kablooie_storage = "Bob's your uncle";
}

sub new {
    my ($fname,$lname) = @_;
    my $offset = length($kablooie_storage);
    $kablooie_storage .= chr(length($fname)) . $fname;
    $kablooie_storage .= chr(length($lname)) . $lname;
    return bless \$offset, 'kablooie';
}

sub fname {
    # FName is the first field
    my $self = shift;
    my $offset = $$self;
    my $len = ord(substr($kablooie_storage,$offset,1));
    return substr($kablooie_storage,$offset+1,$len);
}

sub lname {
    # LName is the second field
    my $self = shift;
    my $offset = $$self;
    my $len = ord(substr($kablooie_storage,$offset,1));
    $offset+=$len+1;
    $len = ord(substr($kablooie_storage,$offset,1));
    return substr($kablooie_storage,$offset+1,$len);
}

When run, I get:

$ perl t.pl
joe's first name: Joe
jane's last name: Smith

You could add some simple compression scheme, too. (RLE might be useful if you have lots of repeated characters, and if you don't mind restricting your character set, you could compress the characters into 6 bits each.)
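To make the 6-bit idea concrete, here is a rough sketch. The 64-symbol alphabet and the names pack6 and unpack6 are made up for illustration; you would pick the alphabet to match your data. The character count has to be stored alongside the packed bytes, because the last byte is padded with zero bits.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical 64-symbol alphabet (an assumption, not from the original post):
# space, A-Z, a-z, 0-9 and '-' make exactly 64 symbols, i.e. 6 bits each.
my @ALPHABET = (' ', 'A' .. 'Z', 'a' .. 'z', '0' .. '9', '-');
my %CODE_OF;
@CODE_OF{@ALPHABET} = (0 .. $#ALPHABET);

# Pack each character into 6 bits.
# Returns the packed bytes and the number of characters packed.
sub pack6 {
    my ($text) = @_;
    my $bits = '';
    for my $ch (split //, $text) {
        die "character '$ch' is not in the alphabet\n"
            unless exists $CODE_OF{$ch};
        $bits .= sprintf '%06b', $CODE_OF{$ch};
    }
    # pack 'B*' pads the final byte with zero bits
    return (pack('B*', $bits), length $text);
}

# Reverse of pack6: needs the character count to ignore the padding bits.
sub unpack6 {
    my ($packed, $count) = @_;
    my $bits = unpack 'B*', $packed;
    return join '',
        map { $ALPHABET[ oct('0b' . substr($bits, $_ * 6, 6)) ] }
        0 .. $count - 1;
}

my ($packed, $count) = pack6('Jane Smith');
printf "%d characters stored in %d bytes\n", $count, length $packed;
print "round trip: ", unpack6($packed, $count), "\n";
```

That saves a quarter of the text bytes at best, and every access pays for the decoding, so it only makes sense if memory matters much more than lookup speed.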

That said, I have no idea how much space you could save using this technique, and whether it would be worth it or not.

...roboticus

When your only tool is a hammer, all problems look like your thumb.

Re^2: Compact data classes
by creeble (Sexton) on Jun 08, 2013 at 02:28 UTC
    I tried a variation of that using pack and unpack, but it was unworkably slow. I do need quick access to all the fields. It did save a fair amount of memory, but less than I would have hoped.
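For readers following along, a pack/unpack variation of the length-prefixed scheme might look roughly like the sketch below. This is only a guess at the general shape, not the code that was actually tried; the names add_record and get_fields are made up. It uses the C/a* template (a length byte followed by that many bytes) and the @ template to jump to an absolute offset.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One shared string holding every record's fields back to back.
my $storage = '';

# Append a record; each field becomes a length byte plus its text.
# Returns the offset of the record's first field.
sub add_record {
    my @fields = @_;
    my $offset = length $storage;
    $storage .= pack '(C/a*)*', @fields;
    return $offset;
}

# Read back $count fields starting at $offset.
# '@N' moves to absolute position N before unpacking.
sub get_fields {
    my ($offset, $count) = @_;
    my $template = '@' . $offset . ' (C/a*)' . $count;
    return unpack $template, $storage;
}

my $joe  = add_record('Joe',  'Blow');
my $jane = add_record('Jane', 'Smith');

my ($fname, $lname) = get_fields($jane, 2);
print "jane: $fname $lname\n";
```

Every lookup builds a template and walks the fields in front of the one wanted, which may be part of why an approach like this feels slow when all fields need quick access.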

    What I'm starting to envision is a somewhat-specialized XS module that builds something similar (I could afford pointers to the individual field strings) in C, and then allows access to it via XS. The db is read-only once it's sorted, so updates aren't an issue. You could malloc large blocks and just write null-terminated strings to them, updating the pointers for the fields.

    But I know almost nothing about XS and whether the conversion from a string in C to what perl thinks is a string would be painfully slow. I would guess not, since it must do it internally all the time?
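The layout described above (one large block of null-terminated strings plus a pointer per field) can be prototyped in plain Perl before committing to XS. The sketch below is only a stand-in under stated assumptions: a Perl string plays the role of the malloc'd block, packed 32-bit offsets play the role of the pointers, and the names add_record, field and FIELDS are made up. It says nothing about how fast the real XS version would be.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use constant FIELDS => 2;   # fields per record (assumed fixed)

my $block = '';   # stands in for the malloc'd block of NUL-terminated strings
my $index = '';   # packed 32-bit offsets, FIELDS of them per record

# Append one record; returns its record number.
sub add_record {
    my @fields = @_;
    die "expected " . FIELDS . " fields\n" unless @fields == FIELDS;
    my $id = length($index) / (4 * FIELDS);
    for my $f (@fields) {
        $index .= pack 'N', length $block;   # "pointer" to this field
        $block .= $f . "\0";                 # NUL-terminated, as in C
    }
    return $id;
}

# Fetch field number $n (0-based) of record $id without scanning
# the fields in front of it.
sub field {
    my ($id, $n) = @_;
    my $start = unpack 'N', substr($index, 4 * ($id * FIELDS + $n), 4);
    my $end   = index $block, "\0", $start;
    return substr $block, $start, $end - $start;
}

my $joe  = add_record('Joe',  'Blow');
my $jane = add_record('Jane', 'Smith');

print "joe's first name: ", field($joe, 0), "\n";
print "jane's last name: ", field($jane, 1), "\n";
```

Compared with the length-byte scheme, this costs four bytes of index per field but reaches any field in one step, which is the trade being considered when saying pointers to the individual field strings are affordable.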