
I am using Cygwin Perl on Windows XP and am wondering what my options are for handling this problem.

Basically I read in the contents of a very large file as an array and then bucket-hash it using the Data::Bucket index() method:


=head2 index

 Usage     : my $bucket = Data::Bucket->index(data => \@strings, %other);
 Purpose   : Build a data structure with @strings partitioned into buckets
 Returns   : An object.
 Argument  : A reference to an array of data compatible with the compute_record_index() method

=cut

sub index
{
    my ($class, %parm) = @_;

    exists $parm{data} or die "Data must be passed for indexing" ;
    ref $parm{data} eq 'ARRAY' or die 'You must pass an array ref';

    my $self = bless (\%parm, ref ($class) || $class);

    $self->bucket_hash;

    return $self;
}

=head2 bucket_hash

 Usage     : Called internally by index()
 Purpose   : Partition $self->{data} by repeated calls 
   to $self->compute_record_index
 Returns   : The object itself ($self)
 Argument  : None.

=cut

sub bucket_hash
{
    my ($self) = @_;

    for my $data (@{$self->{data}}) {
        my $index = $self->compute_record_index($data);

        my @index = ref $index eq 'ARRAY' ? @$index : ($index);
        for (@index) {
            push @{ $self->{bucket}{$_} }, $data;
        }
    }

    return $self;
}
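
To make the resulting structure concrete, here is a minimal self-contained sketch of the same bucketing loop. The package name My::Bucket and its compute_record_index (which keys on the lowercased first character) are hypothetical stand-ins, not part of Data::Bucket. Note that the bucket lists sit alongside the original $self->{data} array, so everything stays in memory at once.

```perl
use strict;
use warnings;

package My::Bucket;    # hypothetical stand-in, not Data::Bucket itself

sub new {
    my ($class, %parm) = @_;
    return bless {%parm}, $class;
}

# Assumed index function: bucket each record by its lowercased first character.
sub compute_record_index {
    my ($self, $record) = @_;
    return lc substr($record, 0, 1);
}

# Same partitioning logic as the bucket_hash method shown above.
sub bucket_hash {
    my ($self) = @_;
    for my $data (@{ $self->{data} }) {
        my $index = $self->compute_record_index($data);
        my @index = ref $index eq 'ARRAY' ? @$index : ($index);
        push @{ $self->{bucket}{$_} }, $data for @index;
    }
    return $self;
}

package main;

my $bucket = My::Bucket->new(data => [qw(apple Avocado banana)])->bucket_hash;

for my $key (sort keys %{ $bucket->{bucket} }) {
    print "$key: @{ $bucket->{bucket}{$key} }\n";
}
# prints:
# a: apple Avocado
# b: banana
```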

However, the call to index() ran out of memory, and I am considering several ways to fix this:

1. Rewrite the bucket_hash method used by index() so that it writes to a SQLite database. This approach allows for in-memory or on-disk databases as needed. I'm still kicking myself for not doing this in the first place, but I figured I would never run out of memory.
2. Find some way to tie a Perl hash to disk. I think there are modules for saving hashes to Sleepycat (Berkeley DB) files or something similar... any ideas here? This would save me from rewriting my code.
3. Increase the virtual memory on my machine. Perhaps I can just jack it up, but would Cygwin Perl know how to take advantage of that memory, or can my hashref only occupy physical RAM?
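
For the tied-hash option, one possibility is a rough sketch like the following. It assumes DB_File is installed and linked against Berkeley DB, and it uses a B-tree with duplicate keys (R_DUP) so that each bucket can hold many records without serializing array refs. The function names bucket_to_disk and compute_record_index are hypothetical stand-ins for illustration, not part of Data::Bucket.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT
use DB_File;                    # assumes Berkeley DB is available
use File::Temp qw(tempdir);

# Hypothetical index function: bucket by lowercased first character.
sub compute_record_index {
    my ($record) = @_;
    return lc substr($record, 0, 1);
}

# Partition @$records into an on-disk B-tree that allows duplicate keys.
# Returns the DB_File object so callers can fetch a bucket with get_dup().
sub bucket_to_disk {
    my ($file, $records) = @_;

    my $btree = DB_File::BTREEINFO->new;
    $btree->{flags} = R_DUP;    # several values may share one key

    my %bucket;
    my $db = tie %bucket, 'DB_File', $file, O_RDWR | O_CREAT, 0666, $btree
        or die "Cannot open $file: $!";

    for my $record (@$records) {
        my $index = compute_record_index($record);
        for my $key (ref $index eq 'ARRAY' ? @$index : ($index)) {
            $bucket{$key} = $record;    # with R_DUP this adds a duplicate
        }
    }
    return $db;
}

# Usage: build the buckets in a temporary directory, then read one back.
my $dir = tempdir(CLEANUP => 1);
my $db  = bucket_to_disk("$dir/bucket.db", [qw(apple avocado banana)]);

my @a_bucket = sort $db->get_dup('a');    # every record filed under 'a'
print "a: @a_bucket\n";
```

Reading the input file line by line into such a function, rather than loading it into an array first, would keep only one record in memory at a time. Whether the disk I/O is acceptable depends on the size of the data.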

I have beheld the tarball of 22.1 on ftp.gnu.org with my own eyes. How can you say that there is no God in the Church of Emacs? -- David Kastrup
`[tag://cpan-bucket-hash,memory,sqlite]`
Enforce strict model-view separation in template engines via HTML::Seamstress
The car is in the cdr, not the cdr in the car

In reply to hashref population yields out of memory error by metaperl
