in reply to Re: Use temporary file or cache
in thread Use temporary file or cache

How big is too large? My largest XML file is about 20 MB... I'm a little unclear on how to programmatically save everything to the hash and then look it up... I haven't worked with hashes that much in Perl yet. So, the first time I loop through the XML to find the comments, I store each comment as a new entry in the %comments hash, right? Something like this?
...
my $commentText = $date . $author . $content;
my %comments = ( $posturl => [$commentText] )
...

I can see that working for one comment--would a second comment then overwrite the first in the array or would it add automatically? Would I need to use push or something?

Then, during the second pass, how exactly would I reference the keyed array?

Sorry for all the newbie questions... Thanks!

Re^3: Use temporary file or cache
by moritz (Cardinal) on May 28, 2009 at 09:25 UTC
    Try reading perlintro or perldata for more information on hashes, and perlreftut for more involved data structures. Suppose you have a comment stored in $comment and want to record that it belongs to the URL $post_url. You'd write:

    my %comments;
    ...
    push @{$comments{$post_url}}, $comment;

    And you can retrieve and iterate over the list of comments for a URL:

    for my $c (@{$comments{$post_url}}) {
        print "$c\n";
    }
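
    Putting the two snippets together, a two-pass run might look roughly like the sketch below. The @parsed list and its URLs, dates, and authors are made-up stand-ins for whatever the XML parser actually returns. Note that push appends to the list for that URL, whereas assigning a fresh list to %comments, as in the question, would replace the whole hash each time.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the comments found in the XML.
# Each entry is [post URL, date, author, content]; the values are made up.
my @parsed = (
    [ 'http://example.com/post1', '2009-05-27', 'alice', 'First!' ],
    [ 'http://example.com/post2', '2009-05-27', 'bob',   'Nice post' ],
    [ 'http://example.com/post1', '2009-05-28', 'carol', 'Me too' ],
);

# Pass 1: collect every comment under its post URL.
my %comments;
for my $row (@parsed) {
    my ($post_url, $date, $author, $content) = @$row;
    my $comment_text = join ' ', $date, $author, $content;
    # push appends; the array reference is created on first use
    push @{ $comments{$post_url} }, $comment_text;
}

# Pass 2: look up the comments for each post.
for my $post_url (sort keys %comments) {
    my @list = @{ $comments{$post_url} };
    print "$post_url has ", scalar(@list), " comment(s)\n";
    print "  $_\n" for @list;
}
```

    For a URL that may have no comments at all, writing @{ $comments{$post_url} || [] } gives an empty list instead of an error under strict.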
Re^3: Use temporary file or cache
by ig (Vicar) on May 28, 2009 at 12:51 UTC

    20 MB, even with a few copies in hashes and arrays here and there, will probably fit in memory without any problem. Perl variables carry some overhead, and the more complex structures (hashes and arrays) carry more, but even so you should be OK unless you have very limited virtual memory available. I would keep everything in memory (from Perl's perspective) and let the operating system swap to disk if necessary, unless that proves to be a problem. You might check how much free RAM is available on your system.
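
    If you want a rough number for the overhead rather than a guess, one option is the Devel::Size module from CPAN (it is not part of core Perl, so the sketch below loads it only if it is installed). The post count, comment count, and comment length here are arbitrary assumptions chosen for illustration, not figures from this thread.

```perl
use strict;
use warnings;

# Build a hypothetical structure: 1,000 posts with 5 comments each (made-up sizes).
my %comments;
for my $post (1 .. 1000) {
    for my $n (1 .. 5) {
        push @{ $comments{"http://example.com/post$post"} },
            "2009-05-28 author$n " . ('x' x 200);
    }
}

# Devel::Size is a CPAN module, not part of core Perl, so load it optionally.
if (eval { require Devel::Size; 1 }) {
    my $bytes = Devel::Size::total_size(\%comments);
    printf "%%comments uses about %.1f MB\n", $bytes / (1024 * 1024);
}
else {
    print "Devel::Size is not installed; skipping measurement\n";
}
```

    Running the same measurement on the real %comments hash after the first pass would show how far the in-memory size is from the 20 MB on disk.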