in reply to Taming a memory hog

I'm sure this story isn't exactly news to most of you, but it was a fun experience for me.

The most important thing is that you feel happy. I guess that's how lots of people learn things: you trust and treasure the things you experienced, created, and resolved yourself more than anything you heard from others. Not to mention the joy you get from it.

Can I suggest one experiment? Give Tie::File a try on the input file side, and see how it affects your system, and your experience.
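
In case it helps, here is roughly what I have in mind. This is only a minimal sketch: the file name is made up, and the record handling is left as a placeholder.

use strict;
use warnings;
use Tie::File;
use Fcntl 'O_RDONLY';

# Tie the input file to an array; lines are fetched from disk on demand
# instead of slurping the whole file into memory.
tie my @lines, 'Tie::File', 'input.dat',
    mode   => O_RDONLY,
    memory => 20_000_000    # read-cache size in bytes (default is about 2 MB)
    or die "Can't tie input.dat: $!";

for my $line (@lines) {
    # ... process one record at a time here ...
}

untie @lines;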

    In fact, I tried telling some of my coworkers why I was so happy with the app, but none of them seemed to think it was a big deal.
Maybe, because of their experience, they would have started with your third pass from the beginning. But so what? When they did this kind of thing for the first time in their lives, their learning curve was probably even longer than yours ;-)

(Guildenstern) Re: Re: Taming a memory hog
by Guildenstern (Deacon) on Nov 10, 2003 at 20:20 UTC
    I actually did have Tie::File in the mix at one point. I'm not sure if I was using it wrong, but my run times increased severalfold. (I had to cancel the 100,000-record run after waiting 2 hours - about 20 times longer than normal.) Maybe when I get some time I'll have to reinvestigate.


    Guildenstern
    Negated character class uber alles!
        100,000 run after waiting 2 hours - about 20 times longer than normal

      Ok, does that mean your normal processing speed is 120 min / 20 = 6 min for 100,000 records?

      It sounds to me like there might still be room for improvement if you want to impress your client once more. I haven't seen your data, but I process 20,000,000 records daily in under 10 minutes.

      Anyway, I build my records with a split; your record processing might be more complex, and I may just be too fussy. :-)
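
      Just to show what I mean by building records with a split, here is a bare-bones sketch; the file name and the pipe delimiter are made up.

      use strict;
      use warnings;

      open my $in, '<', 'records.dat' or die "Can't open records.dat: $!";
      while (my $line = <$in>) {
          chomp $line;
          # One split per line is all the record parsing there is.
          my @fields = split /\|/, $line;
          # ... cheap per-record work on @fields goes here ...
      }
      close $in;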

        This is an interesting problem in dealing with large datasets. Currently I am working with files 20,000,000 lines long and trying to sort them. Do you have any suggestions about sorting? There seems to be a lot of info out there on large datasets, but I haven't seen much on sorting, especially on datasets too large to hold in memory. Thanks
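
        For what it is worth, the textbook answer for a file too big to sort in memory is an external merge sort: sort the file in memory-sized chunks, spill each sorted chunk to disk, then merge the chunks. Below is a rough, untested sketch; the file names and chunk size are made up, and the system sort(1) utility does the same job with far less code.

        use strict;
        use warnings;
        use File::Temp qw(tempfile);

        my $input      = 'big_input.txt';    # made-up file names
        my $output     = 'sorted.txt';
        my $chunk_size = 500_000;             # lines held in memory at once

        # Pass 1: sort the file in chunks, spilling each sorted chunk to disk.
        # (Assumes every line, including the last, ends with a newline.)
        my @chunk_files;
        open my $in, '<', $input or die "Can't read $input: $!";
        while (1) {
            my @lines;
            while (@lines < $chunk_size and defined(my $line = <$in>)) {
                push @lines, $line;
            }
            last unless @lines;
            my ($fh, $name) = tempfile();
            print {$fh} sort @lines;
            close $fh;
            push @chunk_files, $name;
        }
        close $in;

        # Pass 2: simple k-way merge of the sorted chunks.
        my @handles = map { open my $fh, '<', $_ or die $!; $fh } @chunk_files;
        my @heads   = map { scalar readline($_) } @handles;

        open my $out, '>', $output or die "Can't write $output: $!";
        while (grep { defined } @heads) {
            # Find which chunk currently holds the smallest line.
            my $min;
            for my $i (0 .. $#heads) {
                next unless defined $heads[$i];
                $min = $i if !defined $min or $heads[$i] lt $heads[$min];
            }
            print {$out} $heads[$min];
            $heads[$min] = readline($handles[$min]);
        }
        close $out;
        unlink @chunk_files;
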
        So, I tried to make my record creation more efficient. Currently there are three levels of nested foreach loops, plus some extra logic for special locations in the record. I realized that all of this could be rewritten as a single foreach containing a map, which chopped the lines of code needed to create each output record by more than half.

        Then I ran it. As guessed above, 6 minutes is a normal run for 100,000 records. With my nifty new changes, creating 100,000 records takes almost a full minute longer. I don't know if using a map within a foreach is a good idea, but something sure seems to slow it down.
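
        One way to pin down whether the map-inside-foreach rewrite itself is the slowdown, rather than something else that changed, is to race the two styles with the core Benchmark module. The two subs below are only toy stand-ins for the real record-building code.

        use strict;
        use warnings;
        use Benchmark qw(cmpthese);

        # Toy record builders: nested foreach loops vs. a map inside a
        # single foreach.  Both just fill an array with random values.
        sub build_nested {
            my @record;
            for my $group (1 .. 10) {
                for my $field (1 .. 20) {
                    push @record, $group * $field * rand;
                }
            }
            return \@record;
        }

        sub build_mapped {
            my @record;
            for my $group (1 .. 10) {
                push @record, map { $group * $_ * rand } 1 .. 20;
            }
            return \@record;
        }

        # Run each builder for at least 5 CPU seconds and compare rates.
        cmpthese(-5, {
            nested => \&build_nested,
            mapped => \&build_mapped,
        });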

        On a side note, creating every piece of data in a record includes a call to rand, which is probably a large factor in why generating records takes so long.


        Guildenstern
        Negated character class uber alles!