PerlMonks  

Re: Reducing Memory Usage

by mhi (Friar)
on Jul 16, 2004 at 09:21 UTC (id://374952)


in reply to Reducing Memory Usage

Since you say the file has grown from 5 MB to 125 MB, I'd guess it won't stop there. So yes, a database would be the way to go.
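Here is a minimal sketch of the database route. It is written in Python, using the standard-library sqlite3 module in place of MySQL. In Perl, the equivalent would go through DBI, for example with DBD::SQLite or DBD::mysql. The sketch assumes each record is one line whose first comma-separated field is already a text-sortable key, such as an ISO date. The table, column and function names are hypothetical.

```python
import sqlite3
from typing import Iterable, Iterator


def load_records(conn: sqlite3.Connection, lines: Iterable[str]) -> None:
    """Store each record next to its sort key.

    Assumption: the key is the first comma-separated field and already
    sorts correctly as text (e.g. an ISO date).
    """
    conn.execute("CREATE TABLE IF NOT EXISTS records (sort_key TEXT, payload TEXT)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_sort_key ON records (sort_key)")
    # A generator keeps only one record in Python memory at a time.
    rows = ((line.split(",", 1)[0], line) for line in lines)
    conn.executemany("INSERT INTO records (sort_key, payload) VALUES (?, ?)", rows)
    conn.commit()


def iter_sorted(conn: sqlite3.Connection) -> Iterator[str]:
    """Yield records in key order, fetching rows from the cursor one by one."""
    for (payload,) in conn.execute("SELECT payload FROM records ORDER BY sort_key"):
        yield payload
```

For real data you would connect to a file instead of `:memory:`, e.g. `sqlite3.connect("records.db")` (hypothetical name). You would load the big file once and afterwards let the database do the ordering, rather than holding every record in memory.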

If that is not feasible, you might want to create a sort file from your original data. Each line starts with the sorting criteria in a fixed-length format that sorts correctly as plain ASCII text, followed by a delimiter and then the original record.
This file can then be sorted by any simple sort program. (If you're on a Unix box or have Cygwin available, 'sort' should do the job easily. You can also tweak the buffer size it uses, e.g. with GNU sort's -S option, for optimum performance on your box. After all, sorting files is exactly what it was written for!)
After sorting, just filter out the sorting info and the delimiter again and you have your sorted data.
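The decorate, sort and undecorate steps above can be sketched like this (in Python, for illustration only). The record layout is an assumption: the first comma-separated field is taken to be a DD.MM.YYYY date, which becomes an 8-character YYYYMMDD key. A tab is assumed never to appear in the key. The function names are made up for this example. The heavy lifting is left to the system sort program, which is called with LC_ALL=C so that it compares plain bytes.

```python
import os
import subprocess

DELIM = "\t"  # assumption: the key itself never contains a tab


def decorate(line: str) -> str:
    """Prefix a record with a fixed-length, ASCII-sortable key.

    Hypothetical layout: the first comma-separated field is 'DD.MM.YYYY',
    which is turned into an 8-character 'YYYYMMDD' key.
    """
    day, month, year = line.split(",", 1)[0].split(".")
    key = f"{int(year):04d}{int(month):02d}{int(day):02d}"
    return key + DELIM + line


def undecorate(line: str) -> str:
    """Drop the key and the first delimiter, returning the original record."""
    return line.split(DELIM, 1)[1]


def external_sort(src: str, dst: str) -> None:
    """Decorate src, sort it on disk with 'sort', then undecorate into dst.

    Python only ever holds one line at a time; 'sort' manages its own
    memory and temporary files.
    """
    decorated, ordered = dst + ".dec", dst + ".sorted"
    with open(src) as fin, open(decorated, "w") as fout:
        for line in fin:
            fout.write(decorate(line))
    # LC_ALL=C: byte-wise comparison, matching the ASCII key format.
    # (GNU sort also accepts -S/--buffer-size to tune its memory buffer.)
    subprocess.run(["sort", "-o", ordered, decorated],
                   check=True, env={**os.environ, "LC_ALL": "C"})
    with open(ordered) as fin, open(dst, "w") as fout:
        for line in fin:
            fout.write(undecorate(line))
    os.remove(decorated)
    os.remove(ordered)
```

Because the key is fixed-length and zero-padded, byte order and chronological order agree. That means sort needs no field or key options at all.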

Re^2: Reducing Memory Usage
by PerlingTheUK (Hermit) on Jul 16, 2004 at 09:41 UTC
    That sounds interesting, but before trying it I will definitely go the database way.
    The size is likely to level off at 150 to 175 MB.
    Thank you all for your answers. I am aware that Perl tends to be slightly !thriftless! when it comes to memory usage. Nevertheless, I would like to know whether there are any known techniques in Perl for reducing memory usage (apart from those that help avoid memory leaks). Does anyone know of any links, documentation, or books about this and closely related problems?
      Your selected algorithm is the best way to control Perl's memory usage.

      First, I might suggest that you decode the "weird" date in your file ONE time: go through the large file once and rewrite it to a new file with the "proper" date.
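      Here is a sketch of that one-time conversion (in Python, for illustration). The actual "weird" format is not given here, so DD.MM.YYYY in the first comma-separated field is a stand-in assumption. Each date is rewritten to ISO 8601 YYYY-MM-DD. The file is streamed line by line, so memory use stays flat regardless of file size. The function names are hypothetical.

```python
from datetime import datetime


def normalise_date(field: str) -> str:
    """Turn a hypothetical 'DD.MM.YYYY' date into ISO 8601 'YYYY-MM-DD'."""
    return datetime.strptime(field, "%d.%m.%Y").date().isoformat()


def convert_record(line: str) -> str:
    """Rewrite the date in the first comma-separated field of one record."""
    record = line.rstrip("\n")
    date_field, sep, rest = record.partition(",")
    return normalise_date(date_field) + sep + rest + "\n"


def rewrite_dates(src_path: str, dst_path: str) -> int:
    """Make a single streaming pass over the big file and return the record count."""
    count = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(convert_record(line))
            count += 1
    return count
```

      After this pass, later runs can compare or sort the ISO dates as plain strings, without decoding them again.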

      Second, if your Perl program is mainly a sorting job (or sorting is at least a major part of it) and the problem is big enough, buying a dedicated sort program for your OS might be a better investment. Syncsort is one such product that may fit your needs; there are versions for Windows and for most major flavors of UNIX.

        From what I can tell, his "sorting" requirement is not exactly sorting but ordering, and the ordering changes dynamically depending on the time of day. That makes using external sorts, or even Perl's sort, rather difficult.


        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "Think for yourself!" - Abigail
        "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
Re^2: Reducing Memory Usage
by danielcid (Scribe) on Jul 16, 2004 at 13:17 UTC

    I completely agree with you (mhi). Imagine yourself a few
    months from now, loading a 200, 300 or 400 MB file into
    memory... It's crazy!
    There are so many free databases, like MySQL. You should
    think carefully about it.

    -DBC
