Re^2: Reduce CPU utilization time in reading file using perl

by madtoperl (Hermit)
on Sep 28, 2013 at 13:18 UTC


in reply to Re: Reduce CPU utilization time in reading file using perl
in thread Reduce CPU utilization time in reading file using perl

Hi BrowserUK,
Thanks a lot for your inputs. I have tried your option as well; still the CPU usage is 100%. Is it possible to load only one line at a time from the huge file into memory and store it into the hash, or to store it into the hash without opening the file directly? I worry that it may not be possible, but I still thought of getting your suggestion.
Thanks
madtoperl

Replies are listed 'Best First'.
Re^3: Reduce CPU utilization time in reading file using perl
by BrowserUk (Patriarch) on Sep 28, 2013 at 13:31 UTC
    I have tried your option as well; still the CPU usage is 100%.

    That is because you are using more memory for the hash than you have installed; thus, parts of the memory holding the hash are being swapped or paged to disk as the file is being read. The way hashes are stored means that pages of memory are constantly being written to disk and then re-read, over and over; and that is what is driving up your CPU usage.

    Is it possible to load only one line at a time from the huge file into memory and store it into the hash, or to store it into the hash without opening the file directly?

    That is what my code does. It reads one line, inserts it into the hash, then reads the next. It is the size of the hash that is the problem, not the line-by-line processing of the file.
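
    A minimal sketch of that line-by-line pattern (not the exact code referred to above; the file name and the '|' key/value split are assumptions, since the actual record layout hasn't been posted):

        #!/usr/bin/perl
        use strict;
        use warnings;

        my %hash;
        open my $fh, '<', 'huge_file.txt' or die "open: $!";
        while ( my $line = <$fh> ) {    # only one line in memory at a time
            chomp $line;
            my ( $key, $value ) = split /\|/, $line, 2;    # assumed '|' delimiter
            $hash{$key} = $value;       # it is %hash itself that grows
        }
        close $fh;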

    I worry that it may not be possible, but I still thought of getting your suggestion.

    There are various ways of providing access to huge amounts of data without requiring that it all be held in memory concurrently. Which of those methods/mechanisms is appropriate for your purpose depends entirely upon what you need to do with that data.
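
    One such mechanism, purely as an illustration, is a disk-backed hash via the DB_File module, which keeps the data on disk instead of in RAM (assuming Berkeley DB is available on your system):

        use strict;
        use warnings;
        use DB_File;
        use Fcntl;

        # Tie the hash to a Berkeley DB file; entries are stored on disk,
        # so memory use stays flat no matter how large the data set grows.
        tie my %hash, 'DB_File', 'data.db', O_RDWR|O_CREAT, 0666, $DB_HASH
            or die "Cannot tie data.db: $!";

        $hash{key} = 'value';    # written through to disk
        untie %hash;

    Access is slower than an in-memory hash, but it avoids the swap-thrashing described above.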

    So, before advising further, you need to answer the question: Why are you attempting to load all the data into a hash?


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Hi BrowserUk
      Thanks a lot for the help. I have two huge files and need to compare them column by column, writing the column and row of each mismatch to a third file. The lines of both files are delimited with |. Could you please suggest a better option for this? Right now, I am loading the data of the two files into two separate hashes, comparing them, and writing the differences to the third file. It would be good if you could suggest something other than loading the file content into a database and fetching it.
      Thanks,
      madtoperl
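
      For illustration, a minimal sketch of such a column-wise comparison done line by line, so neither file is held in memory (it assumes the two files list their records in the same order; the file names are placeholders):

          #!/usr/bin/perl
          use strict;
          use warnings;

          open my $fh1, '<', 'file1.txt' or die "file1: $!";
          open my $fh2, '<', 'file2.txt' or die "file2: $!";
          open my $out, '>', 'diff.txt'  or die "diff: $!";

          my $row = 0;
          while (1) {
              my $l1 = <$fh1>;
              my $l2 = <$fh2>;
              last unless defined $l1 and defined $l2;
              ++$row;
              chomp( $l1, $l2 );
              my @f1 = split /\|/, $l1, -1;    # -1 keeps trailing empty fields
              my @f2 = split /\|/, $l2, -1;
              my $cols = @f1 > @f2 ? scalar @f1 : scalar @f2;
              for my $col ( 0 .. $cols - 1 ) {
                  no warnings 'uninitialized';
                  print {$out} "row $row, col ", $col + 1,
                      ": '$f1[$col]' vs '$f2[$col]'\n"
                      if $f1[$col] ne $f2[$col];
              }
          }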
        It would be good if you could suggest something other than loading the file content into a database and fetching it.

        Probably, but not based on the information you've provided so far. Why do you seem reluctant to provide information?

        Please provide:

        • The size in bytes of both files.
        • The number of records in both files.
        • The number of fields in the lines of both files.
        • A couple of sample records from both files.

          If the data is proprietary, then take a couple of sample records and change the identifying words, numbers etc., but try to ensure that they remain realistic.

        • An idea of how often you will need to do this and how often the file(s) change.

          I.e. "The bigger file remains constant and the smaller changes once a week";

          Or: "This is a one-off problem never to be repeated".

          Or: "The two files never change, but the (combination of) fields used for comparison changes every day".

          Or: ...

        With that information, we here have a realistic chance of understanding the scale of the problem and possible solutions.


