in reply to Re: Force perl to release memory back to the operating system
in thread Force perl to release memory back to the operating system

Hi Tachyon, thanks for your suggestion of the database approach! And yes, I have access to a Sybase database.

Unfortunately I can't load them into the database. All these files come from who-knows-what systems, and I have no control over their creation (and I don't want to know either). And I specifically avoided loading them into a database because I was told not to.

Sorry, I forgot to mention that I am only allowed to work on the flat files, and the extraction is run only once per day. Why can't these data be loaded into a database in the first place? I don't know. That's probably an old business decision.

Re: Re: Re: Force perl to release memory back to the operating system
by mirod (Canon) on Sep 25, 2003 at 07:18 UTC

    Are you not allowed to use any DB, or is the decision just not to let you use the company Sybase, so you don't bother the DBA? If you use the DBI with DBD::SQLite there will be no need for a DB server (the engine is embedded in the module) and the whole DB will be a single file. This removes any administration, the DBA is happy, your boss should be happy too, and you gain scalability and the convenience of using SQL to work on your data.

    SQLite is really pretty fast, and ideally suited for this kind of single-user application. Just check that the DB file doesn't grow past your OS's file-size limit, if there is one, and you should be fine.
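    Something like this, for instance (a sketch only - the file name, the tab-separated customer/amount layout and the table name are assumptions, not from the thread; adjust to the real record format):

    use strict;
    use warnings;
    use DBI;

    # one file, no server: the whole "database" is balances.db
    my $dbh = DBI->connect( 'dbi:SQLite:dbname=balances.db', '', '',
                            { RaiseError => 1, AutoCommit => 0 } );

    $dbh->do('CREATE TABLE txns (cust TEXT, amount REAL)');

    my $ins = $dbh->prepare('INSERT INTO txns (cust, amount) VALUES (?, ?)');
    open my $fh, '<', 'transactions.txt' or die "open: $!";
    while (<$fh>) {
        chomp;
        my ($cust, $amount) = split /\t/;
        $ins->execute($cust, $amount);
    }
    close $fh;
    $dbh->commit;

    # the top 30 then drops out of a single query, in constant perl memory
    my $top30 = $dbh->selectall_arrayref(
        'SELECT cust, SUM(amount) AS bal
           FROM txns
          GROUP BY cust
          ORDER BY bal DESC
          LIMIT 30'
    );
    printf "%s %.2f\n", @$_ for @$top30;
    $dbh->disconnect;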

Re: Re: Re: Force perl to release memory back to the operating system
by tachyon (Chancellor) on Sep 25, 2003 at 07:03 UTC

    It might be time to rethink that business decision. Anyway, you can save some memory if you think about what you are doing.

    # get the customer => balance hash mapping - if you can't use an
    # RDBMS you are stuck with this
    $cust_bal = ......

    # now all we are interested in is the top 30.
    # we don't need to sort them all using a Schwartzian transform -
    # we only need to sort the values. This saves us a lot of memory as
    # [ cust, bal ] will take up a lot more space than just bal.
    # we can also exit at this point and write a new temp flat file
    # with all the cust => bal mappings, and this reclaims the memory.
    # either way we just want those top 30, so just sort the balances:
    my @bals = sort { $b <=> $a } values %$cust_bal;

    # we are only interested in the top 30, which is a balance at or
    # above the 30th value
    my $gt = $bals[29];

    # so now we iterate over our hash again (or the flat file)
    my $top_30;
    for my $cust ( keys %$cust_bal ) {
        next unless $cust_bal->{$cust} >= $gt;
        $top_30->{$cust} = $cust_bal->{$cust};
    }

    # now the top_30 are in a hash ready to be sorted for mapping to
    # cust details and output

    This will save you roughly 25%, and up to 50% (by using a temp file), of the memory that the original algorithm used, which might get you home. If not, you will have to tie your hash to a file. dbmopen might be the DB you have when you are not having a DB :-) If you can't do it in memory and have to tie to the disk it will be really slow, but I guess you have got all day :-)
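    For the tied route, a minimal sketch (assuming DB_File is available; dbmopen works the same way with whatever DBM your perl was built against):

    use strict;
    use warnings;
    use DB_File;

    # the hash now lives on disk, so building it doesn't grow the process
    tie my %cust_bal, 'DB_File', 'cust_bal.db'
        or die "cannot tie cust_bal.db: $!";

    # build it exactly as before, e.g. $cust_bal{$cust} += $bal;
    # then scan with each() so you never hold every key in memory at once
    while ( my ($cust, $bal) = each %cust_bal ) {
        # ... same top-30 scan as above, just much slower
    }

    untie %cust_bal;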

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

      Hi brother Tachyon, thanks for your suggestion! It has opened up my mind. I have never thought about the memory savings of using a straight sort. Too bad I am home already; I can't wait to get back to the office tomorrow morning to try it out.

      Thanks again for your prompt reply! I really appreciate it. I will let you know how I go tomorrow!

      I don't think this is solving anything, because (as I understand the OP) the files are of transactions, not customers and balances.

      If each entry in the file were a customer and a balance, one run through would be sufficient, just keeping the top 30 customers (see the sketch after this reply).

      It's more complicated than that, and a nightmare without a RDB.
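      For what it's worth, here is a sketch of that one-run-through idea (my illustration, assuming one "customer<TAB>balance" line per record - which, as noted above, is not what the OP actually has):

      use strict;
      use warnings;

      my @top;    # at most 30 [ cust, bal ] pairs, sorted descending
      while (<>) {
          chomp;
          my ($cust, $bal) = split /\t/;
          next if @top == 30 && $bal <= $top[-1][1];   # can't make the cut
          push @top, [ $cust, $bal ];
          @top = sort { $b->[1] <=> $a->[1] } @top;
          splice @top, 30 if @top > 30;                # keep only the top 30
      }
      printf "%s %s\n", @$_ for @top;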

        You can tell that there are multiple transactions per customer from the bit of code that has:

        $hash_ref->{$cust} += $balance * $fudge_factor;
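        For concreteness, that accumulation pass looks something like this (a sketch; the tab-separated layout, the file name and the fudge factor value are assumptions):

        my $cust_bal     = {};
        my $fudge_factor = 1;    # stand-in for whatever the real code applies

        open my $fh, '<', 'transactions.txt' or die "open: $!";
        while (<$fh>) {
            chomp;
            my ($cust, $balance) = split /\t/;
            $cust_bal->{$cust} += $balance * $fudge_factor;   # many rows per customer
        }
        close $fh;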

        Agree the PHB needs shooting. Saying you can't use an RDBMS when it is suited to the job is like saying: OK David, I'm sorry but you can't use one of the surplus RPGs we are tripping over, but can you go and kill that really big Goliath guy over there with this slingshot..... Oh, and we are out of pebbles too. But don't worry, I have faith that you will get this done on time and under budget.....

        cheers

        tachyon

        s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Re: Re: Re: Force perl to release memory back to the operating system
by zengargoyle (Deacon) on Sep 25, 2003 at 07:24 UTC

    I vote for the database as well, even if you have to install MySQL yourself. But if you must, you can try something like this.

    #!/usr/bin/perl
    #
    # foo - called by bar. receives files to search on its
    #       input, calculates top N, retrieves the details for
    #       the top N and prints them to STDOUT in some easily
    #       parseable format
    #
    use strict;
    use warnings;

    $|++;

    my @files = <>;
    chomp @files;

    for (@files) {
        # process files
        warn "processing $_$/";
    }

    my @top = (                  # get top N
        [ a => 10 ],
        [ b => 5 ],
        [ c => 3 ],
    );

    my %detail = (               # and detail info
        a => [ 1, 1, 4, 2, 1, 1 ],
        b => [ 1, 2, 2 ],
        c => [ 1, 2 ],
    );

    for (@top) {
        printf "%s %d$/", @$_;
        print "@{$detail{$_->[0]}}$/";
    }

    exit;

    #!/usr/bin/perl
    #
    # bar - calls foo as a child process and collects tidbits
    #       of data for future use. foo's memory will go back
    #       to the system soon after the answers are read.
    #
    use strict;
    use warnings;
    use IPC::Open2;

    my @files = qw( xxx xxy xxz );

    my ($read, $write, $child);
    $child = open2($read, $write, './foo');

    print $write $_, $/ for @files;
    close $write;

    my @info = <$read>;
    close $read;

    waitpid $child, 0;

    print @info;
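    As a side note (my sketch, not part of the post): if foo took the filenames as arguments instead of on its STDIN, a plain read-only pipe open would do the same job with less plumbing, and the child's memory still goes back to the OS when it exits.

    # list-form pipe open, perl 5.8+ on systems with fork
    open my $read, '-|', './foo', @files or die "cannot run foo: $!";
    my @info = <$read>;
    close $read;    # waits for the child; its memory is released here too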