in reply to Re: Force perl to release memory back to the operating system
in thread Force perl to release memory back to the operating system

In fact, 20 million lines of data in flat files is a very solid indication that it is time to put the data into an RDBMS.

On the contrary. Not putting them in a DB can make things _much_ more efficient. The real deciding factor is not the volume of data, but the access requirements and the volatility of the data.

For instance, I receive about 2 gigs' worth of records every day that I have to process. Never in their lives do these records see a DB. Just loading them into a DB and indexing them is significantly slower than the processing I need to do, and I only need to do that processing once (or very rarely twice). An RDBMS is not, IMO, suitable for managing large volumes of data that will only be accessed once or twice, are never changed, and can be disposed of once processed.
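
To make that concrete, here is roughly the shape of the processing I mean. This is a minimal sketch, not my actual code; the file name and the tab-separated record layout (customer, currency, amount) are invented for illustration:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # One pass over the file: no DB load, no index build, and memory
    # grows with the number of distinct customers, not the number of
    # records. File name and layout are made up for illustration.
    my %total;
    open my $fh, '<', 'records.dat' or die "Can't open records.dat: $!";
    while ( my $line = <$fh> ) {
        chomp $line;
        my ( $customer, $currency, $amount ) = split /\t/, $line;
        $total{$customer} += $amount;    # currency ignored to keep it short
    }
    close $fh;

    # Report once, then the data can be thrown away.
    for my $customer ( sort { $total{$b} <=> $total{$a} } keys %total ) {
        printf "%s\t%.2f\n", $customer, $total{$customer};
    }

The cost is one sequential read, which is hard for a load-then-index-then-query cycle to beat when you only ever ask the question once.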

Anyway, just thought I'd throw that in there.


---
demerphq

    First they ignore you, then they laugh at you, then they fight you, then you win.
    -- Gandhi


Re: Re: Re: Force perl to release memory back to the operating system
by tachyon (Chancellor) on Sep 25, 2003 at 11:52 UTC

    Yes, I did make a couple of assumptions about what look like financial transactions in multiple currencies that are accumulated and then indexed against customer details to get the most active customer list by transaction value. I can't think why that looked like a DB task :-)

    But of course you are right. I won't be putting my web or squid logs into a DB anytime soon, although I do rotate them daily and use YAML to serialize the data we parse out into flat files so we can get it back as required (rarely, and for static HTML anyway).
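
    The YAML round trip is about as simple as it gets. A sketch, with an invented file name and a stand-in data structure:

        use strict;
        use warnings;
        use YAML qw(DumpFile LoadFile);

        # Stand-in for whatever we actually parse out of the logs.
        my %parsed = ( hits => 42, bytes => 1_048_576 );

        # Serialize to a flat file...
        DumpFile( 'parsed-2003-09-25.yml', \%parsed );

        # ...and pull it back on the rare occasion we need it.
        my $parsed = LoadFile('parsed-2003-09-25.yml');
        print "$parsed->{hits} hits\n";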

    As always, tools and tasks...

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

      Yes, I did make a couple of assumptions

      Er, I hope you didn't misunderstand me. My reply wasn't to your whole node or the analysis you offered; it was more of a heads-up to people that DBs are not the universal panacea for handling large volumes of data. (Not so much for you, as I'm familiar with you and was confident you knew the caveats, but for other monks out there who might be tempted to bulk load a few million records just to sum one field in them.)
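
      For that particular job a one-liner does it. A sketch, assuming whitespace-separated records with the value in the third column (the file name is invented):

          # Sum the third column of a flat file in one pass; no load, no index.
          # (-a autosplits each line into @F, -n loops over the file.)
          perl -lane '$sum += $F[2]; END { print $sum }' records.txt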

      I find there's a tendency amongst DB'ish types to treat a DBMS as the only way to solve problems. There are huge classes of problems where using a DBMS is a no-brainer (transactions, distribution, integrity, locking, etc.), but there are also huge classes of problems where a DB is probably not the best solution.

      As you said: tools for tasks...

      Anyway, I'm hoping this exchange opens one or two minds to alternative approaches, which can only be a good thing.

      Cheers, :-)


      ---
      demerphq

        First they ignore you, then they laugh at you, then they fight you, then you win.
        -- Gandhi


Re: Re: Re: Force perl to release memory back to the operating system
by poqui (Deacon) on Sep 25, 2003 at 15:13 UTC
    I agree. I have done match/merge processing with large lists much faster in flat files with unix commands than by loading them into Oracle, for example, and trying to do the processing there. The data was very thin, fewer than 10 columns, but fairly large, about 1M records.
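
    The flat-file approach was essentially sort and merge with the standard unix tools. A rough sketch, with invented file names and a pipe-delimited key in column 1:

        # Sort both lists on the join key (column 1), then merge matching rows.
        sort -t'|' -k1,1 list_a.txt > list_a.sorted
        sort -t'|' -k1,1 list_b.txt > list_b.sorted
        join -t'|' -1 1 -2 1 list_a.sorted list_b.sorted > matched.txt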