Yeah, agreed on the database-can-read-CSV issue. That eliminates this overhead.

But then, tybalt98's code example (I had prepared something very similar to run benchmarks) doesn't swap, regardless of how big the dataset is. Time is more or less linear in the number of records. My (not very up-to-date) system processes about 20,000 records per minute, which means I wouldn't stand a chance of processing 14M records in four hours. NYTProf shows that most of the time goes into preparing and printing the output file. It doesn't even help much if the output goes to an SSD.
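
For reference, here is the back-of-envelope arithmetic behind that estimate. It is only a sketch: it assumes the linear scaling holds all the way up to 14M records, and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Figures from the post: roughly 20,000 records per minute, 14M records in total.
my $records_per_minute = 20_000;
my $total_records      = 14_000_000;

# Assumption: runtime keeps scaling linearly with the number of records.
my $minutes = $total_records / $records_per_minute;
my $hours   = $minutes / 60;

# Prints: 700 minutes, or about 11.7 hours
printf "%d minutes, or about %.1f hours\n", $minutes, $hours;
```

That is nearly three times the four-hour window, before any fix to the output step.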

I wonder what indexing you would apply to the problem at hand? If you can provide an example, I'd be happy to run it against my SQLite or PostgreSQL server on the same system for comparison. I don't mind working with databases at all (how could I: I've been working as a product manager for database engines for some years). But in this case the suggestions to use a database (or MCE) all came with little concrete help for the OP and his program. tybalt98 and I found an actual performance issue which, when fixed, gives a speedup of several orders of magnitude. How much gain do you expect from switching to a database?

How much familiarity with SQL and database functions do the database aficionados expect from the OP? Is this actually helping or is this saying "look how smart I am!"?

"Also, when your management likes the output you just produced, they're going to ask for more and more analytics."

I can confirm that from my own experience. But then, management doesn't ask for a 260GB CSV file; they usually want "two or three slides". One of my most successful Perl programs fell into that category. The evaluation ran once per week for several years. It could have used a database, but it didn't. Actually, no one cared.

In reply to Re^4: efficient perl code to count, rank by haj
in thread efficient perl code to count, rank by Perl_Noob2021
