Recently I rediscovered the similar...

Before you spend a lot of time and effort trying to use a module, make sure it matters.

I've noticed that since I started using Perl more frequently, I tend to over-perlify/over-generify/over-engineer my solutions.

During a recent project I needed to repeatedly check the uniqueness of about 7.8 million strings of roughly 15 characters each, stored in a flat file. My first solution, which I rejected after a quick test, was simply to populate a hash and make sure I never added an element that already existed. Goodbye swap & CPU.
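For reference, the rejected hash approach looks roughly like this. It is a minimal sketch, not the original code; the sub name `first_dup_via_hash` is hypothetical. The key point is that `%seen` keeps one entry for every distinct string, so memory grows with the size of the input. That is the cost that drove the 7.8 million-string case into swap.

```perl
use strict;
use warnings;

# Naive uniqueness check (hypothetical helper): remember every line in a hash.
# Returns the first line seen twice, or undef if every line is unique.
# %seen holds one key per distinct line, so memory grows with the input.
sub first_dup_via_hash {
    my ($fh) = @_;
    my %seen;
    while (my $line = <$fh>) {
        chomp $line;
        # Post-increment: false the first time a key is seen, true afterwards
        return $line if $seen{$line}++;
    }
    return;    # undef in scalar context: no duplicates
}
```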

The problem seemed to lend itself to Set::IntSpan, so I installed it and tried it out. Unfortunately, the sets were quite large and insertion time didn't scale (it took several hours to complete). Next I thought I just needed some kind of "database", so I tried one or more of the built-in ones (e.g. SDBM or something similar). Same problem.

I started thinking about getting more serious and installing an SQL database, a working version of DB_File, or one of the "Sort" modules for sorting large files. Chances are good one of these would have worked, but I was running out of time and needed to get this working.

I realized that I needed to take a step back and think about how I would do this without perl. I ended up just using the standard unix 'sort' utility and then having perl run through the result and check that no two adjacent lines matched. This seems fairly low-tech, but it finishes in less than 10 minutes with a reasonable amount of memory, rather than several hours and/or lots of memory (plus lots of installation/debugging/etc.).
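A minimal sketch of the sort-then-scan approach, assuming one string per line in a plain text file and a unix `sort` on the PATH. The sub name `find_adjacent_dups` is hypothetical and not from the original post. Setting `LC_ALL=C` is my own addition. It asks `sort` for a plain byte-order comparison, which is all an equality check needs and is typically faster than locale-aware collation. Perl only ever holds the current and previous line, so memory stays flat regardless of file size.

```perl
use strict;
use warnings;

# Hypothetical helper: return the duplicated lines in $file.
# The external `sort` does the heavy lifting; once sorted, identical
# strings are adjacent, so comparing each line to the previous one suffices.
sub find_adjacent_dups {
    my ($file) = @_;
    # Byte-wise ordering is enough to bring equal lines together
    local $ENV{LC_ALL} = 'C';
    open my $sorted, '-|', 'sort', $file
        or die "Cannot run sort on $file: $!";
    my ($prev, @dups);
    while (my $line = <$sorted>) {
        chomp $line;
        push @dups, $line if defined $prev && $line eq $prev;
        $prev = $line;
    }
    # close on a pipe fails if sort exited non-zero (e.g. missing file)
    close $sorted or die "sort failed on $file: $?";
    return @dups;
}
```

Called as `my @dups = find_adjacent_dups('strings.txt');` (file name hypothetical), an empty list means every line is unique. A string that occurs three times shows up twice in the result, once for each extra copy.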

The project is now over, so I don't need a portable solution. Even if I did, I would probably do the same thing and leave the "optimization" (i.e. a supposedly cleaner solution) until after I had tested this further.


In reply to Which way is "better" by bluto
in thread Which way is faster? by dws
