Macabre Monks,

While I sit here waiting for my script to run, I wonder: out there in the rest of the Perl industry, what's considered a large data processing job?

My case today: I have a 650 MB flat text file with 15 million lines of 1's and 0's. I have another flat text file with 600 lines, each line having 7 positions containing 1's and 0's. Each character and position of every line in the first file has to be checked against each character and position of every line in the second file. That means 15 million lines of text to parse into characters, then 15 million * 600 * 7 = 63 billion comparisons to make, with the matches (normally around 100 million) written to about 40,000 "match position list" files on disk.
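A minimal brute-force sketch of that comparison, in Python rather than Perl. It assumes (hypothetically) that a "match" means the character at a given position of a data line equals the character at the same position of a pattern line, and that every data line has at least 7 characters. The names `brute_force_matches` and `comparison_count`, and grouping results by (pattern, position), are illustrative only; the sketch does not reproduce the real 40,000-file output layout.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Tuple

PATTERN_WIDTH = 7  # each line of the small file has 7 positions


def brute_force_matches(
    data_lines: Iterable[str], patterns: List[str]
) -> DefaultDict[Tuple[int, int], List[int]]:
    """Check every data line against every pattern, one position at a time.

    Hypothetical match rule: data_line[pos] == pattern[pos].
    Returns (pattern_index, position) -> list of matching data line numbers.
    `data_lines` may be an open file object, so the big file is streamed
    line by line instead of being loaded into memory.
    """
    matches: DefaultDict[Tuple[int, int], List[int]] = defaultdict(list)
    for line_no, line in enumerate(data_lines):
        line = line.rstrip("\n")
        if len(line) < PATTERN_WIDTH:
            raise ValueError(f"line {line_no} is shorter than {PATTERN_WIDTH} characters")
        for p_idx, pattern in enumerate(patterns):
            for pos in range(PATTERN_WIDTH):  # one comparison per position
                if line[pos] == pattern[pos]:
                    matches[(p_idx, pos)].append(line_no)
    return matches


def comparison_count(n_lines: int, n_patterns: int, width: int = PATTERN_WIDTH) -> int:
    """Total character comparisons made by the triple loop above."""
    return n_lines * n_patterns * width


if __name__ == "__main__":
    data = ["1010101", "0000000"]
    patterns = ["1111111", "0000000"]
    result = brute_force_matches(data, patterns)
    print(dict(result))
    print(comparison_count(15_000_000, 600))  # 63000000000
```

Every (line, pattern, position) triple is visited exactly once, so the cost grows as lines * patterns * 7 no matter how many matches turn up.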

There are no patterns to look for. I have been unable to find a better method than brute-force, character-by-character comparison. It takes about 8 hours to run.
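For a sense of scale, dividing the comparison count by the quoted run time gives the average rate the script sustains. This is a back-of-the-envelope figure that lumps parsing and file writing into the same 8 hours:

```latex
\frac{15\times10^{6} \times 600 \times 7}{8 \times 3600\ \text{s}}
  = \frac{6.3\times10^{10}}{28\,800\ \text{s}}
  \approx 2.2\times10^{6}\ \text{comparisons/s}
```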

And I've sometimes wondered: is this outrageous, or just an everyday job for some?

Thanks




Forget that fear of gravity,
Get a little savagery in your life.
Max Webster

In reply to What is a "big job" in the industry? by punch_card_don
