So what do I do? How can you compare two files for changes when both files are so large that you either run out of memory or it takes so long to process them that the information in the file is out of date?

I would consider other options. Here's one that might work for you: Use separate logical databases for each customer, and use UNION queries that span the separate logical databases. (MySQL 4.0 supports UNION queries.) This looks like:

SELECT stuff FROM db1.t WHERE stuff LIKE 'foo%'
UNION
SELECT stuff FROM db2.t WHERE stuff LIKE 'foo%'
UNION
SELECT stuff FROM db3.t WHERE stuff LIKE 'foo%'

In such a scheme, you would build (and possibly cache) the query at runtime based on the current "complete" databases. When new data for a vendor arrived, you would create a new database, bulk load the new data into that database (without having to worry about deleting expired product records), then switch that database to be current. Then, arrange for new queries to use the new database. Since there may be queries active at the time of the switch, you may need to introduce some delay before recycling (deleting) the old database for that vendor.
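To make the moving parts concrete, here is a rough sketch of the scheme in Python (the same thing is easy to do in Perl with DBI). Everything named here is made up for illustration: `build_union_query`, `CurrentDatabases`, the vendor and database names, and the grace period. It also assumes a driver that uses `%s` placeholders. It only builds the query text and tracks which database is current; creating, bulk loading and dropping databases is left out.

```python
import re
import time


def build_union_query(databases, pattern):
    """Return (sql, params) for a UNION over table t in each database.

    Database names cannot be bound as parameters, so they are checked
    against a conservative pattern before being interpolated.
    """
    if not databases:
        raise ValueError("need at least one database")
    for name in databases:
        if not re.fullmatch(r"[A-Za-z0-9_]+", name):
            raise ValueError(f"suspicious database name: {name!r}")
    parts = [f"SELECT stuff FROM {name}.t WHERE stuff LIKE %s" for name in databases]
    # One bound parameter per SELECT; %s is an assumed placeholder style.
    return " UNION ".join(parts), [pattern] * len(databases)


class CurrentDatabases:
    """Tracks the current database per vendor and the ones awaiting deletion."""

    def __init__(self):
        self.current = {}   # vendor -> name of the database queries should use
        self.retired = []   # (old database name, time it was retired)

    def switch(self, vendor, new_db, now=None):
        """Make new_db current for vendor; remember the old one for later recycling."""
        now = time.time() if now is None else now
        old = self.current.get(vendor)
        self.current[vendor] = new_db
        if old is not None:
            self.retired.append((old, now))

    def droppable(self, grace_seconds, now=None):
        """Old databases retired at least grace_seconds ago, so in-flight queries have had time to finish."""
        now = time.time() if now is None else now
        return [db for db, t in self.retired if now - t >= grace_seconds]
```

A new query would then be built from `sorted(registry.current.values())` each time a vendor is switched, and a periodic job would drop whatever `droppable()` returns.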

The beauty of this scheme is that

The (small) downside is that you can't embed a static query (or set of queries) in your applications. Instead, you have to either construct new queries dynamically, or read the cached ones.
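One way to soften that downside is to cache the generated query text and rebuild it only when the set of current databases changes. The following is a minimal sketch in Python, not a finished design: `QueryCache` and its `builds` counter are hypothetical names, the `%s` placeholder style is an assumption about the driver, and the database names are assumed to have been validated elsewhere.

```python
class QueryCache:
    """Caches the UNION query text for the most recent set of databases."""

    def __init__(self):
        self._key = None   # tuple of database names the cached SQL was built for
        self._sql = None
        self.builds = 0    # how many times the query text was actually rebuilt

    def get(self, databases):
        key = tuple(databases)
        if key != self._key:
            # The set of current databases changed (or this is the first call),
            # so rebuild the query text. Names are assumed to be trusted here.
            self._sql = " UNION ".join(
                f"SELECT stuff FROM {db}.t WHERE stuff LIKE %s" for db in key
            )
            self._key = key
            self.builds += 1
        return self._sql
```

The application calls `get()` with the current list of databases on every request; the string is only reconstructed after a switch.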


In reply to Re: How to process two files of over a million lines for changes by dws
in thread How to process two files of over a million lines for changes by richard5mith
