If both files are sorted on the same key, it's easy: you never need more than two lines in memory, one from each file. Assume each file consists of two columns, product name and price, and both files are ordered on the product name. Pseudo-algorithm:
  1. Read product name (pn.o) and price (p.o) from the old file. Read product name (pn.n) and price (p.n) from the new file.
  2. If pn.o eq pn.n, goto 5.
  3. If pn.o lt pn.n, then pn.o was deleted. If the old file is exhausted, goto 8, else read the next line of the old file into pn.o and p.o and goto 2.
  4. (pn.o gt pn.n) This means pn.n is a new product. If the new file is exhausted, goto 9, else read the next line of the new file into pn.n and p.n and goto 2.
  5. If p.o != p.n, the price was modified. Else there was no change in the product.
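  6. If the old file is exhausted, pn.n has already been matched, so skip it: if the new file is exhausted as well, end the program, else read the next line of the new file into pn.n and p.n and goto 8. If the old file is not exhausted, read the next line of the old file into pn.o and p.o.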
  7. If the new file is exhausted, goto 9, else read the next line of the new file into pn.n and p.n and goto 2.
  8. pn.n is a new product, and so are all other unread entries in the new file. Read them, adjust your database, and end the program.
  9. pn.o is a deleted product, and all other unread entries in the old file were deleted as well. Read them, adjust your database, and end the program.
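
The steps above can be sketched as a single merge-style loop. This is only a minimal sketch in Python, not the poster's own code: the names diff_sorted and read_records are made up for illustration, and the tab-separated "name, price" line format is an assumption. Both inputs must already be sorted on the product name in the same order that Python's string comparison uses.

```python
def read_records(fh):
    """Yield (name, price) pairs from an open file.

    Assumption: one record per line, name and price separated by a tab.
    """
    for line in fh:
        name, price = line.rstrip("\n").split("\t", 1)
        yield name, price


def diff_sorted(old, new):
    """Compare two iterables of (name, price) pairs sorted on name.

    Yields (action, name, old_price, new_price) where action is
    "deleted", "added" or "modified". Only one record from each
    input is held in memory at a time.
    """
    old_it, new_it = iter(old), iter(new)
    o = next(old_it, None)   # current old record, None when exhausted
    n = next(new_it, None)   # current new record, None when exhausted

    while o is not None and n is not None:
        if o[0] == n[0]:
            # Same product in both files: report it only if the price changed.
            if o[1] != n[1]:
                yield ("modified", o[0], o[1], n[1])
            o = next(old_it, None)
            n = next(new_it, None)
        elif o[0] < n[0]:
            # Old name sorts first, so it is missing from the new file.
            yield ("deleted", o[0], o[1], None)
            o = next(old_it, None)
        else:
            # New name sorts first, so it is missing from the old file.
            yield ("added", n[0], None, n[1])
            n = next(new_it, None)

    # At most one of these loops runs: whatever is left over in one
    # file has no counterpart in the other.
    while o is not None:
        yield ("deleted", o[0], o[1], None)
        o = next(old_it, None)
    while n is not None:
        yield ("added", n[0], None, n[1])
        n = next(new_it, None)
```

With two open file handles you would call something like diff_sorted(read_records(old_fh), read_records(new_fh)) and apply each yielded change to the database as it arrives.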
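Now, if the entries aren't sorted, you may be able to sort them first using the Unix sort program; it shouldn't have any difficulty sorting a few million lines. Just make sure both files are sorted in the same order your comparison uses (for example, run sort with LC_ALL=C to get plain byte order).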
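
As a rough illustration of that pre-sorting step, here is a small Python wrapper around the external sort program. It is a sketch under assumptions: the function name sort_by_name is made up, the files are assumed to be tab-separated with the product name in the first column, and a POSIX sort is assumed to be on the PATH.

```python
import os
import subprocess


def sort_by_name(src, dst):
    """Sort the tab-separated file src on its first column into dst.

    Assumption: a POSIX sort(1) is available. LC_ALL=C forces plain
    byte order, which agrees with Python's string comparison for
    ASCII or UTF-8 names.
    """
    env = dict(os.environ, LC_ALL="C")
    subprocess.run(
        ["sort", "-t", "\t", "-k", "1,1", "-o", dst, src],
        env=env,
        check=True,  # raise CalledProcessError if sort fails
    )
```

Sorting both the old and the new file this way gives the comparison loop inputs in one consistent order.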

Abigail


In reply to Re: How to process two files of over a million lines for changes by Abigail-II
in thread How to process two files of over a million lines for changes by richard5mith
