If the lines in both files are sorted on the same key, it's easy: you never need more than two lines in memory, one from each file. Assume each file consists of two columns, product name and price, and is sorted on the product name. Pseudo-algorithm:
1. Read product name (pn.o) and price (p.o) from the old file. Read product name (pn.n) and price (p.n) from the new file.
2. If pn.o eq pn.n, goto 5.
3. If pn.o lt pn.n, then pn.o was deleted. If the old file is exhausted, goto 8, else read the next line of the old file into pn.o and p.o and goto 2.
4. (pn.o gt pn.n) This means pn.n is a new product. If the new file is exhausted, goto 9, else read the next line of the new file into pn.n and p.n and goto 2.
5. If p.o != p.n, the price was modified. Else there was no change in the product.
6. If the old file is exhausted, all unread entries in the new file are new products (pn.n itself was just matched, so it is not one of them). Read them, adjust your database, and end the program. Else read the next line of the old file into pn.o and p.o.
7. If the new file is exhausted, goto 9, else read the next line of the new file into pn.n and p.n and goto 2.
8. pn.n is a new product, and so are all other unread entries in the new file. Read them, adjust your database, and end the program.
9. pn.o is a deleted product, and all other unread entries in the old file were deleted as well. Read them, adjust your database, and end the program.
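
The steps above can be sketched in Python as a loop instead of gotos. This is only a minimal sketch, assuming each line is "name<TAB>price" with no blank or malformed lines, and that both files are sorted by name in plain string order. The names read_record and diff_sorted are made up for this illustration.

```python
import io

def read_record(handle):
    """Return (name, price) for the next line, or None when the file is exhausted."""
    line = handle.readline()
    if not line:
        return None
    # Assumption: columns are tab-separated and every line has both columns.
    name, price = line.rstrip("\n").split("\t", 1)
    return name, price

def diff_sorted(old, new):
    """Yield (change, name, old_price, new_price) for two name-sorted files."""
    o = read_record(old)
    n = read_record(new)
    # Only one record per file is held in memory at any time.
    while o is not None and n is not None:
        if o[0] == n[0]:
            if o[1] != n[1]:
                yield ("modified", o[0], o[1], n[1])
            o = read_record(old)
            n = read_record(new)
        elif o[0] < n[0]:
            # The old name sorts first, so it is missing from the new file.
            yield ("deleted", o[0], o[1], None)
            o = read_record(old)
        else:
            # The new name sorts first, so it is missing from the old file.
            yield ("added", n[0], None, n[1])
            n = read_record(new)
    # One file is exhausted: whatever remains in the other is all deleted or all new.
    while o is not None:
        yield ("deleted", o[0], o[1], None)
        o = read_record(old)
    while n is not None:
        yield ("added", n[0], None, n[1])
        n = read_record(new)

# Small in-memory demonstration with made-up data.
old_file = io.StringIO("apple\t1.00\nbanana\t0.50\ncherry\t3.00\n")
new_file = io.StringIO("apple\t1.20\ncherry\t3.00\ndate\t4.00\n")
for change in diff_sorted(old_file, new_file):
    print(change)
```

With real files you would pass two handles from open() instead of the StringIO objects, and replace the print with whatever database update you need.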
Now, if the entries aren't sorted, you may be able to sort them using the
sort program - it shouldn't have any difficulties sorting a few million lines.
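
One thing to watch when pre-sorting: the order sort produces has to agree with the comparison the merge uses. As a hedged sketch, assuming tab-separated columns and a Unix-like system with sort on the PATH, the helper below (sort_by_name is a made-up name) calls sort with LC_ALL=C so the result is in byte order, which is the same as Python's string comparison for UTF-8 text.

```python
import os
import subprocess

def sort_by_name(src_path, dst_path):
    """Sort src_path on its first tab-separated column into dst_path."""
    # LC_ALL=C forces plain byte ordering instead of locale collation.
    env = dict(os.environ, LC_ALL="C")
    subprocess.run(
        # -t sets the field separator, -k 1,1 sorts on the first field only,
        # -o names the output file.
        ["sort", "-t", "\t", "-k", "1,1", "-o", dst_path, src_path],
        check=True,  # raise CalledProcessError if sort fails
        env=env,
    )
```

If the merge is written in another language, pick the locale or sort options that match that language's string comparison instead.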
Abigail