Re: Comparing Two Files

I won't get into using a proper parser, aside from saying that anything less than a well-tested parsing module could become difficult to maintain if the data set starts throwing formats that you're not anticipating.

Getting to your question: the first part is basically asking whether file1's host IDs are a subset of file2's. Tackle that question first. Pull your IDs from file2 into a hash, where the ID is the hash key. The value for each hash key should be a reference to an anonymous hash containing only the aliases that start with 'www.' For example:

$file2{billsite} = {
    'www.billsouthersite.com' => '',
    'www.billsite.com'        => '',
};
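Here is a rough sketch of how such a hash might be populated. I don't know your real file format, so @file2_records is a made-up stand-in for whatever your parser hands back (ID/alias pairs); the IDs and the non-www alias are invented for illustration too.

```perl
use strict;
use warnings;

# Hypothetical, already-parsed records from file2: [ id, alias ] pairs.
my @file2_records = (
    [ billsite => 'www.billsouthersite.com' ],
    [ billsite => 'www.billsite.com' ],
    [ billsite => 'mail.billsite.com' ],    # not a www. alias, so it is skipped
    [ janesite => 'www.janesite.com' ],
);

my %file2;
for my $record (@file2_records) {
    my ( $id, $alias ) = @$record;

    # Register the ID even if it turns out to have no www. aliases.
    $file2{$id} ||= {};

    # Keep only the aliases that start with 'www.'
    next unless $alias =~ /^www\./;

    # The value is never used; only the key matters for lookups.
    $file2{$id}{$alias} = '';
}
```

After the loop, exists $file2{billsite}{'www.billsite.com'} is a single hash lookup, which is the whole point of the structure.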

You don't really need the 2nd level hash to have values; you're only interested in the keys for quick lookups. Once you've pulled file2 into a hash of this nature, the next step is to iterate through file1. When you process an ID, check whether it's in %file2. If not, fail. Next, process each host-alias that falls under the ID you're processing in file1. As you do so, keep a count so that you can be sure the quantity matches the number of keys in the 2nd level of the HoH %file2. For each host-alias in file1, check your HoH %file2 to see if that key exists under your current ID. If every key exists, and your file1 host-alias count for that ID matches the key count in %file2, that record passes. If at any point there is a mismatch (a different number of host-aliases, or a host-alias from file1 not found in file2), you can last or die out of your loop and fail without continuing to test.
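The check described above could look something like the following. Treat it as a sketch, not a drop-in solution: %file1 and its contents are invented, I'm assuming file1's aliases have already been filtered down to the www. ones and contain no duplicates, and record_passes is just a name I picked. It returns early from a sub where you might instead last or die out of a loop.

```perl
use strict;
use warnings;

# Hypothetical data standing in for the two parsed files.
my %file2 = (
    billsite => {
        'www.billsouthersite.com' => '',
        'www.billsite.com'        => '',
    },
    janesite => { 'www.janesite.com' => '' },
);

my %file1 = (
    billsite => [ 'www.billsite.com', 'www.billsouthersite.com' ],
    janesite => [ 'www.janesite.com', 'www.janeothersite.com' ],
    nosite   => ['www.nosite.com'],
);

# True only if the ID is in file2 and the alias sets line up exactly.
sub record_passes {
    my ( $id, $aliases, $file2 ) = @_;

    # The ID itself has to exist in file2.
    return 0 unless exists $file2->{$id};

    my $count = 0;
    for my $alias (@$aliases) {
        # Bail out on the first alias that file2 doesn't know about.
        return 0 unless exists $file2->{$id}{$alias};
        $count++;
    }

    # Every alias was found; now make sure none are missing.
    my $expected = scalar keys %{ $file2->{$id} };
    return $count == $expected ? 1 : 0;
}

for my $id ( sort keys %file1 ) {
    my $ok = record_passes( $id, $file1{$id}, \%file2 );
    printf "%-10s %s\n", $id, $ok ? 'pass' : 'fail';
}
```

With this sample data, billsite passes, janesite fails because of the extra alias, and nosite fails because the ID isn't in %file2 at all.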

Once you visualize the data structure, the rest should come easily.

Dave
