I won't get into using a proper parser, except to say that anything short of a well-tested parsing module could become difficult to maintain if the data set starts throwing formats at you that you aren't anticipating.

Getting to your question: the first part basically asks whether file1's host IDs are a subset of file2's. Tackle that question first. Pull the IDs from file2 into a hash, where each ID is a hash key. The value for each key should be a reference to an anonymous hash containing only the aliases that start with 'www.'. For example:

$file2{billsite} = { 'www.billsouthersite.com' => '', 'www.billsite.com' => '', };
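Here is a minimal sketch of loading file2 into that hash of hashes. The thread doesn't show the actual file layout, so this assumes a hypothetical format of one record per line: an ID followed by whitespace-separated aliases. The sample lines and the name @file2_lines are illustrative only; adjust the split to whatever your real data looks like.

```perl
use strict;
use warnings;

# Hypothetical file2 contents (the real format isn't shown): one record per
# line, an ID followed by whitespace-separated aliases.
my @file2_lines = (
    "billsite www.billsouthersite.com billsouthersite.com www.billsite.com",
    "janesite www.janesite.org janesite.org",
);

my %file2;
for my $line (@file2_lines) {
    my ( $id, @aliases ) = split ' ', $line;
    next unless defined $id;    # skip blank lines

    # Create the inner hash even if the ID has no 'www.' aliases, so the
    # ID itself is still recorded.
    $file2{$id} ||= {};

    # Keep only aliases beginning with 'www.' as keys; the values are unused.
    $file2{$id}{$_} = '' for grep { /^www\./ } @aliases;
}

for my $id ( sort keys %file2 ) {
    print "$id: ", join( ', ', sort keys %{ $file2{$id} } ), "\n";
}
```

To read the real file instead, you could fill @file2_lines from a filehandle opened with open my $fh, '<', $path or die "...: $!";.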

You don't really need the second-level hash to have values; you're only interested in the keys for quick lookups. Once you've pulled file2 into a hash of this nature, the next step is to iterate through file1. When you process an ID, check whether it exists in %file2. If not, fail. Next, process each host-alias that falls under the ID you're processing in file1. As you do so, keep a count so that you can be sure the quantity matches the number of keys in the second level of the HoH (hash of hashes) %file2. For each host-alias in file1, check %file2 to see whether that key exists under your current ID. If every key exists, and your key count matches your file1 host-alias count for that ID, that record passes. If at any point there is a mismatch (a host-alias count that doesn't match, or a host-alias from file1 that isn't found in file2), you can last or die out of your loop and fail without continuing to test.
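The check described above might look roughly like this. It's a sketch under assumptions: both files are taken as already parsed into hashes (the sample data is hypothetical), file1 is assumed to list only the 'www.' aliases, and the helper name file1_matches is made up for illustration. It uses return inside a sub rather than last or die to stop at the first mismatch.

```perl
use strict;
use warnings;

# Hypothetical %file2, shaped as described: ID => { 'www.alias' => '' }.
my %file2 = (
    billsite => { 'www.billsouthersite.com' => '', 'www.billsite.com' => '' },
    janesite => { 'www.janesite.org' => '' },
);

# Hypothetical file1 records, already parsed into ID => [ host-aliases ].
my %file1 = (
    billsite => [ 'www.billsite.com', 'www.billsouthersite.com' ],
    janesite => [ 'www.janesite.org' ],
);

# Returns 1 if every file1 ID exists in %file2 and its host-aliases match
# exactly; returns 0 at the first mismatch without testing further records.
sub file1_matches {
    my ( $f1, $f2 ) = @_;
    for my $id ( keys %$f1 ) {
        return 0 unless exists $f2->{$id};    # ID not present in file2

        my $count = 0;
        for my $alias ( @{ $f1->{$id} } ) {
            return 0 unless exists $f2->{$id}{$alias};    # alias not in file2
            $count++;
        }

        # Counts must agree, so file2 has no extra 'www.' aliases for this ID.
        return 0 unless $count == keys %{ $f2->{$id} };
    }
    return 1;
}

print file1_matches( \%file1, \%file2 ) ? "pass\n" : "fail\n";
```

If file1 can list the same alias twice, the simple counter will overcount; counting unique aliases (for example via a %seen hash) would avoid that.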

Once you visualize the data structure, the rest should come easily.


Dave


In reply to Re: Comparing Two Files by davido
in thread Comparing Two Files by walkingthecow
