Hi, Perl Masters,

Okay, I am resubmitting a question I posted yesterday. (I rushed to get yesterday's posting out and it looked awful--sorry about that!) Here is what I need to do:-

1) Two files exist, and each one includes lines formatted like this:

word1 word2 word3 word4 num1 num2 num3 word5 word6 ...

Note: the first 4 words and 3 nums will always exist, but the last string of words (word5 word6, etc.) changes from one line to the next--sometimes it's a list that goes up to word30, sometimes it only goes to (minimum) word6.
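A minimal sketch of splitting one such line into its three parts, assuming fields are whitespace-separated as described above. The helper name `parse_line` is hypothetical, not something from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: split one line into the 4-word key, the 3 numbers,
# and the variable-length tail of words.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    my @f = split ' ', $line;            # split on runs of whitespace
    return if @f < 7;                    # malformed line: fewer than 4 words + 3 nums
    my $key  = join ' ', @f[0 .. 3];     # word1 .. word4
    my @nums = @f[4 .. 6];               # num1 .. num3
    my $tail = join ' ', @f[7 .. $#f];   # word5 onward (empty string if none)
    return ($key, \@nums, $tail);
}

my ($key, $nums, $tail) = parse_line("a b c d 1 2 3 x y z\n");
print "$key | @$nums | $tail\n";
```

Joining the tail back into a single string means two tails compare equal only if they hold the same words in the same order.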

2) I need to grab the first 4 entries of each line in file1 and find their match in file2 (e.g. "word1 word2 word3 word4" eq "word1 word2 word3 word4"). If there is a match, print the matching line from each file!

3) Once a match is made, jump to word5 on the same line and check whether the string of words at the end (i.e. everything after num3) is the same in both files. (I don't care how many words come after num3--if there are 20 words in both files, the 20 words must match and be in the same order.) If the two strings are unequal, print the line!
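Steps 2 and 3 could be sketched roughly as follows. This assumes each 4-word key appears at most once per file and uses small in-memory arrays (`@file1`, `@file2`, hypothetical sample data) in place of the real files; only file1 is indexed, and each file2 line costs one hash lookup.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the two input files.
my @file1 = ("a b c d 1 2 3 x y", "e f g h 4 5 6 p q");
my @file2 = ("a b c d 7 8 9 x y", "e f g h 4 5 6 p r");

# Index file1 by its first four words (assumes keys are unique per file).
my %first;
for my $line (@file1) {
    my @f = split ' ', $line;
    $first{ join ' ', @f[0 .. 3] } = {
        line => $line,
        tail => join(' ', @f[7 .. $#f]),
    };
}

# Walk file2 once; each lookup is a single hash access, not a scan.
my (@matched, @tail_differs);
for my $line (@file2) {
    my @f   = split ' ', $line;
    my $key = join ' ', @f[0 .. 3];
    my $rec = $first{$key} or next;                  # key not in file1
    push @matched, [ $rec->{line}, $line ];          # step 2: both lines
    push @tail_differs, $key
        if $rec->{tail} ne join(' ', @f[7 .. $#f]);  # step 3: tails differ
}

print "matched pairs: ", scalar(@matched), "\n";
print "tail differs for: $_\n" for @tail_differs;
```

If a key can repeat within a file, the values would need to be lists of records instead of single records.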

I have included the fledgling beginnings of some code I have. If someone could recommend what to put in the commented areas, that would be GREAT!

NOTE: doing linear scans over associative arrays is not an option--the files are way too big. That is why I am trying to get hash tables and multiple keys working.
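For the "multiple keys" part, two common Perl idioms are a nested hash (one level per key) and a composite key. A small sketch of both, with hypothetical names (`%nested`, `%flat`, `$key4`, `$tail`). Note that the list is stored as an array reference, since assigning an array to a hash element directly stores only its element count.

```perl
use strict;
use warnings;

my @fields = qw(a b c d 1 2 3 x y);
my $key4 = join ' ', @fields[0 .. 3];          # first four words
my $tail = join ' ', @fields[7 .. $#fields];   # words after num3

# Option 1: nested hash, one level per key. [@fields] stores a copy
# of the list as an array reference.
my %nested;
$nested{$key4}{$tail} = [@fields];

# Option 2: composite key. $h{$x,$y} is shorthand for $h{join($;, $x, $y)}.
my %flat;
$flat{$key4, $tail} = [@fields];

print "nested: @{ $nested{$key4}{$tail} }\n";
print "flat:   @{ $flat{$key4, $tail} }\n";

# With the nested form, all tails seen for one 4-word key are a lookup away.
print "tails for this key: ", join(', ', sort keys %{ $nested{$key4} }), "\n";
```

The nested form fits this problem a little better, because it allows a lookup on the 4-word key alone before the tail is compared.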

Thank You!

open (IN1s,"$ARGV[0].sum");
open (IN2s,"$ARGV[1].sum");
open (XLOUT,">pt.forxl");
open (ASTONLY,">ast.only");
open (PTONLY,">pt.only");

@in1s = <IN1s>;
@in2s = <IN2s>;

%AstContent = ();
$ln = 0;
while ($in1s[$ln] ne "") {
    chop ($in1s[$ln]);
    @in1sal = split(/\s+/,$in1s[$ln]);
    $astlength = @in1sal;
    $astlast = $astlength - 1;
    for ($i = 0; $i <= 3; $i++) {
        $AstStartEndWithClocks = join (" ",$AstStartEndWithClocks,$in1sal[$i]);
    }
    for ($i = 7; $i <= $astlast; $i++) {
        $AstMasterList = join (" ",$AstMasterList,$in1sal[$i]);
    }
    $AstContent{$AstStartEndWithClocks,$AstMasterList} = @in1sal;
    #
    # I know the above hash table is wrong, but I don't
    # know how to create a table with 2 keys. In short,
    # take the current list (@in1sal) and assign 2 keys
    # to it.
    #
    $AstStartEndWithClocks = ();   # undef in case the
                                   # same pattern of 4
                                   # words comes up again
    $AstMasterList = ();           # undef this because
                                   # its length can change
                                   # from one line to the
                                   # next...
    undef (@in1sal);
    $ln++;
}

%PTContent = ();
$ln = 0;
while ($in2s[$ln] ne "") {
    chop ($in2s[$ln]);
    @in2sal = split(/\s+/,$in2s[$ln]);
    $ptlength = @in2sal;
    $ptlast = $ptlength - 1;
    for ($i = 0; $i <= 3; $i++) {
        $PTStartEndWithClocks = join (" ",$PTStartEndWithClocks,$in2sal[$i]);
    }
    for ($i = 7; $i <= $ptlast; $i++) {
        $PTMasterList = join (" ",$PTMasterList,$in2sal[$i]);
    }
    $PTContent{$PTStartEndWithClocks,$PTMasterList} = @in2sal;
    #
    # Same deal as above: I know this is wrong but I
    # don't know how to assign 2 keys to the current
    # list (@in2sal)...
    #
    $PTStartEndWithClocks = ();
    $PTMasterList = ();
    undef (@in2sal);
    $ln++;
}

# Parse each hash table (AstContent and PTContent)--when
# AstStartEndWithClocks and PTStartEndWithClocks match,
# print the result to file XLOUT.

# Now, if there was an
# AstStartEndWithClocks/PTStartEndWithClocks match, check
# to see if $AstMasterList and $PTMasterList match. If they
# do NOT, print the line to the screen.

# If AstContent's AstStartEndWithClocks cannot be matched
# in PTContent, write the line to the file ASTONLY.

# If PTContent's PTStartEndWithClocks cannot be matched in
# AstContent, write the line to the file PTONLY.
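The last two commented steps (lines found in only one file) might look something like this. It is only a sketch: `%ast` and `%pt` are hypothetical stand-ins for the two tables, keyed on the first four words alone, and the results are printed to the screen rather than written to the ASTONLY and PTONLY filehandles.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the two tables, keyed on the first four
# words only; each value is the original line.
my %ast = (
    'a b c d' => 'a b c d 1 2 3 x y',
    'e f g h' => 'e f g h 4 5 6 p q',
);
my %pt = (
    'a b c d' => 'a b c d 1 2 3 x y',
    'i j k l' => 'i j k l 7 8 9 m n',
);

# One pass over each table's keys; every membership test is a single
# hash lookup via exists, so there is no nested scanning.
my @ast_only = map { $ast{$_} } grep { !exists $pt{$_} }  sort keys %ast;
my @pt_only  = map { $pt{$_} }  grep { !exists $ast{$_} } sort keys %pt;

print "ast.only: $_\n" for @ast_only;
print "pt.only:  $_\n" for @pt_only;
```

In the real script the two print statements would go to the ASTONLY and PTONLY filehandles; the `sort` is only there to make the output order predictable and can be dropped for large tables.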

janitored by ybiC: Balanced <readmore> tags around longish codeblock, to avoid/reduce vertical scrolling


In reply to 2 Hash Tables, 4 Keys...what to do? by Anonymous Monk
