I have two files, each containing thousands to potentially millions of lines of comma-separated records. Each line is a stock market symbol followed by its various "tick" data for the moment in time indicated by the timestamp in column #4. The lines in FILE2 should match the lines in FILE1, but there will be instances where they don't, and I would like to put together a script that determines the following:

1.) Using the "SEQUENCE NUMBER" (the typically 7-digit number found in column #2) as the key, which lines in FILE1 are not found in FILE2.

2.) If the SEQ# is found in FILE2, continue on to compare each remaining element of the shared record (TYPE in FILE1 to TYPE in FILE2, BID to BID, SIZE to SIZE, etc.).
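
As a rough sketch of these two checks, assuming each file has already been read into a hash of hashes keyed by sequence number: the field names (`type2`, `bid_size`, and so on), the sub name `compare_records`, and the tiny sample hashes are made up for illustration and are not taken from the real files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical names for the columns to compare; "type2" stands in for
# the second TYPE column in the legend.
my @COMPARE = qw(symbol type timestamp type2 status bid bid_size ask ask_size);

# Takes two hashrefs of records keyed by sequence number.
# Returns (\@missing, \@diffs).
sub compare_records {
    my ($file1, $file2) = @_;
    my (@missing, @diffs);
    for my $seq (sort { $a <=> $b } keys %$file1) {
        # Check 1: sequence number not present in FILE2 at all.
        unless (exists $file2->{$seq}) {
            push @missing, $seq;
            next;
        }
        # Check 2: compare the shared record field by field (as strings).
        for my $f (@COMPARE) {
            my $v1 = $file1->{$seq}{$f} // '';
            my $v2 = $file2->{$seq}{$f} // '';
            push @diffs, "$seq $f: $v1 vs $v2" if $v1 ne $v2;
        }
    }
    return (\@missing, \@diffs);
}

# Made-up sample data: one record differs in bid_size, one is missing.
my %file1 = (
    2285319 => { symbol => 'ESM3', type => 'Q', bid => '1549.250000', bid_size => 656 },
    2285320 => { symbol => 'ESM3', type => 'Q', bid => '1549.250000', bid_size => 656 },
);
my %file2 = (
    2285319 => { symbol => 'ESM3', type => 'Q', bid => '1549.250000', bid_size => 650 },
);

my ($missing, $diffs) = compare_records(\%file1, \%file2);
print "Missing from FILE2: $_\n" for @$missing;
print "Mismatch: $_\n"           for @$diffs;
```

The comparison uses `ne`, so values are compared as strings; if the two files could format the same price differently (for example `1549.25` against `1549.250000`), a numeric comparison with `!=` on the price and size fields would probably be needed instead.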

From what I gather, I will need to create at least one HASH to perform this action. Using code examples found on the web, I know how to manually create a very basic HASH. What I don't know how to do is:

- Import a file into a HASH, using one element as the key and the remaining elements as individual values assigned to that key. At best, I think I've only been able to import each element of each line as its own key.

- The verbiage and format needed to articulate comparisons between elements. This is what confuses me the most.
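
For the first point, one possible approach is a hash of hashes: the outer key is the sequence number, and each value is a reference to an inner hash holding the named fields of that line. The sketch below is only an illustration under assumptions: the field names and the sub name `load_records` are hypothetical, and it reads two of the FILE1 lines from an in-memory filehandle rather than from the real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical names for the first 10 columns, following the legend;
# "type2" stands in for the second TYPE column.
my @FIELDS = qw(symbol seq type timestamp type2 status bid bid_size ask ask_size);

# Read comma-separated lines from a filehandle and return a hashref
# keyed by sequence number (column #2).
sub load_records {
    my ($fh) = @_;
    my %records;
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;               # skip blank lines
        my @cols = split /,/, $line;
        my %rec;
        @rec{@FIELDS} = @cols[0 .. $#FIELDS];   # hash slice: names => columns
        $records{ $rec{seq} } = \%rec;          # a later duplicate seq overwrites
    }
    return \%records;
}

# Two lines copied from FILE1, read through an in-memory filehandle.
my $sample = <<'END';
ESM3,2285319,Q,13:58:50.744000,Q,WIDE,1549.250000,656,1549.500000,522,0.000000,0.000000,0.000000,105,67,N,CME,CME
ESM3,2285320,Q,13:58:50.749000,Q,WIDE,1549.250000,656,1549.500000,524,0.000000,0.000000,0.000000,105,68,N,CME,CME
END
open my $fh, '<', \$sample or die "Can't open in-memory sample: $!";
my $records = load_records($fh);
close $fh;

for my $seq (sort { $a <=> $b } keys %$records) {
    print "$seq bid=$records->{$seq}{bid} ask_size=$records->{$seq}{ask_size}\n";
}
```

Two caveats about the data as posted. The trade ("T") lines have fewer columns in a different layout, so these field names would not line up for them. FILE2 also contains sequence number 2341319 twice (once as T, once as Q), so keying on the sequence number alone would keep only the last of the two; a composite key such as `"$rec{seq},$rec{type}"` might be one way around that.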

LEGEND (only the first 10 elements of each line concern me): SYMBOL,SEQUENCE#,TYPE (Quote or Trade or Custom),TIMESTAMP,TYPE,STATUS,BID,BID-SIZE,ASK,ASK-SIZE

FILE1:

ESM3,2285319,Q,13:58:50.744000,Q,WIDE,1549.250000,656,1549.500000,522,0.000000,0.000000,0.000000,105,67,N,CME,CME
ESM3,2285247,T,13:58:49.986000,SELL,1549.250000,2,0,1738560,,U
ESM3,2285320,Q,13:58:50.749000,Q,WIDE,1549.250000,656,1549.500000,524,0.000000,0.000000,0.000000,105,68,N,CME,CME
ESM3,2285321,Q,13:58:50.750000,Q,WIDE,1549.250000,655,1549.500000,524,0.000000,0.000000,0.000000,104,68,N,CME,CME
ESM3,2285325,Q,13:58:50.801000,Q,WIDE,1549.250000,655,1549.500000,522,0.000000,0.000000,0.000000,104,67,N,CME,CME
ESM3,2285326,Q,13:58:50.802000,Q,WIDE,1549.250000,656,1549.500000,522,0.000000,0.000000,0.000000,105,67,N,CME,CME
ESM3,2285328,Q,13:58:50.831000,Q,WIDE,1549.250000,667,1549.500000,522,0.000000,0.000000,0.000000,106,67,N,CME,CME
ESM3,2285329,Q,13:58:50.832000,Q,WIDE,1549.250000,1504,1549.500000,522,0.000000,0.000000,0.000000,107,67,N,CME,CME
ESM3,2285330,Q,13:58:50.833000,Q,WIDE,1549.250000,1505,1549.500000,522,0.000000,0.000000,0.000000,108,67,N,CME,CME
ESM3,2285331,Q,13:58:50.833000,Q,WIDE,1549.250000,1506,1549.500000,522,0.000000,0.000000,0.000000,109,67,N,CME,CME
ESM3,2285332,Q,13:58:50.833000,Q,WIDE,1549.250000,1506,1549.500000,520,0.000000,0.000000,0.000000,109,66,N,CME,CME
ESM3,2285333,Q,13:58:50.833000,Q,WIDE,1549.250000,1506,1549.500000,519,0.000000,0.000000,0.000000,109,65,N,CME,CME
ESM3,2285334,Q,13:58:50.833000,Q,WIDE,1549.250000,1507,1549.500000,519,0.000000,0.000000,0.000000,110,65,N,CME,CME

FILE2:

ESM3,2341309,Q,14:13:42.044000,Q,WIDE,1550.000000,555,1550.250000,834,0.000000,0.000000,0.000000,140,76,N,CME,CME
ESM3,2341311,Q,14:13:42.445000,Q,WIDE,1550.000000,554,1550.250000,834,0.000000,0.000000,0.000000,139,76,N,CME,CME
ESM3,2341312,Q,14:13:42.445000,Q,WIDE,1550.000000,554,1550.250000,833,0.000000,0.000000,0.000000,139,75,N,CME,CME
ESM3,2341313,Q,14:13:42.544000,Q,WIDE,1550.000000,550,1550.250000,833,0.000000,0.000000,0.000000,138,75,N,CME,CME
ESM3,2341314,Q,14:13:42.544000,Q,WIDE,1550.000000,551,1550.250000,833,0.000000,0.000000,0.000000,139,75,N,CME,CME
ESM3,2341315,Q,14:13:42.544000,Q,WIDE,1550.000000,551,1550.250000,834,0.000000,0.000000,0.000000,139,76,N,CME,CME
ESM3,2341316,Q,14:13:42.666000,Q,WIDE,1550.000000,552,1550.250000,834,0.000000,0.000000,0.000000,140,76,N,CME,CME
ESM3,2341317,Q,14:13:42.809000,Q,WIDE,1550.000000,552,1550.250000,837,0.000000,0.000000,0.000000,140,77,N,CME,CME
ESM3,2341319,T,14:13:42.851000,SELL,1550.000000,5,0,1786787,,U
ESM3,2341319,Q,14:13:42.851000,Q,WIDE,1550.000000,547,1550.250000,837,0.000000,0.000000,0.000000,140,77,N,CME,CME
ESM3,2341320,Q,14:13:42.864000,Q,WIDE,1550.000000,542,1550.250000,837,0.000000,0.00000

I'm not exactly new to Perl, though I've only used it for very basic data manipulation or searches (where shell scripting would probably have been completely adequate, but I have almost no experience with shell scripting). Comparing multiple values in two different files has been absolutely puzzling to me.

Below is the closest I could get to importing anything into the HASH, using just one file as an example. The problem is, I've no idea what's being used for the key, and I've no idea how to initiate a comparison between this and a second file.

#!/usr/bin/perl
#use warnings;
#use strict;
my $inFile = "CME.ESM3.MKD11.out";
open(FH1, '<', $inFile) or die("Can't open input file \"$inFile\": $!\n");
my %hash;
while ($line=<FH1>) {
    chomp;
    split /,/, $line;
    $hash{symbol} = $_[0];
    $hash{seqNum} = $_[1];
    $hash{type} = $_[2];
    $hash{timestamp} = $_[3];
    $hash{status} = $_[5];
    $hash{bid} = $_[6];
    $hash{bidVol} = $_[7];
    $hash{ask} = $_[8];
    $hash{askVol} = $_[9];
    for $key (keys %hash) {
        print "$key\=$hash{$key}\t";
    }
    print "\n";
}

Any help would be greatly appreciated. Thanks

In reply to Help creating HASH for file comparison by jb60606
