Hi, I don't know whether you can keep both (or more?) files in memory... So I fiddled around with an algorithm that only needs to keep the significant data of the first file in memory, plus the result for the file currently being compared. Basically it works like this:

  1. The first file is read. The expected order of keys is learned plus the mappings from the first column (key) to the last column (value).
  2. For each file to be compared against the first file:
    1. read a CSV line and analyse it
    2. if the line matches the expected key (e.g. DISTINGUERE TRA), save the result and advance to the next line/key
    3. if not, add a zero to the result and advance to the next expected key, repeating until the current line's key matches the expected key
    4. add more zeros to the result in case an EOF occurred before the last expected key was encountered
    5. perform a lot of sanity checks along the way
  3. Print the results.

Output:

Entries: DISTINGUERE TRA;MANCANTE DI;APPLICARE SU;MONTATO IN;IMPIEGATO IN;RAGGRUPPARE IN
File 1: 9,18246152003019; 7,18246152003019; 6,9898164420878; 6,70441422322555; 12,9959266915162; 6,22163211731087
File 2: 9,18246152003019; 0; 6,9898164420878; 6,70441422322555; 12,9959266915162; 0
File 3: 9,18246152003019; 0; 0; 0; 0; 6,22163211731087
File 4: 0; 0; 0; 6,70441422322555; 0; 0
item found that is not in first file: MONTATO INorOUTorWhatever / 6,70441422322555

Maybe this is just too complicated (premature optimisation), but I hope it helps as a starter...
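
Step 2 of the list above, taken in isolation, can be sketched as a small function. This is only an illustration: `align_values` is a hypothetical helper that does not appear in the script, it works on in-memory arrays instead of reading from a filehandle, and the bounds check on `$expect` is an added assumption so that an unknown or out-of-order key stops the loop.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): align the
# key/value pairs of one file against the key order of the first file.
#   $order - array ref of keys in first-file order
#   $pairs - array ref of [key, value] pairs in the order they were read
sub align_values {
    my ($order, $pairs) = @_;
    my @result;
    my $expect = 0;                      # index of the next expected key
    for my $pair (@$pairs) {
        my ($key, $value) = @$pair;
        # pad with a zero for every expected key that was skipped
        while ($expect < @$order && $order->[$expect] ne $key) {
            push @result, 0;
            $expect++;
        }
        die "key '$key' is unknown or out of order\n" if $expect >= @$order;
        push @result, $value;            # in sync with the expected key
        $expect++;
    }
    # input ended early: pad the remaining expected keys with zeros
    push @result, (0) x (@$order - $expect);
    return \@result;
}

my @order = ('DISTINGUERE TRA', 'MANCANTE DI', 'APPLICARE SU');
my $row = align_values(\@order, [
    [ 'DISTINGUERE TRA', '9,18246152003019' ],
    [ 'APPLICARE SU',    '6,9898164420878'  ],
]);
print join('; ', @$row), "\n";   # prints: 9,18246152003019; 0; 6,9898164420878
```

The missing MANCANTE DI entry shows up as a 0 in the second position, just like in the File 2 line of the output above.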

# UPDATE!!!! - After downloading, I noticed that this script didn't
#              work any longer. Problem was, that additional CR's
#              were added to the __DATA__ section.
#              Adding s/\x0d//g; to extract_csv_entries() fixed
#              this problem under Linux.

# TODO:
#   [ ] use files instead of DATA
#   [ ] use a CPAN module to parse CSV files
#   [ ] simplify / it is much easier if all files can be held in memory:
#         - read second file to $file2_data{$key}=$value
#         - then do something along:
#            @result = map { exists $file2_data{$_} ? $file2_data{$_}
#                                                   : "0"
#                          } @ordered_items;

use strict;

my %file1_data;                   # $file1_data{key} = value of last column
my @ordered_items;                # preserves order of keys from first file
my $expected_no_of_entries = 10;  # expected entries in a CSV line

sub extract_csv_entries {
    # in a real world program, one would use a CSV module from CPAN...
    my $line = shift;
    # NOTE: maybe you need to remove the next line under Windows/Mac?
    $line =~ s/\x0d//g;
    my @csv_items = split /;/, $line;
    if (@csv_items != $expected_no_of_entries) {
        die "illegal number of entries: $line ... \n";
    }
    return @csv_items[0,-1];   # first & last entry
}

sub compare_file {
    # here, we simulate to read from files...
    my $expect = 0;
    my @result;
    while (<DATA>) {
        next if /^\s*$/;
        last if /^EOF/;
        chomp;
        my ($key,$value) = extract_csv_entries($_);
        if (exists $file1_data{$key}) {
            # Update: uncomment these lines to ensure that
            #         entries are the same as in 1st file...
            # if ($file1_data{$key} ne $value) { # ensure val's didn't change
            #     die "entries for $key differ: $file1_data{$key} <=> $value";
            # }
        } else {
            die "item found that is not in first file: $key / $value\n";
        }
        # advance expected key until match
        while ($key ne $ordered_items[$expect++]) {
            push(@result,0);
        }
        push(@result, $value);   # finally in sync.
    }
    # EOF before last expected key? Pad with zeros...
    push(@result,0) for ($expect..$#ordered_items);
    if (@result != @ordered_items) {   # paranoia
        die("internal error: " . join(";", @result));
    }
    return \@result;
}

sub print_result {
    my ($file_no, $aref) = @_;
    print "File $file_no: ", join("; ", @{$aref}), "\n";
}

# Step 1 - learn the key/value pairs and key-order
# read the first "file" (emulated here)
while (<DATA>) {
    next if /^\s*$/;   # skip empty lines
    last if /^EOF/;    # emulate eof
    chomp;
    my ($key,$value) = extract_csv_entries($_);
    push @ordered_items, $key;    # learn order from first file
    $file1_data{$key} = $value;   # finally learn key/value
}

# print the list of items for 1st file in original order
print "Entries: ", join (";", @ordered_items), "\n";
print_result(1, [ (map { $file1_data{$_} } @ordered_items) ]);

# now compare some sample files...
for my $file_no (2..5) {
    print_result($file_no, compare_file() );
}

__DATA__
DISTINGUERE TRA;1;14;507;0,000000242475382686773;0,00000339465535761482;0,000122935019022194;0,00000000041732202096217;9,18246152003019;9,18246152003019
MANCANTE DI;1;56;507;0,000000242475382686773;0,0000135786214304593;0,000122935019022194;0,00000000166928808384868;7,18246152003019;7,18246152003019
APPLICARE SU;1;64;507;0,000000242475382686773;0,0000155184244919535;0,000122935019022194;0,00000000190775781011278;6,9898164420878;6,9898164420878
MONTATO IN;1;78;507;0,000000242475382686773;0,0000189130798495683;0,000122935019022194;0,00000000232507983107495;6,70441422322555;6,70441422322555
IMPIEGATO IN;2;180;507;0,000000484950765373545;0,0000436455688836191;0,000122935019022194;0,00000000536556884094218;6,49796334575812;12,9959266915162
RAGGRUPPARE IN;1;109;507;0,000000242475382686773;0,0000264298167128582;0,000122935019022194;0,00000000324915002034832;6,22163211731087;6,22163211731087
EOF of first file
DISTINGUERE TRA;1;14;507;0,000000242475382686773;0,00000339465535761482;0,000122935019022194;0,00000000041732202096217;9,18246152003019;9,18246152003019
APPLICARE SU;1;64;507;0,000000242475382686773;0,0000155184244919535;0,000122935019022194;0,00000000190775781011278;6,9898164420878;6,9898164420878
MONTATO IN;1;78;507;0,000000242475382686773;0,0000189130798495683;0,000122935019022194;0,00000000232507983107495;6,70441422322555;6,70441422322555
IMPIEGATO IN;2;180;507;0,000000484950765373545;0,0000436455688836191;0,000122935019022194;0,00000000536556884094218;6,49796334575812;12,9959266915162
EOF of second file
DISTINGUERE TRA;1;14;507;0,000000242475382686773;0,00000339465535761482;0,000122935019022194;0,00000000041732202096217;9,18246152003019;me differs!
RAGGRUPPARE IN;1;109;507;0,000000242475382686773;0,0000264298167128582;0,000122935019022194;0,00000000324915002034832;6,22163211731087;6,22163211731087
EOF of dummy third file
MONTATO IN;1;78;507;0,000000242475382686773;0,0000189130798495683;0,000122935019022194;0,00000000232507983107495;6,70441422322555;6,70441422322555
EOF of dummy fourth file
MONTATO INorOUTorWhatever;1;78;507;0,000000242475382686773;0,0000189130798495683;0,000122935019022194;0,00000000232507983107495;6,70441422322555;6,70441422322555
EOF of illegal fifth file with illegal entry

Update: patched as requested
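
The simplification mentioned in the script's TODO comment (all files fit in memory) might look roughly like the sketch below. It assumes the second file has already been read into a hash; `%file2_data` and `%known` are illustrative names, and the sample values are taken from the data above.

```perl
use strict;
use warnings;

# Key order as learned from the first file (first three sample keys only).
my @ordered_items = ('DISTINGUERE TRA', 'MANCANTE DI', 'APPLICARE SU');

# Assumption: key => last-column value of a second file, already in memory.
my %file2_data = (
    'DISTINGUERE TRA' => '9,18246152003019',
    'APPLICARE SU'    => '6,9898164420878',
);

# sanity check: every key of the second file must be known from the first
my %known = map { $_ => 1 } @ordered_items;
for my $key (sort keys %file2_data) {
    die "item found that is not in first file: $key\n" unless $known{$key};
}

# one value per expected key, "0" where the second file has no entry
my @result = map { exists $file2_data{$_} ? $file2_data{$_} : "0" } @ordered_items;

print "File 2: ", join("; ", @result), "\n";
# prints: File 2: 9,18246152003019; 0; 6,9898164420878
```

With a hash lookup the order of lines in the second file no longer matters, so the key-order bookkeeping of the streaming version is not needed; the price is holding each compared file's key/value pairs in memory.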


In reply to Re: Importing data to build an array by Perlbotics
in thread Importing data to build an array by remluvr
