remluvr has asked for the wisdom of the Perl Monks concerning the following question:

Hi everyone! I have a question, I don't even know if it's possible doing what I'm thinking, but still, I thought asking wouldn't hurt..
I have a csv file that looks like this:
DISTINGUERE TRA;1;14;507;0,000000242475382686773;0,0000033946553576148 +2;0,000122935019022194;0,00000000041732202096217;9,18246152003019;9,1 +8246152003019 MANCANTE DI;1;56;507;0,000000242475382686773;0,0000135786214304593;0,0 +00122935019022194;0,00000000166928808384868;7,18246152003019;7,182461 +52003019 APPLICARE SU;1;64;507;0,000000242475382686773;0,0000155184244919535;0, +000122935019022194;0,00000000190775781011278;6,9898164420878;6,989816 +4420878 MONTATO IN;1;78;507;0,000000242475382686773;0,0000189130798495683;0,00 +0122935019022194;0,00000000232507983107495;6,70441422322555;6,7044142 +2322555 IMPIEGATO IN;2;180;507;0,000000484950765373545;0,0000436455688836191;0 +,000122935019022194;0,00000000536556884094218;6,49796334575812;12,995 +9266915162 RAGGRUPPARE IN;1;109;507;0,000000242475382686773;0,0000264298167128582 +;0,000122935019022194;0,00000000324915002034832;6,22163211731087;6,22 +163211731087
and say another one that's like this:
DISTINGUERE TRA;1;14;507;0,000000242475382686773;0,0000033946553576148 +2;0,000122935019022194;0,00000000041732202096217;9,18246152003019;9,1 +8246152003019 APPLICARE SU;1;64;507;0,000000242475382686773;0,0000155184244919535;0, +000122935019022194;0,00000000190775781011278;6,9898164420878;6,989816 +4420878 MONTATO IN;1;78;507;0,000000242475382686773;0,0000189130798495683;0,00 +0122935019022194;0,00000000232507983107495;6,70441422322555;6,7044142 +2322555 IMPIEGATO IN;2;180;507;0,000000484950765373545;0,0000436455688836191;0 +,000122935019022194;0,00000000536556884094218;6,49796334575812;12,995 +9266915162
where "MANCANTE DI" and "RAGGRUPPARE IN" are missed. I'd like to know if it's possibli to build two arrays from these data, that are made just of the last numerical value we have for every text value. And where the values that are missed are automatically set to 0. Let's say, something like this:
$1=[9,18246152003019; 7,18246152003019; 6,9898164420878; 6,70441422322 +555; 12,9959266915162; 6,22163211731087]</br> $2=[9,18246152003019; 0; 6,9898164420878; 6,70441422322555; 12,9959266 +915162; 0]
Is it possible? How would you do it?
Thanks,

Replies are listed 'Best First'.
Re: Importing data to build an array
by Perlbotics (Archbishop) on Feb 21, 2009 at 22:34 UTC

    Hi, I don't know if you can keep all two (or more?) files in memory... So I fiddled around with an algorithm that roughly needs to keep the significant data of the first file in memory plus the result for the next file to be compared. Basically it works like this:

    1. The first file is read. The expected order of keys is learned plus the mappings from the first column (key) to the last column (value).
    2. For each file to be compared against the first file:
      1. read a CSV line and analyse it
      2. if the line matches the expected key (e.g. DISTINGUERE TRA), save the result and advance to the next line/key
      3. if not, add a zero to the result unless the current line matches the expected key; advance to the next line/key
      4. add more zeros to the result in case an EOF occurred before the last expected key was encountered
      5. perform a lot of sanity checks along the way
    3. Print the results.
    Output:

    Entries: DISTINGUERE TRA;MANCANTE DI;APPLICARE SU;MONTATO IN;IMPIEGATO IN;RAGGRUPPARE IN
    File 1: 9,18246152003019; 7,18246152003019; 6,9898164420878; 6,70441422322555; 12,9959266915162; 6,22163211731087
    File 2: 9,18246152003019; 0; 6,9898164420878; 6,70441422322555; 12,9959266915162; 0
    File 3: 9,18246152003019; 0; 0; 0; 0; 6,22163211731087
    File 4: 0; 0; 0; 6,70441422322555; 0; 0
    item found that is not in first file: MONTATO INorOUTorWhatever / 6,70441422322555

    Maybe this is just too complicated (premature optimisation), but I hope it helps as a starter...

    Update: patched as requested

      Thanks to both of you. I'm basically new to perl. I'll try what you suggest.. Thanks!
      It works, the only problem is this line:
      if (exists $file1_data{$key}) { if ($file1_data{$key} ne $value) { # ensure val's didn't change
      I don't need to ensure vals don't change. I need to write different arrays which maintains the order of elements, even if they're different. I played around this line for a while, but I'm not able to change it. Any suggestions?
Re: Importing data to build an array
by bichonfrise74 (Vicar) on Feb 21, 2009 at 19:10 UTC
    If I understand your question, I would do the following:

    1. Save the csv files in a hash format.
    2. Compare the hash values (the text fields) to determine whether they are not found in each files.
    3. If number 2 is true, then get the values of the text fields from the hashes.
Re: Importing data to build an array
by johngg (Canon) on Feb 22, 2009 at 17:05 UTC

    It would be interesting to know how the files are ordered. Your post implies that the second file is in the same order as the first, but with gaps, and your response to Perlbotics says that you need to maintain the same order. This code seems to do that with the data you've provided.

    use strict; use warnings; use Data::Dumper; open my $firstFH, q{<}, \ <<END1 or die qq{open: < HEREDOC: $!\n}; DISTINGUERE TRA;1;14;507;0,000000242475382686773;0,0000033946553576148 +2;0,000122935019022194;0,00000000041732202096217;9,18246152003019;9,1 +8246152003019 MANCANTE DI;1;56;507;0,000000242475382686773;0,0000135786214304593;0,0 +00122935019022194;0,00000000166928808384868;7,18246152003019;7,182461 +52003019 APPLICARE SU;1;64;507;0,000000242475382686773;0,0000155184244919535;0, +000122935019022194;0,00000000190775781011278;6,9898164420878;6,989816 +4420878 MONTATO IN;1;78;507;0,000000242475382686773;0,0000189130798495683;0,00 +0122935019022194;0,00000000232507983107495;6,70441422322555;6,7044142 +2322555 IMPIEGATO IN;2;180;507;0,000000484950765373545;0,0000436455688836191;0 +,000122935019022194;0,00000000536556884094218;6,49796334575812;12,995 +9266915162 RAGGRUPPARE IN;1;109;507;0,000000242475382686773;0,0000264298167128582 +;0,000122935019022194;0,00000000324915002034832;6,22163211731087;6,22 +163211731087 END1 my @order = (); my @arr1 = (); while( <$firstFH> ) { chomp; my( $col1, $lastVal ) = ( split m{;} )[ 0, -1 ]; push @order, $col1; push @arr1, $lastVal; } close $firstFH or die qq{close: < HEREDOC: $!\n}; open my $secondFH, q{<}, \ <<END2 or die qq{open: < HEREDOC: $!\n}; DISTINGUERE TRA;1;14;507;0,000000242475382686773;0,0000033946553576148 +2;0,000122935019022194;0,00000000041732202096217;9,18246152003019;9,1 +8246152003019 APPLICARE SU;1;64;507;0,000000242475382686773;0,0000155184244919535;0, +000122935019022194;0,00000000190775781011278;6,9898164420878;6,989816 +4420878 MONTATO IN;1;78;507;0,000000242475382686773;0,0000189130798495683;0,00 +0122935019022194;0,00000000232507983107495;6,70441422322555;6,7044142 +2322555 IMPIEGATO IN;2;180;507;0,000000484950765373545;0,0000436455688836191;0 +,000122935019022194;0,00000000536556884094218;6,49796334575812;12,995 +9266915162 END2 my $idx = 0; my @arr2 = (); while( <$secondFH> ) { chomp; my( $col1, $lastVal ) = ( split m{;} )[ 0, -1 ]; if( $col1 eq $order[ $idx ] ) { push @arr2, $lastVal; } else { push( @arr2, q{0} ), $idx ++ until $col1 eq $order[ $idx ]; push @arr2, $lastVal; } $idx ++; } push @arr2, q{0} until scalar @arr1 == scalar @arr2; close $secondFH or die qq{close: < HEREDOC: $!\n}; print Data::Dumper->Dumpxs( [ \ @arr1, \ @arr2 ], [ qw{ *arr1 *arr2 } ] ); print q{[} . join( q{; }, @arr1 ) . qq{]\n[} . join( q{; }, @arr2 ) . qq{]\n};

    The resultant arrays output using Data::Dumper.

    @arr1 = ( '9,18246152003019', '7,18246152003019', '6,9898164420878', '6,70441422322555', '12,9959266915162', '6,22163211731087' ); @arr2 = ( '9,18246152003019', '0', '6,9898164420878', '6,70441422322555', '12,9959266915162', '0' ); [9,18246152003019; 7,18246152003019; 6,9898164420878; 6,70441422322555 +; 12,9959266915162; 6,22163211731087] [9,18246152003019; 0; 6,9898164420878; 6,70441422322555; 12,9959266915 +162; 0]

    I hope this is helpful.

    Cheers,

    JohnGG

      I'll try using your code. Unfortunately the data aren't in the same order. I did the example that way because it was quickier, sorry! I'll see how it works with my data and I'll let you know. Thanks for your help, I really appreciate it!

        Quicker? Not in the long run as it turns out :-)

        Anyway, if the order is different my original code will not work. Try this instead.

        use strict; use warnings; use Data::Dumper; open my $firstFH, q{<}, \ <<END1 or die qq{open: < HEREDOC: $!\n}; DISTINGUERE TRA;1;14;507;0,000000242475382686773;0,0000033946553576148 +2;0,000122935019022194;0,00000000041732202096217;9,18246152003019;9,1 +8246152003019 MANCANTE DI;1;56;507;0,000000242475382686773;0,0000135786214304593;0,0 +00122935019022194;0,00000000166928808384868;7,18246152003019;7,182461 +52003019 APPLICARE SU;1;64;507;0,000000242475382686773;0,0000155184244919535;0, +000122935019022194;0,00000000190775781011278;6,9898164420878;6,989816 +4420878 MONTATO IN;1;78;507;0,000000242475382686773;0,0000189130798495683;0,00 +0122935019022194;0,00000000232507983107495;6,70441422322555;6,7044142 +2322555 IMPIEGATO IN;2;180;507;0,000000484950765373545;0,0000436455688836191;0 +,000122935019022194;0,00000000536556884094218;6,49796334575812;12,995 +9266915162 RAGGRUPPARE IN;1;109;507;0,000000242475382686773;0,0000264298167128582 +;0,000122935019022194;0,00000000324915002034832;6,22163211731087;6,22 +163211731087 END1 my $recCt = 0; my @arr1 = (); my %arr1Order = (); while( <$firstFH> ) { chomp; my( $col1, $lastVal ) = ( split m{;} )[ 0, -1 ]; $arr1Order{ $col1 } = $recCt ++; push @arr1, $lastVal; } close $firstFH or die qq{close: < HEREDOC: $!\n}; open my $secondFH, q{<}, \ <<END2 or die qq{open: < HEREDOC: $!\n}; MONTATO IN;1;78;507;0,000000242475382686773;0,0000189130798495683;0,00 +0122935019022194;0,00000000232507983107495;6,70441422322555;6,7044142 +2322555 APPLICARE SU;1;64;507;0,000000242475382686773;0,0000155184244919535;0, +000122935019022194;0,00000000190775781011278;6,9898164420878;6,989816 +4420878 IMPIEGATO IN;2;180;507;0,000000484950765373545;0,0000436455688836191;0 +,000122935019022194;0,00000000536556884094218;6,49796334575812;12,995 +9266915162 DISTINGUERE TRA;1;14;507;0,000000242475382686773;0,0000033946553576148 +2;0,000122935019022194;0,00000000041732202096217;9,18246152003019;9,1 +8246152003019 END2 my @arr2 = ( 0 ) x scalar @arr1; while( <$secondFH> ) { chomp; my( $col1, $lastVal ) = ( split m{;} )[ 0, -1 ]; $arr2[ $arr1Order{ $col1 } ] = $lastVal; } close $secondFH or die qq{close: < HEREDOC: $!\n}; print Data::Dumper->Dumpxs( [ \ @arr1, \ @arr2 ], [ qw{ *arr1 *arr2 } ] ); print q{[} . join( q{; }, @arr1 ) . qq{]\n[} . join( q{; }, @arr2 ) . qq{]\n};

        The output is the same.

        Cheers,

        JohnGG

Re: Importing data to build an array
by sanku (Beadle) on Feb 23, 2009 at 09:22 UTC
    hi friend, Try out this one...
    pen (FILE1,"textfile1.txt") or die $!; while(<FILE1>){push(@value1,(split(/;|\s+/,$_))[-1]);} close(FILE1); open (FILE2,"textfile2.txt") or die $!; while(<FILE2>){push(@value2,(split(/;|\s+/,$_))[-1]);} close(FILE2); my %in_list1; foreach my $element(@value2){$in_list1{$element}=();} foreach my $item(@value1){ if (exists $in_list1{$item}){push(@notval,$item);}else{push(@notval,"0 +");} } $ll=join (';',@value1); $m='$1='."[$ll]</br>"; $kk=join (';',@notval); $n='$2='."[$kk]"; print $m; print $n;