Importing data to build an array

remluvr has asked for the wisdom of the Perl Monks concerning the following question:

Hi everyone! I have a question, I don't even know if it's possible doing what I'm thinking, but still, I thought asking wouldn't hurt..
I have a csv file that looks like this:

DISTINGUERE TRA;1;14;507;0,000000242475382686773;0,0000033946553576148
+2;0,000122935019022194;0,00000000041732202096217;9,18246152003019;9,1
+8246152003019
MANCANTE DI;1;56;507;0,000000242475382686773;0,0000135786214304593;0,0
+00122935019022194;0,00000000166928808384868;7,18246152003019;7,182461
+52003019
APPLICARE SU;1;64;507;0,000000242475382686773;0,0000155184244919535;0,
+000122935019022194;0,00000000190775781011278;6,9898164420878;6,989816
+4420878
MONTATO IN;1;78;507;0,000000242475382686773;0,0000189130798495683;0,00
+0122935019022194;0,00000000232507983107495;6,70441422322555;6,7044142
+2322555
IMPIEGATO IN;2;180;507;0,000000484950765373545;0,0000436455688836191;0
+,000122935019022194;0,00000000536556884094218;6,49796334575812;12,995
+9266915162
RAGGRUPPARE IN;1;109;507;0,000000242475382686773;0,0000264298167128582
+;0,000122935019022194;0,00000000324915002034832;6,22163211731087;6,22
+163211731087
[download]

and say another one that's like this:

DISTINGUERE TRA;1;14;507;0,000000242475382686773;0,0000033946553576148
+2;0,000122935019022194;0,00000000041732202096217;9,18246152003019;9,1
+8246152003019
APPLICARE SU;1;64;507;0,000000242475382686773;0,0000155184244919535;0,
+000122935019022194;0,00000000190775781011278;6,9898164420878;6,989816
+4420878
MONTATO IN;1;78;507;0,000000242475382686773;0,0000189130798495683;0,00
+0122935019022194;0,00000000232507983107495;6,70441422322555;6,7044142
+2322555
IMPIEGATO IN;2;180;507;0,000000484950765373545;0,0000436455688836191;0
+,000122935019022194;0,00000000536556884094218;6,49796334575812;12,995
+9266915162
[download]

where "MANCANTE DI" and "RAGGRUPPARE IN" are missed. I'd like to know if it's possibli to build two arrays from these data, that are made just of the last numerical value we have for every text value. And where the values that are missed are automatically set to 0. Let's say, something like this:

$1=[9,18246152003019; 7,18246152003019; 6,9898164420878; 6,70441422322
+555; 12,9959266915162; 6,22163211731087]</br>
$2=[9,18246152003019; 0; 6,9898164420878; 6,70441422322555; 12,9959266
+915162; 0]
[download]

Is it possible? How would you do it?
Thanks,

Comment on Importing data to build an array Select or Download Code

Replies are listed 'Best First'.
Re: Importing data to build an array by Perlbotics (Archbishop) on Feb 21, 2009 at 22:34 UTC
Hi, I don't know if you can keep all two (or more?) files in memory... So I fiddled around with an algorithm that roughly needs to keep the significant data of the first file in memory plus the result for the next file to be compared. Basically it works like this: The first file is read. The expected order of keys is learned plus the mappings from the first column (key) to the last column (value). For each file to be compared against the first file: read a CSV line and analyse it if the line matches the expected key (e.g. DISTINGUERE TRA), save the result and advance to the next line/key if not, add a zero to the result unless the current line matches the expected key; advance to the next line/key add more zeros to the result in case an EOF occurred before the last expected key was encountered perform a lot of sanity checks along the way Print the results. Output: Entries: DISTINGUERE TRA;MANCANTE DI;APPLICARE SU;MONTATO IN;IMPIEGATO IN;RAGGRUPPARE IN File 1: 9,18246152003019; 7,18246152003019; 6,9898164420878; 6,70441422322555; 12,9959266915162; 6,22163211731087 File 2: 9,18246152003019; 0; 6,9898164420878; 6,70441422322555; 12,9959266915162; 0 File 3: 9,18246152003019; 0; 0; 0; 0; 6,22163211731087 File 4: 0; 0; 0; 6,70441422322555; 0; 0 item found that is not in first file: MONTATO INorOUTorWhatever / 6,70441422322555 Maybe this is just too complicated (premature optimisation), but I hope it helps as a starter... Read more... (7 kB) Update: patched as requested	[reply] [d/l]
Re^2: Importing data to build an array by remluvr (Sexton) on Feb 22, 2009 at 10:31 UTC
It works, the only problem is this line: `if (exists $file1_data{$key}) { if ($file1_data{$key} ne $value) { # ensure val's didn't change` [download] I don't need to ensure vals don't change. I need to write different arrays which maintains the order of elements, even if they're different. I played around this line for a while, but I'm not able to change it. Any suggestions?	[reply] [d/l]
Re^2: Importing data to build an array by remluvr (Sexton) on Feb 22, 2009 at 08:47 UTC
Thanks to both of you. I'm basically new to perl. I'll try what you suggest.. Thanks!	[reply]
Re: Importing data to build an array by bichonfrise74 (Vicar) on Feb 21, 2009 at 19:10 UTC
If I understand your question, I would do the following: 1. Save the csv files in a hash format. 2. Compare the hash values (the text fields) to determine whether they are not found in each files. 3. If number 2 is true, then get the values of the text fields from the hashes.	[reply]
Re: Importing data to build an array by johngg (Canon) on Feb 22, 2009 at 17:05 UTC
It would be interesting to know how the files are ordered. Your post implies that the second file is in the same order as the first, but with gaps, and your response to Perlbotics says that you need to maintain the same order. This code seems to do that with the data you've provided. use strict; use warnings; use Data::Dumper; open my $firstFH, q{<}, \ <<END1 or die qq{open: < HEREDOC: $!\n}; DISTINGUERE TRA;1;14;507;0,000000242475382686773;0,0000033946553576148 +2;0,000122935019022194;0,00000000041732202096217;9,18246152003019;9,1 +8246152003019 MANCANTE DI;1;56;507;0,000000242475382686773;0,0000135786214304593;0,0 +00122935019022194;0,00000000166928808384868;7,18246152003019;7,182461 +52003019 APPLICARE SU;1;64;507;0,000000242475382686773;0,0000155184244919535;0, +000122935019022194;0,00000000190775781011278;6,9898164420878;6,989816 +4420878 MONTATO IN;1;78;507;0,000000242475382686773;0,0000189130798495683;0,00 +0122935019022194;0,00000000232507983107495;6,70441422322555;6,7044142 +2322555 IMPIEGATO IN;2;180;507;0,000000484950765373545;0,0000436455688836191;0 +,000122935019022194;0,00000000536556884094218;6,49796334575812;12,995 +9266915162 RAGGRUPPARE IN;1;109;507;0,000000242475382686773;0,0000264298167128582 +;0,000122935019022194;0,00000000324915002034832;6,22163211731087;6,22 +163211731087 END1 my @order = (); my @arr1 = (); while( <$firstFH> ) { chomp; my( $col1, $lastVal ) = ( split m{;} )[ 0, -1 ]; push @order, $col1; push @arr1, $lastVal; } close $firstFH or die qq{close: < HEREDOC: $!\n}; open my $secondFH, q{<}, \ <<END2 or die qq{open: < HEREDOC: $!\n}; DISTINGUERE TRA;1;14;507;0,000000242475382686773;0,0000033946553576148 +2;0,000122935019022194;0,00000000041732202096217;9,18246152003019;9,1 +8246152003019 APPLICARE SU;1;64;507;0,000000242475382686773;0,0000155184244919535;0, +000122935019022194;0,00000000190775781011278;6,9898164420878;6,989816 +4420878 MONTATO IN;1;78;507;0,000000242475382686773;0,0000189130798495683;0,00 +0122935019022194;0,00000000232507983107495;6,70441422322555;6,7044142 +2322555 IMPIEGATO IN;2;180;507;0,000000484950765373545;0,0000436455688836191;0 +,000122935019022194;0,00000000536556884094218;6,49796334575812;12,995 +9266915162 END2 my $idx = 0; my @arr2 = (); while( <$secondFH> ) { chomp; my( $col1, $lastVal ) = ( split m{;} )[ 0, -1 ]; if( $col1 eq $order[ $idx ] ) { push @arr2, $lastVal; } else { push( @arr2, q{0} ), $idx ++ until $col1 eq $order[ $idx ]; push @arr2, $lastVal; } $idx ++; } push @arr2, q{0} until scalar @arr1 == scalar @arr2; close $secondFH or die qq{close: < HEREDOC: $!\n}; print Data::Dumper->Dumpxs( [ \ @arr1, \ @arr2 ], [ qw{ arr1 arr2 } ] ); print q{[} . join( q{; }, @arr1 ) . qq{]\n[} . join( q{; }, @arr2 ) . qq{]\n}; [download] The resultant arrays output using Data::Dumper. `@arr1 = ( '9,18246152003019', '7,18246152003019', '6,9898164420878', '6,70441422322555', '12,9959266915162', '6,22163211731087' ); @arr2 = ( '9,18246152003019', '0', '6,9898164420878', '6,70441422322555', '12,9959266915162', '0' ); [9,18246152003019; 7,18246152003019; 6,9898164420878; 6,70441422322555 +; 12,9959266915162; 6,22163211731087] [9,18246152003019; 0; 6,9898164420878; 6,70441422322555; 12,9959266915 +162; 0]` [download] I hope this is helpful. Cheers, JohnGG	[reply] [d/l] [select]
Re^2: Importing data to build an array by Anonymous Monk on Feb 22, 2009 at 19:45 UTC
I'll try using your code. Unfortunately the data aren't in the same order. I did the example that way because it was quickier, sorry! I'll see how it works with my data and I'll let you know. Thanks for your help, I really appreciate it!	[reply]
Re^3: Importing data to build an array by johngg (Canon) on Feb 22, 2009 at 21:16 UTC
Quicker? Not in the long run as it turns out :-) Anyway, if the order is different my original code will not work. Try this instead. use strict; use warnings; use Data::Dumper; open my $firstFH, q{<}, \ <<END1 or die qq{open: < HEREDOC: $!\n}; DISTINGUERE TRA;1;14;507;0,000000242475382686773;0,0000033946553576148 +2;0,000122935019022194;0,00000000041732202096217;9,18246152003019;9,1 +8246152003019 MANCANTE DI;1;56;507;0,000000242475382686773;0,0000135786214304593;0,0 +00122935019022194;0,00000000166928808384868;7,18246152003019;7,182461 +52003019 APPLICARE SU;1;64;507;0,000000242475382686773;0,0000155184244919535;0, +000122935019022194;0,00000000190775781011278;6,9898164420878;6,989816 +4420878 MONTATO IN;1;78;507;0,000000242475382686773;0,0000189130798495683;0,00 +0122935019022194;0,00000000232507983107495;6,70441422322555;6,7044142 +2322555 IMPIEGATO IN;2;180;507;0,000000484950765373545;0,0000436455688836191;0 +,000122935019022194;0,00000000536556884094218;6,49796334575812;12,995 +9266915162 RAGGRUPPARE IN;1;109;507;0,000000242475382686773;0,0000264298167128582 +;0,000122935019022194;0,00000000324915002034832;6,22163211731087;6,22 +163211731087 END1 my $recCt = 0; my @arr1 = (); my %arr1Order = (); while( <$firstFH> ) { chomp; my( $col1, $lastVal ) = ( split m{;} )[ 0, -1 ]; $arr1Order{ $col1 } = $recCt ++; push @arr1, $lastVal; } close $firstFH or die qq{close: < HEREDOC: $!\n}; open my $secondFH, q{<}, \ <<END2 or die qq{open: < HEREDOC: $!\n}; MONTATO IN;1;78;507;0,000000242475382686773;0,0000189130798495683;0,00 +0122935019022194;0,00000000232507983107495;6,70441422322555;6,7044142 +2322555 APPLICARE SU;1;64;507;0,000000242475382686773;0,0000155184244919535;0, +000122935019022194;0,00000000190775781011278;6,9898164420878;6,989816 +4420878 IMPIEGATO IN;2;180;507;0,000000484950765373545;0,0000436455688836191;0 +,000122935019022194;0,00000000536556884094218;6,49796334575812;12,995 +9266915162 DISTINGUERE TRA;1;14;507;0,000000242475382686773;0,0000033946553576148 +2;0,000122935019022194;0,00000000041732202096217;9,18246152003019;9,1 +8246152003019 END2 my @arr2 = ( 0 ) x scalar @arr1; while( <$secondFH> ) { chomp; my( $col1, $lastVal ) = ( split m{;} )[ 0, -1 ]; $arr2[ $arr1Order{ $col1 } ] = $lastVal; } close $secondFH or die qq{close: < HEREDOC: $!\n}; print Data::Dumper->Dumpxs( [ \ @arr1, \ @arr2 ], [ qw{ arr1 arr2 } ] ); print q{[} . join( q{; }, @arr1 ) . qq{]\n[} . join( q{; }, @arr2 ) . qq{]\n}; [download] The output is the same. Read more... (970 Bytes) Cheers, JohnGG	[reply] [d/l] [select]
Re: Importing data to build an array by sanku (Beadle) on Feb 23, 2009 at 09:22 UTC
hi friend, Try out this one... `pen (FILE1,"textfile1.txt") or die $!; while(<FILE1>){push(@value1,(split(/;\|\s+/,$_))[-1]);} close(FILE1); open (FILE2,"textfile2.txt") or die $!; while(<FILE2>){push(@value2,(split(/;\|\s+/,$_))[-1]);} close(FILE2); my %in_list1; foreach my $element(@value2){$in_list1{$element}=();} foreach my $item(@value1){ if (exists $in_list1{$item}){push(@notval,$item);}else{push(@notval,"0 +");} } $ll=join (';',@value1); $m='$1='."[$ll]</br>"; $kk=join (';',@notval); $n='$2='."[$kk]"; print $m; print $n;` [download]	[reply] [d/l]