stevemayes has asked for the wisdom of the Perl Monks concerning the following question:
Hi,
I have two files containing user data, and I'm trying to do a line-by-line partial merge. That is to say, I take line one of file one, which concerns user A N Other, and search through file two until I find the line relating to him. Then I want to take a single entry from that line (it happens to be the first entry, which makes the split relatively simple) and add that entry to the data from the first file.
In my own simplistic fashion I used a nested loop, splitting the lines of both files into appropriately named variables and then printing them out to a third file, which should contain all the rows of file one, each with one extra item.
The third file contains only the first five lines and not the other 550 lines. Despite using -w warnings, strict, and diagnostics, there are no errors, which leads me to conclude that the script is doing what I'm telling it to; I'm just telling it to do something other than what I intended.
```perl
open(OUT1, ">> resultsfile.csv") || die("Failed to open resultsfile.csv: $!\n");
open(IR, "< file1.csv") || die("failed to open file1");
open(BS, "< file2.csv") || die("failed to open file2");

while (defined ($line = <IR>)) {
    chomp $line;
    ($username, $firstname, $lastname, $lastlogin, $tokenexpiration) = split(",", $line);
    while (defined ($line2 = <BS>)) {
        if ($line2 =~ /$lastname/ && $line2 =~ /$firstname/) {
            ($bs, undef) = split(",", $line2, 2);
            print OUT1 "$bs $username,$firstname,$lastname,$lastlogin,$tokenexpiration\n";
            last;    # go to next name in outer loop.
        }
    }
}
```
```
__Data__
From file 1
Alan,Bloggs,06/11/2009,11/30/2011
David,Smith,06/08/2009,09/30/2012
Rosario,Anotherone,06/05/2009,11/30/2011
Angela,Madeupname,06/11/2009,07/31/2010
Ugochukwu,Smith,06/01/2009,10/31/2012
Amarjit,Patel,08/19/2008,11/30/2011
Julie,Schmidt,05/01/2009,09/30/2012
Waseem,Alder,06/11/2009,11/30/2011
... for another 580 lines.

File 2
abc,Alan,Bloggs,06/11/2009,11/30/2011,morefields,...
cde,David,Smith,06/08/2009,09/30/2012,morefields,...
abc,Rosario,Anotherone,06/05/2009,11/30/2011,morefields,...
acd,Angela,Madeupname,06/11/2009,07/31/2010,morefields,...
cde,Ugochukwu,Smith,06/01/2009,10/31/2012,morefields,...
tla,Julie,Schmidt,05/01/2009,09/30/2012,morefields,...
tl,Waseem,Alder,06/11/2009,11/30/2011,morefields,...
__End of Data__
```
Note that the script is taking the correct data from each file, but it is only iterating through the first five lines where the desired result is to iterate through all the lines in file1 (correction: it only appears to iterate through the first five lines).
If I remove the line 'last; # go to next name in outer loop', the output file has only one name in it (the first line).
If I start the outer loop like this: 'LABEL: while (defined ($line=<IR>)) {' and replace the 'last;' line with 'next LABEL;', then the final file contains the first five entries (as with the unlabelled code).

Help in understanding what is going on, and a fix, is obviously what I'm looking for, but comments on the approach are welcome too. Could I have used an array or a hash (yes, but should I have, and how)? Are there more efficient or clearer ways of achieving the desired result?
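That 'last' and 'next LABEL' give the same result is expected: in the code above nothing follows the inner loop inside the outer loop body, so leaving the inner loop with 'last' and jumping to the next outer iteration with 'next LABEL' end up in the same place. The sketch below is a minimal illustration with hypothetical arrays standing in for the two files. Note the one real difference from the file version: with arrays, the inner 'for' starts again from the first element on every outer pass, whereas a filehandle read with '<BS>' does not restart on its own.

```perl
use strict;
use warnings;

# Hypothetical data standing in for the two files.
my @names = qw(b d);          # outer list (like file 1)
my @lines = qw(a b c d e);    # inner list (like file 2)

# Version 1: 'last' exits only the inner loop; the outer loop then
# carries on with its next name because nothing follows the inner loop.
my @found_last;
for my $name (@names) {
    for my $line (@lines) {
        if ($line eq $name) {
            push @found_last, $line;
            last;
        }
    }
}

# Version 2: 'next OUTER' jumps straight to the outer loop's next name.
my @found_next;
OUTER: for my $name (@names) {
    for my $line (@lines) {
        if ($line eq $name) {
            push @found_next, $line;
            next OUTER;
        }
    }
}

# Both print "b d": each inner 'for' restarts at the first element of
# @lines on every outer pass, unlike a filehandle read with <BS>.
print "last:       @found_last\n";
print "next OUTER: @found_next\n";
```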
Oh, and while I'm asking: the regex match 'if ($line2 =~ /$lastname/ && $line2 =~ /$firstname/)' is clunky. How can it be rewritten? I tried 'if ($line2 =~ /$lastname/ && /$firstname/)' but got uninitialised variable errors, and 'if ($line2 =~ /$lastname&&$firstname/)' just produces an empty file with no errors reported.
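On the regex question: in 'if ($line2 =~ /$lastname/ && /$firstname/)' the second match has no '=~', so Perl applies it to '$_', which is never set in that loop; that is where the uninitialised-value complaint comes from. The pattern '/$lastname&&$firstname/' looks for the last name, two literal '&' characters and the first name in a row, which never occurs, so nothing is printed. A minimal sketch of two alternatives, run on one hypothetical sample line in the file 2 layout: a single regex with two lookaheads, and (probably more robust) splitting the line and comparing the name fields exactly with 'eq'.

```perl
use strict;
use warnings;

# Hypothetical sample line in the file 2 layout (code, first, last, ...).
my $line2     = "cde,David,Smith,06/08/2009,09/30/2012,morefields";
my $firstname = "David";
my $lastname  = "Smith";

# 1. One regex with two lookaheads, so both names must appear somewhere
#    in the line; \Q...\E escapes any regex metacharacters in a name.
my $by_lookahead =
    $line2 =~ /^(?=.*\Q$firstname\E)(?=.*\Q$lastname\E)/ ? 1 : 0;

# 2. Exact field comparison: split the line and compare the name columns,
#    so "Smith" does not also match a line for "Smithson".
my ($code, $f, $l) = split /,/, $line2;
my $by_fields = ($f eq $firstname && $l eq $lastname) ? 1 : 0;

print "lookahead: $by_lookahead, fields: $by_fields, code: $code\n";
```

The field comparison also hands you the first column ('$code') for free, which is the value the merge needs anyway.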
Update: As noted by SuicideJunkie, Transient and Si_lence, the problem was that the file pointer was left at the end of the second file, rather than, as I had imagined, going back to the start and re-searching that file on each succeeding iteration. Placing the line 'seek(BS,0,0);' immediately before the inner loop fixed the issue. (Of course, that allows me to move on to the next twelve gazillion issues, but that was some very useful information. Thank you; I'll vote your responses up with my next available votes.)
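A self-contained sketch of the seek fix, using in-memory filehandles and hypothetical data in place of file1.csv and file2.csv. The sample data in the question also suggests why the output stopped after exactly five names: Amarjit Patel, the sixth name in file 1, has no line in the file 2 sample, so the search for him reads file 2 to the end, and without a seek every later search starts at end-of-file and finds nothing. Bear in mind that rescanning file 2 for every line of file 1 can mean a full pass over file 2 per name, which is why the hash approach in Update 2 scales better.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for file1.csv and file2.csv, opened as in-memory
# filehandles so the sketch runs without real files. "Amarjit Patel" has
# no matching line in the second file.
my $file1 = "Alan,Bloggs\nAmarjit,Patel\nJulie,Schmidt\n";
my $file2 = "abc,Alan,Bloggs,more\ntla,Julie,Schmidt,more\n";

open(my $ir, '<', \$file1) or die "Can't open file1: $!";
open(my $bs, '<', \$file2) or die "Can't open file2: $!";

my @merged;
while (my $line = <$ir>) {
    chomp $line;
    my ($firstname, $lastname) = split /,/, $line;

    # Rewind file 2 before every search. Without this, each search resumes
    # where the previous one stopped, and a name missing from file 2
    # leaves the handle at end-of-file for all later names.
    seek($bs, 0, 0) or die "Can't rewind file2: $!";

    while (my $line2 = <$bs>) {
        my ($code, $f, $l) = split /,/, $line2;
        if ($f eq $firstname && $l eq $lastname) {
            push @merged, "$code,$line";
            last;    # found it; go to the next name in file 1
        }
    }
}
close $ir;
close $bs;

print "$_\n" for @merged;
```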
Update 2: So here is my final, working solution (for now), which pulls out the data I want and writes it to a file after comparing each line in file one with file two.
Well, I swallowed my noobish fears, followed SuicideJunkie et al.'s advice and simply slurped the files into memory (similar to Bichonfrise74's code, I think). File one made sense as a hash of {name => value} pairs. File two became an array of arrays, each entry including the name. So by iterating through file two and pulling out the name each time, I was able to look up the value in the hash and write everything out to a results file.
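A minimal sketch of the hash idea, using hypothetical in-memory data in place of the real files: file 2 is read once into a hash keyed on "first last", then file 1 is read once and each name is looked up with a single hash access. This version hashes file 2 and walks file 1, so the output keeps file 1's order, matching the original goal of all the rows in file one with one extra item; the version above does it the other way round, and either works as long as both files build the key the same way. It assumes full names are unique; two users with the same name would overwrite each other in the hash.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-ins for the two files.
my $file1 = "Alan,Bloggs,06/11/2009,11/30/2011\n"
          . "Amarjit,Patel,08/19/2008,11/30/2011\n"
          . "Julie,Schmidt,05/01/2009,09/30/2012\n";
my $file2 = "abc,Alan,Bloggs,06/11/2009,11/30/2011,morefields\n"
          . "tla,Julie,Schmidt,05/01/2009,09/30/2012,morefields\n";

# Pass 1: read file 2 once, building a hash of "first last" => code.
my %code_for;
open(my $bs, '<', \$file2) or die "Can't open file2: $!";
while (my $line = <$bs>) {
    chomp $line;
    my ($code, $first, $last) = split /,/, $line;
    $code_for{"$first $last"} = $code;
}
close $bs;

# Pass 2: read file 1 once and look each name up in the hash.
my @merged;
open(my $ir, '<', \$file1) or die "Can't open file1: $!";
while (my $line = <$ir>) {
    chomp $line;
    my ($first, $last) = split /,/, $line;
    my $code = $code_for{"$first $last"};
    # Names missing from file 2 get an empty first column here;
    # skipping them instead would be an equally reasonable choice.
    push @merged, join(',', defined $code ? $code : '', $line);
}
close $ir;

print "$_\n" for @merged;
```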
I'm already aware of various changes and improvements I want to make in v.2 but for now this code does what is needed. Thank you all.
Yet another update
I replaced all the individual declarations with two simple ones: 'my ($..., %..., etc);' and 'our ...;'.
```perl
#!/usr/bin/perl -w
use Time::Local;
use strict;
use warnings;
use diagnostics;
use Data::Dumper;

# declare stuff to avoid errors
my (%user_bs, @temp, @temp2, @user_expiration_inf, $firstname, $current,
    $username, $lastname, $lastlogin, $tokenexpiration, $month, $day,
    $year, $edate, $line, $line2, $bs, $b1, $y, $z, $name, $name2);
our $/;    # only declares $/; its value ("\n") is unchanged, as line-by-line reading needs

$current = time();

# slurp up the first file into one huge hash ( name => value )
open(BSFILE, "< file1.csv") || die("Can't open file1.csv: $!\n");
while (defined ($line = <BSFILE>)) {
    next if $line =~ /^Data here/;
    chomp $line;
    ($b1, $y, $z) = split(/,/, $line);
    $name = "$z $y";
    $user_bs{ $name } = $b1;
}
close(BSFILE);

# Now to whack file two into an array of arrays.
open(USERLIST, $ARGV[0]) || die("Can't open $ARGV[0]: $!\n");
while (<USERLIST>) {                 # read from the handle opened above
    next if /^Default Login/;        # ignore first line
    chomp;
    ($username, $lastname, $firstname, $lastlogin, $tokenexpiration) =
        (split(",", $_))[0, 1, 2, 5, 8];
    ($month, $day, $year) = split('/', $tokenexpiration);
    $edate = timelocal(0, 0, 0, $day, $month - 1, $year - 1900);
    # remove from list all users whose tokens expired before today
    next if $edate < $current;
    $name2 = "$firstname $lastname";
    @temp2 = ($username, $name2, $lastlogin, $tokenexpiration);
    push @user_expiration_inf, [@temp2];
}
close(USERLIST);

open(OUT, ">>results.csv") || die("Can't open results.csv: $!\n");
print OUT "User, Data Here, Username, Last login, Token expiration date\n";

# iterate through the array of arrays;
# each $rec is [username, name, last login, token expiration]
for my $rec (@user_expiration_inf) {
    print OUT "$rec->[1], $user_bs{$rec->[1]}, $rec->[0], $rec->[2], $rec->[3]\n";
}
close(OUT);
```
I'll look at Text::CSV and Text::CSV_XS soon, as well as DBD::RAM. Continued suggestions for improvement are totally welcome. This code is posted in the hope that it will help other total noobs.