Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello all,
I have a tab-delimited file (orig_accts.txt) that contains 7 columns. I have another file (accts_to_exclude.txt) which I want to compare against orig_accts.txt, and only output to a file what exists in orig_accts.txt but NOT in accts_to_exclude.txt. I am currently doing this using a hash with the following code:
$excludeaccts = "accts_to_exclude.txt";
open(FILEHANDLE, $excludeaccts) || die ("Cannot open $excludeaccts");
%lookup = map {chomp; $_ => undef} <FILEHANDLE>;
close(FILEHANDLE);
$origlist = "orig_accts.txt";
open(ACCTLISTFILE, $origlist) || die ("Cannot open orig_accts.txt");
open($output, '>', 'output.txt') || die ("Cannot open output file output.txt");
while ($line = <ACCTLISTFILE>){
    chomp $line;
    ($filename, $state, $amt, $ttl, $account, $name, $invnum) = split /\t/, $line;
    next if exists $lookup{$account};
    print $output "$filename $state $amt $ttl $account $name $invnum\n";
}

Now, I want to check this output file against more lists:
List1.txt (contains a list of accounts)
List2.txt (contains a diff list of accounts)
List3.txt (contains a list of names)
Check each account (from output.txt) against the accounts in List1.txt. If it exists there, do something (write this info to file1) and go on to checking the next account, and so on.
If the account does not exist in List1.txt, check for it in List2.txt. If it does exist in List2.txt, write this information to file2 and continue checking the list.
If the account does not exist in List2.txt either, check the name against the names in List3.txt, and if it exists there, write it to file2.
Everything that remains should be written to file1
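Extra hashes are not inefficient here: each rule is a single hash lookup, so loading one hash per list and testing them in order keeps the per-record cost flat no matter how long the lists are. A minimal sketch of the decision logic, assuming the lists are already loaded into hashes (the hash names, the `classify` sub and the sample data are hypothetical):

```perl
use strict;
use warnings;

# Hypothetical sample data; the real script would load these hashes from
# List1.txt, List2.txt and List3.txt, one entry per line.
my %list1 = map { $_ => 1 } qw(1001 1002);    # accounts that go to file1
my %list2 = map { $_ => 1 } qw(2001);         # accounts that go to file2
my %list3 = map { $_ => 1 } ('ACME CORP');    # names that go to file2

# Return the output file for one record, checking the rules in order.
# Each check is one hash lookup, so its cost does not grow with list size.
sub classify {
    my ($account, $name) = @_;
    return 'file1' if exists $list1{$account};
    return 'file2' if exists $list2{$account};
    return 'file2' if exists $list3{$name};
    return 'file1';    # everything that remains
}

for my $rec (['1001', 'X'], ['2001', 'Y'], ['3001', 'ACME CORP'], ['4001', 'Z']) {
    printf "%s -> %s\n", $rec->[0], classify(@$rec);
}
# 1001 -> file1
# 2001 -> file2
# 3001 -> file2
# 4001 -> file1
```

In the real loop, the returned name would pick which output filehandle to print the record to.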
I'm not sure how to add this logic to the above code. I think it's possible to use more hashes, but that doesn't seem very efficient. Any help is greatly appreciated.
Thank you.

Re: comparing lists
by GrandFather (Saint) on Feb 02, 2006 at 04:03 UTC

    Compared with file I/O, manipulating hashes is cheap. Something like this perhaps:

    use warnings;
    use strict;

    open inFile, '<', "accts_to_exclude.txt" or die "Cannot open accts_to_exclude.txt: $!";
    my %excludeList = map {chomp; $_ => undef} <inFile>;
    close inFile;
    open inFile, '<', "list1.txt" or die "Cannot open list1.txt: $!";
    my %acctList = map {chomp; $_ => undef} <inFile>;
    close inFile;
    open inFile, '<', "list2.txt" or die "Cannot open list2.txt: $!";
    my %diffList = map {chomp; $_ => undef} <inFile>;
    close inFile;
    open inFile, '<', "list3.txt" or die "Cannot open list3.txt: $!";
    my %nameList = map {chomp; $_ => undef} <inFile>;
    close inFile;

    open orgAcctList, '<', "orig_accts.txt" or die "Cannot open orig_accts.txt: $!";
    open outFile, '>', 'output.txt' or die "Cannot open output file output.txt: $!";
    while (my $line = <orgAcctList>){
        chomp $line;
        my ($filename, $state, $amt, $ttl, $account, $name, $invnum) = split /\t/, $line;
        next if exists $excludeList{$account};
        print outFile "$filename $state $amt $ttl $account $name $invnum\n";
        # The hash values are all undef, so test with exists, not defined
        if (exists $acctList{$account}) {
            # do something (write this info to file1)
            next;
        }
        if (exists $diffList{$account}) {
            # do something (write this info to file2)
            next;
        }
        if (exists $nameList{$name}) {
            # do something (write this info to file2)
            next;
        }
        # everything that remains: write this info to file1
    }
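    Every list in the code above is loaded with the same open/map/close sequence. As a sketch, that repetition can be folded into a helper; `load_set` and `load_set_from_file` are hypothetical names, and the demo reads from an in-memory string so it runs without any files:

```perl
use strict;
use warnings;

# Read one key per line from an already-open filehandle into a hash used
# as a set. Values are 1, so both exists() and a plain truth test work.
sub load_set {
    my ($fh) = @_;
    my %set;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '';    # skip blank lines
        $set{$line} = 1;
    }
    return \%set;
}

# Open a file by name and return its set; dies with the OS error on failure.
sub load_set_from_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $set = load_set($fh);
    close $fh;
    return $set;
}

# Demo with an in-memory file so the sketch runs without anything on disk.
open my $demo, '<', \"1001\n1002\n\n1003\n" or die "in-memory open: $!";
my $accts = load_set($demo);
print join(',', sort keys %$accts), "\n";    # prints 1001,1002,1003
```

    In the script, `my $acctList = load_set_from_file('list1.txt');` followed by `exists $acctList->{$account}` would replace one open/map/close group.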

    DWIM is Perl's answer to Gödel
      Thank you both for your responses. Graff, your tool is indeed very helpful for a lot of what we do. For this problem, though, GrandFather's solution seems more relevant, and since I am under time constraints to complete this script, it made more sense to build on my existing code.
      I've made some slight changes to the code above, but after spending many hours debugging the script, I still can't get it to work properly. I may be missing some logic here, but when I run the script, it seems to output everything into nc_load.txt.

      Here's the code:
      #!/usr/bin/perl -w
      $excludeaccts = "accts_to_exclude.txt";
      open(EXCLUDELIST, $excludeaccts) || die ("Cannot open $excludeaccts");
      #Load accounts-to-exclude into a hash table
      %exclusionlist = map {chomp; $_ => undef} <EXCLUDELIST>;
      close(EXCLUDELIST);
      $ncaccts = "nc_acct_list.txt";
      open(NCLIST, $ncaccts) || die ("Cannot open $ncaccts");
      %ncacctlist = map {chomp; $_ => undef} <NCLIST>;
      close(NCLIST);
      $ccaccts = "cc_acct_list.txt";
      open(CCLIST, $ccaccts) || die ("Cannot open $ccaccts");
      %ccacctlist = map {chomp; $_ => undef} <CCLIST>;
      close(CCLIST);
      $ccnames = "cc_name_list.txt";
      open(CCNALIST, $ccnames) || die ("Cannot open $ccnames");
      %ccnamelist = map {chomp; $_ => undef} <CCNALIST>;
      close(CCNALIST);
      $origaccts = "orig_accts_list.out";
      open(ORIGACCTLIST, $origaccts) || die ("Cannot open $origaccts");
      open($output, '>', 'output.txt' || die "Cannot open output file output.txt");
      open($ncoutput, '>','nc_load.txt' || "Cannot open output file nc_load.txt");
      open($ccoutput, '>', 'cc_load.txt' || "Cannot open output file cc_load.txt");
      while ($line = <ORIGACCTLIST>){
          chomp $line;
          ($filename, $state, $amt, $ttl, $account, $name, $invnum) = split /\t/, $line;
          next if exists $exclusionlist{$account};
          print $output "$filename\t$state\t$amt\t$ttl\t$account\t$name\t$invnum\n";
          if (defined $ncacctlist{$account}) {
              print $ncoutput "$filename\n";
              next;
          }
          if (defined $ccacctlist{$account}) {
              print $ccoutput "$filename\n";
              next;
          }
          if (defined $ccnamelist{$name}) {
              print $ccoutput "$filename\n";
              last;
          }
          print $ncoutput "$filename\n";
          ###(I added the above line such that anything that's not in $nccctlist,
          ### $ccacctlist, $ccnamelist will be printed to nc_load.txt file If I run
          ### it without this line, nothing is being printed to nc_load.txt or
          ### cc_load.txt)###
      }

      Any suggestions as to what I maybe doing wrong here? Thanks.

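        I see you are using if (defined $xxx{$yyy}) where you probably intended if (exists $xxx{$yyy}). Your hashes are built with $_ => undef, so every value is undef and defined is false even for accounts that are in the list; every record that is not excluded therefore falls through to the final print to nc_load.txt.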
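        A short script makes the difference visible, using a hash built the same way as %ncacctlist (the account numbers are made up):

```perl
use strict;
use warnings;

# Built the same way as in the script: every key maps to undef.
my %ncacctlist = map { $_ => undef } qw(1001 1002);

for my $acct (qw(1001 9999)) {
    my $d = defined $ncacctlist{$acct} ? 'yes' : 'no';
    my $e = exists  $ncacctlist{$acct} ? 'yes' : 'no';
    print "$acct: defined=$d exists=$e\n";
}
# 1001: defined=no exists=yes
# 9999: defined=no exists=no
```

        Only exists distinguishes a listed account from an unlisted one here, because defined looks at the (undef) value rather than at whether the key is present.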

        In general it is useful to provide the fail reason for opens using $!:

        open(NCLIST, $ncaccts) || die ("Cannot open $ncaccts: $!");

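        The line open($output, '>', 'output.txt' || die "Cannot open output file output.txt"); has its closing parenthesis in the wrong place: the || die applies to the file name string, which is always true, rather than to the result of open, so a failed open goes unnoticed. It should be open($output, '>', 'output.txt') || die ("Cannot open output file output.txt: $!"); The opens for nc_load.txt and cc_load.txt have the same problem and are also missing the die.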
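        A short sketch showing why the misplaced parenthesis hides failures, and that both corrected forms do report them (the path is a made-up one chosen so that open fails):

```perl
use strict;
use warnings;

# In open($output, '>', 'output.txt' || die ...), || binds tighter than the
# commas, so the die is attached to the file name string. A non-empty string
# is always true, so the die can never run:
my $name = 'output.txt' || die "never reached";
print "$name\n";    # prints output.txt

# The two usual fixes, tried on a hypothetical path whose directory does
# not exist. Both die when open fails.
my $path = '/no-such-dir/output.txt';

my $ok1  = eval { open(my $fh, '>', $path) || die "Cannot open $path: $!\n"; 1 };
my $err1 = $@;
my $ok2  = eval { open my $fh, '>', $path or die "Cannot open $path: $!\n"; 1 };
my $err2 = $@;

print "parenthesised ||: ", ($ok1 ? "opened\n" : "died: $err1");
print "low-precedence or: ", ($ok2 ? "opened\n" : "died: $err2");
```

        The low-precedence or is often preferred because it works with or without parentheses around the open arguments.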

        You should always use the three-argument open. It makes the mode (input or output) explicit, and in contexts where the file name is provided by the user, it avoids malicious effects from a user putting > at the start of a file name.
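        To illustrate the risk, this sketch creates a throwaway file in a temporary directory (the file and the hostile name are made up for the demo) and opens the same user-supplied string both ways:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory holding a hypothetical data file we do not want to lose.
my $dir  = tempdir(CLEANUP => 1);
my $data = "$dir/data.txt";
open my $w, '>', $data or die "Cannot create $data: $!";
print {$w} "important\n";
close $w;

# A "file name" as a malicious user might supply it.
my $user_name = ">$data";

# Three-argument open: the string is taken literally as a path, so this
# simply fails (no file name starts with ">") and data.txt is untouched.
my $three_arg_ok     = open(my $r, '<', $user_name) ? 1 : 0;
my $size_after_three = -s $data;

# Two-argument open: the leading ">" is parsed as a mode, so this opens
# data.txt for writing and truncates it.
open(my $bad, $user_name) or die "two-arg open failed: $!";
close $bad;
my $size_after_two = (-s $data) || 0;

print "three-arg open succeeded: $three_arg_ok\n";    # 0
print "size after three-arg open: $size_after_three\n";
print "size after two-arg open: $size_after_two\n";   # 0
```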


Re: comparing lists
by graff (Chancellor) on Feb 02, 2006 at 06:20 UTC
    I do that sort of thing with various lists and flat-file tables so often that I wrote my own utility to make it easy and flexible; I posted it here -- it's a few years old, but I still use it on almost a daily basis.

    Update: GrandFather's code is of course more relevant for you, and neater, because it does a bunch of files in one swoop. My own tool just compares two lists at a time (like "diff" or "cmp"), but in many cases you can just chain runs together through a pipe -- e.g. print rows in file1 whose initial field does not match any rows in file2 or file3, and does match rows in file4:

    cmpcol -x1 -l1 file1 file2 | cmpcol -x1 -l1 stdin file3 | cmpcol -i -l1 stdin file4
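    For readers without cmpcol, the filter described above can be sketched in plain Perl with hashes. This does not reproduce cmpcol's options; it assumes the "initial field" is the first tab-separated column (matching the tab-delimited files in this thread), and `first_fields`, `filter_rows`, `string_fh` and the sample data are hypothetical:

```perl
use strict;
use warnings;

# Collect the first tab-separated field of every line from a filehandle.
sub first_fields {
    my ($fh) = @_;
    my %seen;
    while (my $line = <$fh>) {
        chomp $line;
        my ($key) = split /\t/, $line;
        $seen{$key} = 1 if defined $key;
    }
    return \%seen;
}

# Keep rows whose first field is in none of @$exclude and in all of @$require.
sub filter_rows {
    my ($rows_fh, $exclude, $require) = @_;
    my @out;
    while (my $line = <$rows_fh>) {
        chomp $line;
        my ($key) = split /\t/, $line;
        next unless defined $key;                      # skip empty lines
        next if grep { exists $_->{$key} } @$exclude;
        next if grep { !exists $_->{$key} } @$require;
        push @out, $line;
    }
    return @out;
}

# In-memory "files" so the sketch runs without anything on disk.
sub string_fh {
    my ($s) = @_;
    open my $fh, '<', \$s or die "in-memory open: $!";
    return $fh;
}

my $f2 = first_fields(string_fh("b\tx\n"));
my $f3 = first_fields(string_fh("c\n"));
my $f4 = first_fields(string_fh("a\nb\nd\n"));
my @kept = filter_rows(string_fh("a\t1\nb\t2\nc\t3\nd\t4\n"), [$f2, $f3], [$f4]);
print "$_\n" for @kept;    # prints the rows for a and d
```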