jspindler has asked for the wisdom of the Perl Monks concerning the following question:

OK, I wrote a Perl script that parses our credit card files. In a nutshell it:
1. Reads a directory listing and pushes all the filenames into the array @dirlisting
2. Goes through foreach (@dirlisting), opens each file, pulls the data I want out of it with substr, and writes it to a new file, removing any dupes if the dupes are in a certain field:
    $lines{$ccn}++;
    if ($lines{$ccn} == 2) {
        push @dupes, $ccn;
    }

3. Then we take our @dupes array and write that to a dupes file, and if they're not dupes:
    if (!$dupez{$ccn}) {
then we write to a newmaster.txt file.
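
The three steps above can be sketched end-to-end like this (a minimal standalone sketch with made-up sample numbers, not the real script):

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Standalone sketch of the dedup flow described above, with made-up numbers.
my (%seen, @dupes, @unique);
my @ccns = ('4111111111111111', '5500000000000004', '4111111111111111');

# pass 1: count every card number; the second sighting marks a dupe
for my $ccn (@ccns) {
    $seen{$ccn}++;
    push @dupes, $ccn if $seen{$ccn} == 2;
}

# pass 2: keep only numbers that were never duplicated
my %dupez = map { $_ => 1 } @dupes;
for my $ccn (@ccns) {
    push @unique, $ccn unless $dupez{$ccn};
}

print "dupes:  @dupes\n";   # 4111111111111111
print "unique: @unique\n";  # 5500000000000004
```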

Now everything seems to work OK, but it was just reported to me that one of the credit card numbers from the original file came out in reversed order! i.e.

Original File:
firstnumber
secondnumber

After PerlScript:
secondnumber
firstnumber


Now the reason this is a problem is because we substr the entire line, so the other fields that correspond with those numbers were not switched! So I have mismatched numbers. Any ideas as to why this might happen?
Thanks for any help.

Edit kudra, 2002-02-23 Added description to title

Replies are listed 'Best First'.
Re: funny results
by YuckFoo (Abbot) on Feb 20, 2002 at 22:45 UTC
    The code you have shared is too disjointed. You push @dupes, then @dupes is never used again. You reference %dupes, but where did it come from? Most important would be seeing the code that generates the output.

    YuckFoo

      OK, here we go. I don't know where the +'s are coming from, but they are not in the code... the red pluses, that is.
      #!/usr/local/bin/perl
      #+---------------------------------------------+
      #| Configuration Options                       |
      #+---------------------------------------------+
      $dirname    = "C:\\Working\\";
      $mastertemp = "C:\\master.txt";
      $dupefile   = "C:\\dupes.txt";
      $newmaster  = "C:\\newmaster.txt";

      main();

      sub main {
          dirlist();
          extract_data();
          output_results();
      }

      sub dirlist {
          # Pull Directory Listing
          opendir(DIR, $dirname) or die "can't opendir $dirname: $!";  # open the directory
          while (defined($file = readdir(DIR))) {                      # read the directory
              next if $file =~ /^\.\.?$/;
              push(@dirlisting, $file);                                # put dirlist in @dirlisting
          }
      }

      sub extract_data {
          # Get list of archived files
          %lines;
          %dupez;
          foreach (@dirlisting) {
              $filename1 = $_;
              open(INPUTFILE, "$dirname" . "$filename1");
              open(MASTERTEMP, ">>$mastertemp");
              open(DUPES, ">>$dupefile");
              while (<INPUTFILE>) {
                  $custID = substr($_,  0,  9);
                  $ccn    = substr($_, 22, 16);
                  $expy   = substr($_, 44,  4);
                  $type   = substr($_, 50,  3);
                  $amount = substr($_, 58,  6);
                  # $amount =~ s/^\s+//;
                  if (($custID lt 0) || ($custID eq "Name ")) {
                  } else {
                      dupeczech();
                      print MASTERTEMP "$custID"." "."$ccn"." "."$expy"." "."$type"." "."$amount\n";
                  }
              }
              close(INPUTFILE);
              close(MASTERTEMP);
              close(DUPES);
          }
      }

      sub dupeczech {
          $lines{$ccn}++;           # add one to the value; if it hasn't been defined yet it sets it equal to 1
          if ($lines{$ccn} == 2) {  # been seen before so add to dupes list
              push @dupes, $ccn;
              # print "$ccn $custID $lines{$ccn}\n";
              # print DUPES "$custID"." "."$ccn"." "."$expy\n";
          }
      }

      sub output_results {
          open(MASTERTEMP, "$mastertemp");
          open(DUPES, ">>$dupefile");
          open(NEWMASTER, ">>$newmaster");
          while (<MASTERTEMP>) {
              $custID = substr($_,  0,  9);
              $ccn    = substr($_, 22, 16);
              $expy   = substr($_, 44,  4);
              $type   = substr($_, 50,  3);
              $amount = substr($_, 58,  6);
              foreach (@dupes) {
                  if ($ccn eq $_) {
                      print DUPES "$custID"." "."$ccn"." "."$expy"." "."$type"." "."$amount\n";
                      $dupez{$ccn}++;
                  }
              }
              if (!$dupez{$ccn}) {
                  print NEWMASTER "$custID"." "."$ccn"." "."$expy"." "."$type"." "."$amount\n";
              }
          }
          close(NEWMASTER);
          close(MASTERTEMP);
          close(DUPES);
      }
        jspindler,

        I see that $custID, $ccn, ... $amount are all set at the same time, are not changed, then output. Nothing here would account for fields of different records getting mixed up. I don't think this version of the program is responsible for the mix up.

        A wild guess at the problem: $mastertemp is being opened in append mode. Unless you delete the file before running the program, you are processing results from previous runs of the program. Perhaps an earlier version of the program made the swaps.

        As was suggested, you already have a hash of every $ccn you have seen in %lines. Why iterate over @dupes to find duplicates? Just check %lines before writing to DUPES.
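
        For example (a standalone sketch with assumed counts; in the real script %lines is filled during extract_data):

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Sketch: %lines already holds how many times each card number was
# seen, so the dupe test is a single hash lookup, no loop over @dupes.
my %lines = (
    '4111111111111111' => 2,   # seen twice: a dupe
    '5500000000000004' => 1,
);

my (@to_dupes, @to_newmaster);
for my $ccn (sort keys %lines) {
    if ($lines{$ccn} > 1) { push @to_dupes,     $ccn }
    else                  { push @to_newmaster, $ccn }
}

print "dupes:     @to_dupes\n";       # 4111111111111111
print "newmaster: @to_newmaster\n";   # 5500000000000004
```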

        Some suggestions about your file opening. You are opening MASTERTEMP and DUPES inside the foreach (@dirlisting) loop, reopening them each time you process a file. That's not necessary: open them once, outside the loop, and open in append mode only if you really need to; otherwise just open for writing. Also very important: always, always check the return value of open().
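
        A sketch of that pattern (the filenames here are made up for illustration, not your real paths):

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Sketch: open the output once, before the per-file loop, in plain
# write mode (">") instead of append (">>"), and check every open().
my $mastertemp = 'master.txt';          # assumed path for illustration

open(MASTERTEMP, ">$mastertemp")
    or die "can't open $mastertemp: $!";

my @dirlisting = ('a.dat', 'b.dat');    # stand-ins for the real listing
foreach my $file (@dirlisting) {
    # each input file would be opened (and checked!) here, e.g.:
    # open(INPUTFILE, $file) or die "can't open $file: $!";
    print MASTERTEMP "processed $file\n";
}
close(MASTERTEMP);
```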

        Another goal you should work towards is to quit using global variables. Variables needed by functions should be passed to the functions. You will reap huge benefits by doing this in larger programs in your future.

        Again, I don't see the fields getting mixed up in this code. Run the program again, starting with a new, empty MASTERTEMP file, and see if the problem persists.

        YuckFoo

        I agree that the error you're seeing shouldn't be caused by this code, and that you should check to make sure that you wipe out your old dupes file before generating new results. Check out the other comments YuckFoo made on how you deal with files.

        One other minor note, on style. Lines like this:

        print NEWMASTER "$custID"." "."$ccn"." "."$expy"." "."$type"." "."$amount\n";
        ...can be more cleanly written like this:
        print NEWMASTER "$custID $ccn $expy $type $amount\n";
        Or even better, you could have used a format.
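
        For instance, fixed-width columns can also be done with sprintf (the widths below are made up for illustration, not taken from the real record layout):

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Sketch: build the output line with sprintf field widths instead of
# concatenating quoted variables and spaces.
my ($custID, $ccn, $expy, $type, $amount) =
    ('CUST00001', '4111111111111111', '0305', 'VIS', '019.95');

my $line = sprintf "%-9s %-16s %-4s %-3s %-6s\n",
                   $custID, $ccn, $expy, $type, $amount;
print $line;    # CUST00001 4111111111111111 0305 VIS 019.95
```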

        buckaduck

Re: funny results
by trs80 (Priest) on Feb 21, 2002 at 00:41 UTC
    Can't you just do:
    # add the $cnn to an array to keep order
    push @good, $cnn if $lines{$cnn} ne '1';

    # add a key to a hash so you don't have to
    # loop through the array each time
    $lines{$cnn} = 1;

    Then you don't have any dupes, you just have the cc numbers.
    I am not 100% sure what you are asking, because it looks like you have much more going on than was presented. If I am way off, please help clue me in.
Re: funny results
by jspindler (Initiate) on Feb 21, 2002 at 01:18 UTC
    No I have to have the dupes sent to a file so I can check them manually for errors.
Re: funny results
by YuckFoo (Abbot) on Feb 20, 2002 at 22:46 UTC
    One problem I see is you are pushing $cnn onto array @dupes, then treating it as a hash when checking for duplicates. You should probably $dupes{$cnn}++ instead of push @dupes, $cnn.
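
    In other words (a standalone sketch with assumed names and a made-up number):

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Sketch: record dupes in a hash so the later check is one lookup
# instead of a foreach over @dupes for every output line.
my %dupes;
$dupes{'4111111111111111'}++;    # instead of: push @dupes, $ccn

my @status;
for my $ccn ('4111111111111111', '5500000000000004') {
    push @status, $dupes{$ccn} ? "$ccn dupe" : "$ccn clean";
}
print "$_\n" for @status;
```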

    YuckFoo

    Originally posted as a Categorized Answer.
