Re^2: problem with foreach statement

sugestions of how i can do better are welcome

OK, here goes...

Try using perltidy on this script, and then try writing your code according to that general style as a habit. When I did "perltidy 520234.pl -o 520234_tidy.pl", the output file was 252 lines instead of the original 99, but on the whole it was a lot more legible.
You seem to be doing a lot of "copy/paste/modify" style coding -- lots of little blocks of lines that are all very similar to each other with just a few regular differences among the various copies. For each set of repeated code blocks, you would probably be better off using a subroutine or an outer loop over the distinct parameter values. Not only would this reduce the line count by 50% or more, but it would make the script easier to maintain and upgrade.
Among the parameters that vary across your repeated code blocks are array names like "@newexout79", file handle names like "OUTMAIL79", etc. These cry out for a hash data structure, which would make it a lot easier to reduce the code into a single loop over a list (i.e. the elements of the list can be the keys of the hash data structure).
Your first set of nested "foreach" loops is doing a lot of extra work at the innermost layer, which could be done once before starting the outermost loop. Also, it looks like you could be pushing multiple copies of a single line from some data file onto the "@matches" array, if the line happens to match multiple elements of the "@matches_in" array. (This probably isn't a problem, but there's no way to know without seeing the data.)
(nitpicks) You use "close" where you should use "closedir"; also, you assume (implicitly) that "$path" (derived from $0) is the current working directory (CWD), but you might want to use "chdir" to make that explicit.

Having said that, here is a "tidy" version of your code, which uses a single HoH and a single set of related arrays and file handles (instead of five distinct sets of everything), and applies most of the other suggestions listed above -- it's about 120 lines, down from the 252 of the original "tidy" version. I haven't tested it, of course, but it does pass "perl -cw", and I think it'll do the same thing as your original code. (I could be wrong about that, but this should at least give you an idea about how you can eliminate unnecessary duplication in your code.)

#!/usr/bin/perl -w

use strict;
use File::Spec;

my $spec_text = qr/Smt22/;
my $path_tmp  = $0;
my ( $volume, $directories, $file ) = File::Spec->splitpath($path_tmp)
+;
my $path = $volume . $directories;
opendir( DIRREN, $path ) || die;
my @RENDIR = readdir(DIRREN);
closedir( DIRREN );
my $match_file = "MSISDN.txt";
my @matches = ();

my @errorcodes = qw/22 24 50 52 79/;

# open file and load into array
chdir $path;
open( my $match_in, '<', $match_file ) or die("$match_file: open faile
+d: $!");
my @matches_in = <$match_in>;
close($match_in);

for ( @matches_in ) {
    $_ = (split /,/)[1] . '.*Returned error code (?:' . join('|', @err
+orcodes) . ')';
}

foreach my $file (@RENDIR) {
    if ( $file =~ $spec_text ) {
        open( my $fh_in, '<', $file ) or die("$file: open failed: $!")
+;
        while ( my $line = <$fh_in> ) {
            foreach my $match_in (@matches_in) {
                if ( $line =~ /$match_in/ ) {
                    push @matches, $line;
                    last;
                }
            }
        }
        close($fh_in);
    }
}
my @newmatch = ();
my @tmplist  = ();

#split logfile line into needs and put together everything ;D
foreach my $match (@matches) {
    chomp $match;
    @newmatch = split( ';', $match );
    my @tmpnewmatch = split( ',', $newmatch[6] );

    #cuts the blanks @ beginning and end of each string element
    $newmatch[5]    =~ s/^\s+//;
    $tmpnewmatch[0] =~ s/^\s+//;
    push @tmplist, $newmatch[5] . ";" . $tmpnewmatch[0];
}

# removes duplicates and print out MD Del list
my @list = do { my %seen; grep !$seen{$_}++, @tmplist };

open( INFILE, '<', $match_file ) or die("$match_file: open failed: $!"
+);
open( MDDELOUT, '>', "MD_delete_list.txt" ) or die("MD_delete_list.txt
+: open failed: $!");
while (<INFILE>) {
    my @mdmsisdn = split( ',', $_ );
    $mdmsisdn[1] =~ s/^\s+//;
    foreach my $mddel (@list) {
        my @newmddel = split( ';', $mddel );
        if ( $newmddel[0] == $mdmsisdn[1] ) {
            print MDDELOUT "$mdmsisdn[0] ; $mddel \n";
        }
    }
}
close(INFILE);
close(MDDELOUT);

#put the list into an array to easyly search for msisdn, will be put i
+nto a hash of hashes
# top-level hash keyed by errorcode, lower-level hash keyed by msisdn 
+value

my %adv = ();
my $errcode_regex = 'ADV: Returned error code (' . join('|',@errorcode
+s) . ')';
foreach my $list (@list) {
    my ( $msisdn, $adverror ) = split( ';', $list );
    if ( $adverror =~ /$errcode_regex/ ) {
        my $errcode = $1;
        $adv{$errcode}{$msisdn} = $adverror;
    }
    else { print "no ADV Errors ;D" }
}

#open errorfiles and check if msisdn is already known then sorting to 
+existing or not
for my $errcode ( @errorcodes ) {
    my $advname = "adv$errcode.txt";
    open( my $exmsisdn, '<', $advname ) or die("$advname: open failed:
+ $!");
    my @exerrors = <$exmsisdn>;
    close($exmsisdn);
    my @tmpnew = ();
    my @tmpold = ();
    foreach my $advkey ( keys %{$adv{$errcode}} ) {
        if ( grep { $advkey == $_ } @exerrors ) {
            print "$advkey is an old error\n";
            push @tmpold, $advkey . "\n";
        }
        else {
            print "$advkey is a new error\n";
            push @tmpnew, $advkey . "\n";
        }
    }

#put new errors in arry to existing errors dupe check it and write bac
+k to file
    foreach my $tmpexout (@tmpnew) { push @exerrors, $tmpexout; }
    my @newexout = do { my %seen; grep !$seen{$_}++, @exerrors };

    open( OUT, '>', $advname ) or die("$advname: open failed: $!");
    foreach my $newout (@newexout) { print OUT $newout; }

#printing out final records to mail ;D
    my $mailname = join '', 'adv', $errcode, '_mail.txt';
    open( OUTMAIL, '>', $mailname ) or die("$mailname: open failed: $!
+");
    print OUTMAIL "Here goes the text needed befor MSISDN \n NEW MSISD
+Ns\n\n";
    foreach my $mail (@tmpnew) { print OUTMAIL "$mail"; }
    print OUTMAIL "\n EXISTING MSISDNs\n";
    foreach my $mailold (@tmpold) { print OUTMAIL "$mailold"; }
    close(OUTMAIL);
}
[download]

Comment on Re^2: problem with foreach statement Download Code

Replies are listed 'Best First'.
Re^3: problem with foreach statement by ultibuzz (Monk) on Jan 26, 2006 at 12:32 UTC
your HoH gode is really really nice i tested it and its around 2-6 minuts faster. to give some suroundings: the script needs to filter several times a day several logs this logs are quit big between 1.000.000 - 2.600.000 lines. now i have one problem understanding these HoH code 1. i need error 50 and 52 in one file and i have no clue how i can do it with this HoH 2. for each error i need to write a different text 3. i need to write for error 79 all MSISDN in a seperate textfile but again i dont know how to get them out of the HoH help is definatly welcome ;D because i really stuck here	[reply]
Re^4: problem with foreach statement by graff (Chancellor) on Feb 01, 2006 at 06:35 UTC
You can look at the man pages perlreftut, perlref and perldsc for more info on using HoH and similar data structures. As for your specific "changes": 1. i need error 50 and 52 in one file and i have no clue how i can do it with this HoH 2. for each error i need to write a different text 3. i need to write for error 79 all MSISDN in a seperate textfile but again i dont know how to get them out of the HoH If you have a HoH where the primary key is the error code (22 24 50 52 79), the secondary keys can include not only the "msisdn" values from the log files, but also things like text strings that you want to associate with each type of error, the name of the file where you want to write information about each type, etc. Starting from my version of the code posted above, you could skip the use of the "@errorcodes" array, and instead initialize the "%adv" HoH as follows: `my %adv = ( '22' => { filename => "adv22.txt", message => "Text for error code 22" }, '24' => { filename => "adv24.txt", message => "This about error 24" }, '50' => { filename => "adv50_and_52.txt", message => "Text for error 50" }, '52' => { filename => "adv50_and_52.txt", message => "This is about error 52" }, '79' => { filename => "adv79.txt", message => "Special text for code 79" }, }; }` [download] Then, when you get to the part near the bottom of the script where you loop over the various error codes to handle the "adv*.txt" files and send out mail messages, you can look up the filename and the message text for each error code in "$adv{$errcode}{filename}" and "$adv{$errcode}{message}". Meanwhile, the "$msisdn" strings that are also being used as secondary hash keys in %adv will continue to work (so long as these strings never turn out to be "filename" or "message", which would overwrite the initial values assigned above).	[reply] [d/l]
Re^5: problem with foreach statement by ultibuzz (Monk) on Feb 16, 2006 at 07:33 UTC
thx man. this explain alot great help	[reply]
Re^3: problem with foreach statement by ultibuzz (Monk) on Jan 03, 2006 at 19:01 UTC
hi, my code in use is much more formated than the posted one ,D but your code definatly rocks thx for the advises and inspirience. after i was finished with the code and tested it ,then publish here, i began writing a module / class for it because i defenatly will need this type of process a few more times in other progrmas that will handel mostly the same data ;D im quit new to perl so it takes me some more time to get into this hash in a hash and referencing it puting stuff in subs and so on ;D , but im working on it to get better ;D ,	[reply]