shawshankred has asked for the wisdom of the Perl Monks concerning the following question:

I have a list of numbers in a file (5000 numbers)

2345678923
2121212121
4424352424
2323232323


I need to search for these numbers in the log files under a directory (200 log files). These are transaction log files with BEGIN and END fields; if a number is found in a transaction, the whole transaction should be printed.

file1, file2 ....file200.

I need to print the whole transaction that contains a matching number from the list, along with the name of the file it is in.

I have a script that is very efficient at searching for a single number, but searching for multiple numbers takes too long.

Here is the code. Any suggestion is appreciated.
my $flag = 0;
my $fg = 0;
my $count = 0;
$a1 = '154216722';
while ($line = <>) {
    if ($line =~ /\<BEGIN Transaction\>/) {
        $flag = 1;
    }
    if ($flag == 1) {
        push(@temp, $line);
    }
    if ($line =~ /\<\/END Transaction\>/) {
        $flag = 0;
    }
    if ($flag == 0) {
        foreach $tmp (@temp) {
            if ($tmp =~ /$a1/) {
                $fg = 1;
                $count++;
            }
        }
        if ($fg == 1) {
            print @temp;
            $fg = 0;
        }
        $flag = 5;
        $#temp = -1;
    }
}

Replies are listed 'Best First'.
Re: Match a list of numbers from a list of files
by johngg (Canon) on Dec 18, 2008 at 00:12 UTC

    Firstly, you can read an entire transaction in one go by setting the input record separator ($/, see perlvar). Secondly, you can use a regular expression with alternation to see if a transaction contains any of the numbers in one fell swoop.

    use strict;
    use warnings;

    open my $numberFH, q{<}, \ <<EOD or die qq{open: $!\n};
    123456789
    567898760
    154216722
    763498126
    EOD

    chomp( my @numbers = <$numberFH> );
    close $numberFH or die qq{close: $!\n};

    my $rxFindTrans = do{ local $" = q{|}; qr{(@numbers)}; };

    open my $transFH, q{<}, \ <<EOD or die qq{open: $!\n};
    some rubbish lines
    <BEGIN Transaction>
    blurfl 154876543
    <END Transaction>
    more rubbish
    <BEGIN Transaction>
    the one we want 154216722 with more stuff
    <END Transaction>
    <BEGIN Transaction>
    blargh 54211548
    <END Transaction>
    EOD

    {
        local $/ = qq{<END Transaction>\n};
        while( <$transFH> )
        {
            s{.*(?=<BEGIN Transaction>)}{}s;
            next unless m{$rxFindTrans};
            print qq{Found $1 in:\n$_};
            print qq{==================\n};
        }
    }

    close $transFH or die qq{close: $!\n};

    The output.

    Found 154216722 in:
    <BEGIN Transaction>
    the one we want 154216722 with more stuff
    <END Transaction>
    ==================

    I hope this is helpful.

    Cheers,

    JohnGG

      Thanks a lot JohnGG, I will try this out and let you know. Appreciate your help.
      I am not sure I am loading the numberFH correctly. I need to get that from a file and also the transaction files. Here is my code, not sure if this is right.
      open IN1,"cat $DATA_DIR/$INFILE_NAME |" or die "Can't open $INFILE_NAME: $!\n";
      while ($line1 = <IN1>) {
          chomp(my @numbers = $line1);
      }
      close(IN1);

      my $rxFindTrans = do{ local $" = q{|}; qr{(@numbers)}; };

      open IN2,"cat $TransLogs/$FILE_NAME |" or die "Can't open $FILE_NAME: $!\n";
      local $/ = qq{</DU>\n};
      while( <IN2> ) {
          s{.*(?=<DU>)}{}s;
          next unless m{$rxFindTrans};
          print qq{Found $1 in:\n$_};
          print qq{==================\n};
      }
      close(IN2);

        There's no need to pipe cat into your filehandles as you can open files directly; the three-argument form with lexical filehandles is recommended practice.

        Instead of

        open IN1,"cat $DATA_DIR/$INFILE_NAME |" or die "Can't open $INFILE_NAME: $!\n";

        do

        open my $in1FH, q{<}, qq{$DATA_DIR/$INFILE_NAME} or die qq{Can't open $INFILE_NAME: $!\n};

        As you suspected, you are not reading the numbers file correctly. From your original post it looks like you have 5000 or so numbers in a file, one per line. If you assign the readline into an array rather than a scalar then the whole file is read into the array, one line per element. Furthermore, chomping an array will remove the line terminator from every element in the array. You could read the file line by line in a while loop instead if you like but then you would have to push each line onto the array. These two bits of code are equivalent.

        Using a loop

        my @numbers = ();
        while( <$in1FH> )
        {
            chomp;
            push @numbers, $_;
        }

        Reading directly into an array.

        chomp( my @numbers = <$in1FH> );

        You have removed the bare code block around the reading of the second file. It was there so that the local $/ ... really was localised to that scope to avoid possible side effects later in your script. Since you have a lot of files to read you could perhaps do something like

        my @filesToRead = ( populate this list somehow );
        ...
        foreach my $file ( @filesToRead )
        {
            open my $in2FH, q{<}, $file or die qq{Can't open $file: $!\n};
            local $/ = qq{</DU>\n};
            while( <$in2FH> )
            {
                ...
            }
            close $in2FH or die qq{Can't close $file: $!\n};
        }
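Putting the pieces of this reply together, the following is a minimal sketch rather than a tested solution for the real logs. It assumes the `<DU>` ... `</DU>` tags from the follow-up code, and the sub names (`build_regex`, `matching_transactions`) and the command-line layout are hypothetical. The `(?<!\d)` / `(?!\d)` guards are an addition that the code above does not have; they stop a listed number from matching inside a longer digit string. Each hit is reported with the file it came from, as the original question asked.

```perl
use strict;
use warnings;

# Compile one alternation from the listed numbers. The (?<!\d) and (?!\d)
# guards are an addition: they keep a listed number from matching inside
# a longer run of digits.
sub build_regex {
    my @numbers = @_;
    die qq{No numbers to search for\n} unless @numbers;
    my $alternation = join q{|}, map { quotemeta $_ } @numbers;
    return qr{(?<!\d)($alternation)(?!\d)};
}

# Return [ number, transaction ] pairs for every transaction in $file
# that contains a listed number.
sub matching_transactions {
    my( $file, $rx ) = @_;
    open my $fh, q{<}, $file or die qq{Can't open $file: $!\n};
    local $/ = qq{</DU>\n};    # read one transaction per iteration
    my @found;
    while( my $record = <$fh> ) {
        # Drop anything before the opening tag; skip records without one.
        next unless $record =~ s{.*(?=<DU>)}{}s;
        push @found, [ $1, $record ] if $record =~ $rx;
    }
    close $fh or die qq{Can't close $file: $!\n};
    return @found;
}

# Assumed invocation: perl find_trans.pl numbers.txt log1 log2 ...
if( @ARGV >= 2 ) {
    my( $numbersFile, @filesToRead ) = @ARGV;
    open my $in1FH, q{<}, $numbersFile or die qq{Can't open $numbersFile: $!\n};
    chomp( my @numbers = <$in1FH> );
    close $in1FH or die qq{Can't close $numbersFile: $!\n};

    my $rxFindTrans = build_regex( grep { length $_ } @numbers );
    foreach my $file ( @filesToRead ) {
        foreach my $hit ( matching_transactions( $file, $rxFindTrans ) ) {
            my( $number, $transaction ) = @$hit;
            print qq{Found $number in $file:\n$transaction};
            print qq{==================\n};
        }
    }
}
```

Run as, say, `perl find_trans.pl numbers.txt /path/to/logs/*` (script name and paths are placeholders), the shell expands the file list, so @filesToRead needs no separate directory scan. Because `local $/` sits inside the sub, the changed record separator cannot leak into the reading of the numbers file.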

        I hope this is helpful.

        Cheers,

        JohnGG

Re: Match a list of numbers from a list of files
by GrandFather (Saint) on Dec 18, 2008 at 00:26 UTC

    The following code builds a string containing the current transaction and either prints it or throws it away depending on the state of a keep flag.

    use strict;
    use warnings;

    my $numbers = join '|', qw(2345678923 2121212121 4424352424 2323232323);
    my $transaction = '';
    my $keep;
    my $startRe = qr!<BEGIN Transaction>!;
    my $endRe = qr!</END Transaction>!;
    my $match = qr!(^|\D)($numbers)($|\D)!;

    while (defined (my $line = <DATA>)) {
        my $inTransaction = $line =~ $startRe .. $line =~ $endRe;

        next unless $inTransaction || $keep;
        $transaction .= $line;
        $keep ||= $line =~ /$match/sm;
        next unless $inTransaction =~ 'E0$';

        print $transaction if $keep;
        $transaction = '';
        $keep = undef;
    }

    __DATA__
    <BEGIN Transaction>
    2345678923
    </END Transaction>
    <BEGIN Transaction>
    1
    </END Transaction>
    junk
    <BEGIN Transaction>
    12345678923
    </END Transaction>
    <BEGIN Transaction>
    2121212121
    </END Transaction>

    Prints:

    <BEGIN Transaction>
    2345678923
    </END Transaction>
    <BEGIN Transaction>
    2121212121
    </END Transaction>

    Note the use of the flip-flop operator (..) to keep track of inside/outside the transaction and to detect the last line of the transaction.

    Note also the use of qr to precompile the regular expressions that are used for matching transaction start/end and (more importantly) to match the numbers.
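The `(^|\D)` and `($|\D)` guards in `$match` are what keep 12345678923 in the sample data from being reported as a hit for 2345678923. A small sketch of the difference, using a shortened number list; the sub names `loose_match` and `guarded_match` are made up for illustration:

```perl
use strict;
use warnings;

my $numbers = join '|', qw(2345678923 2121212121);

my $loose   = qr!($numbers)!;                # bare alternation, no guards
my $guarded = qr!(^|\D)($numbers)($|\D)!;    # same guards as $match above

# Hypothetical helpers so the two patterns can be compared side by side.
sub loose_match   { my ($text) = @_; return $text =~ $loose   ? 1 : 0 }
sub guarded_match { my ($text) = @_; return $text =~ $guarded ? 1 : 0 }

for my $text ('2345678923', '12345678923', 'id=2121212121;') {
    printf "%-16s loose=%d guarded=%d\n",
        $text, loose_match($text), guarded_match($text);
}
```

The bare alternation accepts all three strings, while the guarded pattern rejects 12345678923 because the listed number is preceded by another digit there.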


    Perl's payment curve coincides with its learning curve.
      Thanks a lot GrandFather..I'll use the code and try it out. Will let you all know. Appreciate your help.
        In your logs, does a transaction always begin only after the previous one has ended, or can a new transaction start while another is still in progress? If transactions can overlap, you will have to handle them differently and this solution may not work; otherwise it is fine.
Re: Match a list of numbers from a list of files
by kennethk (Abbot) on Dec 18, 2008 at 00:13 UTC

    You should, as good practice, use strict; and use warnings;, and explicitly initialize your variables. This isn't strictly necessary, but it can save headaches. There are a number of other so-called best practices you aren't following, but they are less essential (4-space indent, clear names for variables, ...). Is there a reason you hold off on parsing your @temp array until you've read in an entire block? From what I read, it seems like you could do the same thing with:

    use strict;
    use warnings;

    my $transaction_flag = 0;
    my $match_flag = 0;
    my $count = 0;
    my @value_list = ('154216722');
    my @transaction = ();

    while (my $line = <>) {
        if ($line =~ /\<BEGIN Transaction\>/){
            $transaction_flag = 1;
        }
        if ($transaction_flag) {
            push @transaction, $line;
            for (@value_list) {
                if ($line =~ /$_/) {
                    $match_flag = 1;
                    $count++;
                }
            }
        }
        if ($line =~ /\<\/END Transaction\>/){
            $transaction_flag = 0;
            if ($match_flag) {
                print @transaction;
            }
            $match_flag = 0;
            @transaction = ();
        }
    }
    This also compares against an entire list of match values at once, which is likely desirable if you want to test against 5000 numbers.
      Thanks a lot man..I'll use the code and try it out. Will let you all know. Appreciate your help.
Re: Match a list of numbers from a list of files
by state-o-dis-array (Hermit) on Dec 17, 2008 at 23:28 UTC
    One thing that jumps out at me is that you are essentially traversing the file twice. You could check $line for a match at the same time you are push-ing it to @temp instead of creating @temp then checking each @temp. If you have a match, set a flag that tells you that something else needs to be done with @temp when you reach END.
      Thanks a lot man..I'll use the suggestion and try it out. Appreciate your help.
Re: Match a list of numbers from a list of files
by hbm (Hermit) on Dec 18, 2008 at 23:01 UTC
    My style with the '..' operator:
    use strict;
    use warnings;

    my %numbers = (
        '2345678923' => 1,
        '2121212121' => 1,
        '4424352424' => 1,
        '2323232323' => 1
    );

    while (<DATA>) {
        chomp;
        # the end pattern matches the </END Transaction> tag used in the data
        if (/<BEGIN Transaction/ .. /<\/END Transaction>/) {
            # look up the captured digit run ($1), not the whole line
            if (/(?:^|\D)(\d+)(?:\D|$)/ && exists $numbers{$1}) {
                print "<BEGIN Transaction>$_<END Transaction>\n";
            }
        }
    }

    __DATA__
    <BEGIN Transaction>
    2345678923
    </END Transaction>
    <BEGIN Transaction>
    1
    </END Transaction>
    junk
    <BEGIN Transaction>
    12345678923
    </END Transaction>
    <BEGIN Transaction>
    2121212121
    </END Transaction>
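With 5000 numbers, a hash lookup on each run of digits may scale better than one very large alternation, though that is untested against the real logs. The sketch below combines the hash idea with buffering the whole transaction so that all of its lines can be printed. It assumes the `<BEGIN Transaction>` / `</END Transaction>` tags from the sample data; the sub name `transactions_with_numbers` is hypothetical.

```perl
use strict;
use warnings;

# Return every transaction (as one string) read from $fh that contains
# a number which is a key of the hash referenced by $wanted.
sub transactions_with_numbers {
    my ($fh, $wanted) = @_;
    my @found;
    my $transaction = '';
    my $keep = 0;
    while (my $line = <$fh>) {
        # flip-flop: true from the BEGIN line through the END line
        my $in_range = $line =~ /<BEGIN Transaction>/ .. $line =~ m{</END Transaction>};
        next unless $in_range;
        $transaction .= $line;
        # test each complete digit run on the line against the hash
        for my $digits ($line =~ /(\d+)/g) {
            $keep = 1 if exists $wanted->{$digits};
        }
        next unless $in_range =~ /E0$/;    # last line of the range
        push @found, $transaction if $keep;
        ($transaction, $keep) = ('', 0);
    }
    return @found;
}

# Small demonstration on an in-memory file
my $sample = join '', map { "$_\n" }
    '<BEGIN Transaction>', '2345678923', '</END Transaction>', 'junk',
    '<BEGIN Transaction>', '12345678923', '</END Transaction>';
open my $demo_fh, '<', \$sample or die "Can't open in-memory file: $!\n";
print transactions_with_numbers($demo_fh, { '2345678923' => 1 });
close $demo_fh;
```

Because `\d+` is greedy, 12345678923 is looked up as a single key and does not collide with 2345678923. One caveat from perlop: each `..` operator keeps its state across calls to the sub that contains it, so a file that ends in the middle of a transaction would leave the range open for the next file.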