I'm guessing that this is still in "test" stages... it does not look like you have 60000 elements in your regex yet. ;)

It looks like your patterns are supposed to match whole fields. For example, "01005;11200" should match a line like this:

    012345;23456;01005;11200;000111222;111222333

but it should not match a line like this:

    012345;23456;02006;22300;000001005;112004444

The code you posted will match both lines, because the regex does not include ";" before and after the long alternation of field values. As a result, "01005;11200" also matches the substring that straddles the 5th and 6th fields of the second line.
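To see the difference, here is a minimal sketch that runs both kinds of pattern over the two sample lines. It hard-codes a single target, "01005;11200", purely for illustration. The real script would have many alternatives.

```perl
use strict;
use warnings;

# The two sample lines: only the first has 01005;11200 as its 3rd and 4th fields.
my @lines = (
    '012345;23456;01005;11200;000111222;111222333',
    '012345;23456;02006;22300;000001005;112004444',
);

# Unanchored: "01005;11200" is also found inside "000001005;112004444".
my @unanchored = grep { /01005;11200/ } @lines;

# Anchored with ";" on both sides: the target has to be whole fields.
my @anchored = grep { /;01005;11200;/ } @lines;

print scalar(@unanchored), " line(s) match unanchored\n";    # 2
print scalar(@anchored),   " line(s) match anchored\n";      # 1
```

Note that the anchored form only finds targets with a ";" on both sides. That holds for columns 3 and 4 here, but not for the first or last column.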

Since you seem to be dealing with flat-table data, and your regex patterns involve matching certain combinations of third and fourth column values on each table row, you should consider handling things in a more table-like manner. Read the target patterns into a hash, then read each row of the flat table file, pull out the 3rd and 4th fields, and see if they constitute an existing hash key.
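Here is the hash idea on its own, applied to the two sample rows. It is a sketch only: the hypothetical arrays `@target_list` and `@rows` stand in for the target list file and the table file, so it runs without any input files. Each row costs one hash lookup, however many targets the list holds.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-ins for target.list and the table file.
my @target_list = ( '01005;11200', '01005;11400' );
my @rows = (
    '012345;23456;01005;11200;000111222;111222333',
    '012345;23456;02006;22300;000001005;112004444',
);

# Build the lookup hash once; only the keys matter.
my %target = map { $_ => 1 } @target_list;

my ( @ok, @err );
for my $row (@rows) {
    my @fields = split /;/, $row;
    # Fields are 0-indexed, so columns 3 and 4 are @fields[2,3].
    my $key = join ';', @fields[2,3];
    if ( exists $target{$key} ) { push @ok, $row }
    else                        { push @err, $row }
}
print "OK: @ok\nERROR: @err\n";
```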

In any case, you do want to make sure your script will load your target patterns from a list file, rather than putting all the values in the perl code like you've done here. For example:

    use strict;
    use warnings;

    ( @ARGV == 2 and -f $ARGV[0] and -f $ARGV[1] )
        or die "Usage: $0 input.table target.list\n";
    my ( $infile, $targfile ) = @ARGV;
    my $outfile = "OK.txt";
    my $errfile = "ERROR.txt";

    # 3-argument open: '<' for the input files, '>' for the two output files
    open( IN,   '<', $infile )   or die "$infile: $!";
    open( OUT,  '>', $outfile )  or die "$outfile: $!";
    open( ERR,  '>', $errfile )  or die "$errfile: $!";
    open( TARG, '<', $targfile ) or die "$targfile: $!";

    my %target;
    while (<TARG>) {
        chomp;    # target.list has lines like "01005;11400"
        $target{$_} = undef;
    }
    close TARG;

    while (<IN>) {
        my @fields = split /;/;    # assuming no quoted ";" within fields
        # the line-initial value is $fields[0],
        # so the 3rd and 4th values are @fields[2,3]
        my $check = join ';', @fields[2,3];
        if ( exists( $target{$check} ) ) {
            print OUT;
        }
        else {
            print ERR;
        }
    }
    close IN;
    close OUT;
    close ERR;

As for this comment of yours:

i use no struct strict ... etc because others need to change the script easyly and they have totaly no clue of perl
If others, with less knowledge of perl than you have, are going to be altering this script, then that's the most important reason to include "use strict; use warnings;". That way, when they screw something up, there's a much better chance that the problem will be caught (and explained) before things get worse.
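Here is a small illustration of the kind of mistake strict catches. The snippet and the typo `$totl` (for `$total`) are hypothetical. The snippet is compiled with a string `eval` so the error can be printed instead of stopping the script.

```perl
use strict;
use warnings;

# Hypothetical edit by someone new to perl: "$totl" is a typo for "$total".
my $snippet = <<'END';
use strict;
my $total = 0;
$totl += 5;
END

# Under strict, the undeclared variable is rejected at compile time.
my $ok    = eval "$snippet; 1";
my $error = $ok ? '' : $@;
print $ok ? "compiled and ran\n" : "caught: $error";
```

Without strict, `$totl` would silently spring into existence, and `$total` would stay 0 with no message at all.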

(If these other people are just making adjustments to the list of target patterns, that is another very good reason for keeping that list in a separate file, so it can be updated without having to touch the perl script.)

One last point: your target patterns may not always be sought in the same columns of the table. For example, sometimes your target string might be expected to match columns 3 and 4, and other times columns 5 and 6. In that case you might need to go back to the regex approach. Join the target strings with "|" into a single alternation, assign it to a scalar, and form the regex like this:

    my @targ_strings = <TARG>;
    chomp @targ_strings;
    my $targ_regex = join "|", @targ_strings;
    while (<IN>) {
        # ";" on both sides makes each target match whole fields only
        if ( /;(?:$targ_regex);/ ) {
            print OUT;
        }
        else {
            print ERR;
        }
    }
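Here is a self-contained sketch of that alternation, run against a few rows. The target list and rows are hypothetical, and the one targets columns 4 and 5 is included to show a match outside columns 3 and 4. It also adds `quotemeta` and `qr//`. Neither is needed for plain digits and ";", but they keep the regex safe if a target ever contains a metacharacter.

```perl
use strict;
use warnings;

# Hypothetical target list; in the real script these lines come from TARG.
my @targ_strings = ( '01005;11200', '22300;000001005' );

# Escape each target, join into one alternation, and compile it once.
my $targ_alt   = join '|', map { quotemeta($_) } @targ_strings;
my $targ_regex = qr/;(?:$targ_alt);/;

my @rows = (
    '012345;23456;01005;11200;000111222;111222333',    # target in columns 3-4
    '012345;23456;02006;22300;000001005;112004444',    # target in columns 4-5
    '012345;23456;02006;22300;000111222;111222333',    # no target
);
my @matched = grep { $_ =~ $targ_regex } @rows;
print "$_\n" for @matched;
```

Because the pattern requires a ";" on both sides, a target in the first or last column will not match. With the six-column sample rows, that includes columns 5 and 6. If that case matters, replacing the outer anchors with `(?:^|;)` and `(?=;|$)` is one way to handle it.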

In reply to Re^2: Filter script with pattern and an array by graff
in thread Filter script with pattern and an array by ultibuzz
