in reply to Re: Filter script with pattern and an array
in thread Filter script with pattern and an array
It looks like your patterns are supposed to match whole fields -- for example, "01005;11200" should match a line like this:

012345;23456;01005;11200;000111222;111222333

but it should not match a line like this:

012345;23456;02006;22300;000001005;112004444

The code you posted will match both lines, because the regex does not include ";" before and after the long alternation of field values.
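A quick way to see the difference is to run both sample lines through an unanchored pattern and a field-delimited one. This is only a small sketch: the sub names `matches_substring` and `matches_fields` are made up for illustration, and the delimited version also accepts the start or end of the row as a field boundary, which goes slightly beyond a plain `;...;` pattern.

```perl
use strict;
use warnings;

# The two sample rows from above (already chomped).
my @rows = (
    '012345;23456;01005;11200;000111222;111222333',    # should match
    '012345;23456;02006;22300;000001005;112004444',    # should not match
);
my $target = '01005;11200';

# Unanchored (hypothetical helper): finds the target anywhere,
# even buried inside longer fields.
sub matches_substring {
    my ( $row, $t ) = @_;
    return $row =~ /\Q$t\E/ ? 1 : 0;
}

# Delimited (hypothetical helper): the target must begin and end on a
# field boundary, i.e. a ";" or the start/end of the row.
sub matches_fields {
    my ( $row, $t ) = @_;
    return $row =~ /(?:^|;)\Q$t\E(?:;|$)/ ? 1 : 0;
}

for my $row (@rows) {
    printf "%s  substring=%d  fields=%d\n",
        $row, matches_substring( $row, $target ), matches_fields( $row, $target );
}
```

Both rows report `substring=1`, but only the first reports `fields=1`, because in the second row "01005;11200" is just a piece of "000001005;112004444".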
Since you seem to be dealing with flat-table data, and your regex patterns involve matching certain combinations of third and fourth column values on each table row, you should consider handling things in a more table-like manner: read the target patterns into a hash, then read each row of the flat table file, pull out the 3rd and 4th fields, and see if they constitute an existing hash key.
In any case, you do want to make sure your script will load your target patterns from a list file, rather than putting all the values in the perl code like you've done here. For example:
```perl
use strict;
use warnings;

( @ARGV == 2 and -f $ARGV[0] and -f $ARGV[1] )
    or die "Usage: $0 input.table target.list\n";
my ( $infile, $targfile ) = @ARGV;
my $outfile = "OK.txt";
my $errfile = "ERROR.txt";

open( IN,   '<', $infile )   or die "$infile: $!";
open( OUT,  '>', $outfile )  or die "$outfile: $!";    # output files must be opened for writing
open( ERR,  '>', $errfile )  or die "$errfile: $!";
open( TARG, '<', $targfile ) or die "$targfile: $!";

my %target;
while (<TARG>) {
    chomp;    # target.list has lines like "01005;11400"
    $target{$_} = undef;
}
close TARG;

while (<IN>) {
    my @fields = split /;/;    # assuming no quoted ";" within fields
    # line-initial value is $fields[0], so the 3rd and 4th are @fields[2,3]
    my $check = join ';', @fields[ 2, 3 ];
    if ( exists( $target{$check} ) ) {
        print OUT;
    }
    else {
        print ERR;
    }
}
close IN;
close OUT;
close ERR;
```
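If you want to try the hash lookup before pointing it at real files, Perl can open a filehandle on a string. The sketch below uses made-up sample data in place of target.list and input.table and collects the rows in arrays instead of writing OK.txt and ERROR.txt; otherwise the logic is the same as the script above.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for target.list and input.table.
my $target_list = "01005;11200\n02006;22300\n";
my $input_table = join '',
    "012345;23456;01005;11200;000111222;111222333\n",
    "012345;23456;03007;33400;000001005;112004444\n";

# In-memory filehandles: open() on a reference to a scalar.
open( my $targ_fh, '<', \$target_list ) or die "target list: $!";
open( my $in_fh,   '<', \$input_table ) or die "input table: $!";

# Load the target keys into a hash, just as with a real file.
my %target;
while ( my $line = <$targ_fh> ) {
    chomp $line;
    $target{$line} = undef;
}
close $targ_fh;

# Sort each row into @ok or @err depending on its 3rd and 4th columns.
my ( @ok, @err );
while ( my $row = <$in_fh> ) {
    chomp( my $copy = $row );
    my @fields = split /;/, $copy;
    my $check  = join ';', @fields[ 2, 3 ];
    if   ( exists $target{$check} ) { push @ok,  $row }
    else                            { push @err, $row }
}
close $in_fh;

print "OK:\n", @ok, "ERROR:\n", @err;
```

The first row lands in `@ok` ("01005;11200" is a key), the second in `@err` ("03007;33400" is not).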
As for this comment of yours:

"i use no strict ... etc because others need to change the script easyly and they have totaly no clue of perl"

If others, with less knowledge of Perl than you have, are going to be altering this script, then that's the most important reason to include use strict; use warnings; -- that way, when they screw something up, there's a much better chance that the problem will be caught (and explained) before things get worse.
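To make that concrete, here is a small sketch that compiles a snippet containing a deliberate, hypothetical typo: an array declared as `@targ_strings` but later referred to as `@targs`. Without `use strict`, Perl would silently treat `@targs` as a new, empty global array and the regex would match far more than intended; with it, compilation stops and names the offending variable.

```perl
use strict;
use warnings;

# Snippet with a deliberate typo: @targ_strings is declared,
# but @targs is what gets used.
my $snippet = <<'END_CODE';
use strict;
my @targ_strings = ( '01005;11200', '02006;22300' );
my $targ_regex = join '|', @targs;
1;
END_CODE

# A string eval returns undef and sets $@ if compilation fails.
my $ok = eval $snippet;
if ($ok) {
    print "compiled without complaint\n";
}
else {
    print "caught by strict: $@";
}
```

The message printed starts with `Global symbol "@targs" requires explicit package name`, which points whoever edited the script straight at the mistake.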
(If these other people are just making adjustments to the list of target patterns, that is another very good reason for keeping that list in a separate file, so it can be updated without having to touch the perl script.)
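Because the list file may be edited by people who don't know Perl, a slightly more forgiving loader could spare them some confusion. This is a sketch under assumptions of my own: the helper `load_targets` is hypothetical, and it treats blank lines and lines starting with "#" as ignorable and trims surrounding whitespace (including a stray "\r" from a file saved on Windows).

```perl
use strict;
use warnings;

# Hypothetical helper: read target keys from a filehandle into a hash ref,
# skipping blank lines and "#" comment lines and trimming whitespace.
sub load_targets {
    my ($fh) = @_;
    my %target;
    while ( my $line = <$fh> ) {
        $line =~ s/^\s+//;    # leading whitespace
        $line =~ s/\s+$//;    # trailing whitespace, "\r" and "\n"
        next if $line eq '' or $line =~ /^#/;
        $target{$line} = undef;
    }
    return \%target;
}

# Example list with a comment, a CRLF line ending, a blank line and padding.
my $list = "# targets edited by hand\n01005;11200\r\n\n  02006;22300  \n";
open( my $fh, '<', \$list ) or die "list: $!";
my $targets = load_targets($fh);
close $fh;

print join( "\n", sort keys %$targets ), "\n";
```

Only the two real keys, "01005;11200" and "02006;22300", end up in the hash.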
One last point: if your target patterns are not always being sought in the same columns of the table -- e.g. sometimes your target string is expected to match columns 3 and 4, and other times it is expected to match columns 5 and 6 -- then you might need to go back to the regex approach. In that case, you should join the target strings into a single alternation stored in a scalar, and form the regex like this:
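```perl
my @targ_strings = <TARG>;
chomp @targ_strings;
# quotemeta guards against regex metacharacters in the target list
my $targ_regex = join "|", map { quotemeta } @targ_strings;
while (<IN>) {
    if ( /;(?:$targ_regex);/ ) {
        print OUT;
    }
    else {
        print ERR;
    }
}
```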
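Here is a self-contained sketch of that regex approach on hypothetical in-memory rows, where one target sits in columns 3-4 and another in columns 5-6. It compiles the alternation once with `qr//` and, unlike the plain `;...;` form, also accepts the start or end of the row as a boundary, so a target in the last two columns can match too (that boundary handling is my own addition).

```perl
use strict;
use warnings;

# Hypothetical targets and rows; the second row matches in columns 5 and 6.
my @targ_strings = ( '01005;11200', '000111222;111222333' );
my @rows = (
    '012345;23456;01005;11200;000000000;000000000',
    '012345;23456;02006;22300;000111222;111222333',
    '012345;23456;02006;22300;000001005;112004444',
);

# One alternation, metacharacters escaped, compiled once.
my $targ_regex = join '|', map { quotemeta } @targ_strings;
my $re = qr/(?:^|;)(?:$targ_regex)(?:;|$)/;

# Keep only rows where some target occupies whole fields.
my @matched = grep { /$re/ } @rows;
print "$_\n" for @matched;
```

The first two rows are printed; the third is rejected because "01005;11200" only appears inside longer fields there.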
Re^3: Filter script with pattern and an array
by ultibuzz (Monk) on Oct 20, 2005 at 15:16 UTC