cesear has asked for the wisdom of the Perl Monks concerning the following question:

I need to remove lines from a file where the string 406 or 408 is found in the line, based on a key. An example of a line to remove is:

4201034600051 1212104069140 001

The key is 1034600051, the account number. The string to check for is 406, the rev code.

A sample input file looks like this:

201034600051 1212104069140 001
4201034600051 1212104039139 001
4201034600153 1212104039140 001
4201034000375 1212104034111 001
4201034000375 1212104033180 001
4201034000375 1212104039443 001
4201034000375 1212104039232 001
4201034600039 1212104039336 001
4201034600045 1212104039252 001

The output looks like this:

CH 01034600051 1212104069140 001 LAB
CH 01034600051 1212104039139 001 LAB
CH 01034600153 1212104039140 001 LAB
CH 01034000375 1212104034111 001 LAB
CH 01034000375 1212104033180 001 LAB
CH 01034000375 1212104039443 001 LAB
CH 01034000375 1212104039232 001 LAB
CH 01034600039 1212104039336 001 LAB
CH 01034600045 1212104039252 001 LAB

I don't want the line CH 01034600051 1212104069140 001 LAB in the output, because rev code 406 was found.

So, I batch up lines where the account is the same (an account can occur multiple times), and if a line contains a 406 or 408 rev code I need to remove it and not send it to the output file. Here is the code I have written so far:

use warnings;
use strict;

my $file_to_process = $ARGV[0];
my $file_to_create = $ARGV[1];
my $curr_acctn = {};
my $first_time = "yes";
my @account_list = ();

open FH1, ">$file_to_create" or die "Cannot create $file_to_create";
open FH, $file_to_process or die "Cannot open $file_to_process";

while (<FH>) {
    chomp;
    &rtrim($_);
    my @fields = split;
    my $acctn = substr($fields[0],3,10);
    if($acctn != $curr_acctn) {
        if($first_time eq "yes") {
            $first_time = "no";
        }
        else {
            #print FH1 "**************************\n";
            #print FH1 "Charges for account number ----> $curr_acctn:\n";
            foreach my $account (@account_list) {
                my $charge_code = substr($account,20,7);
                my $chrg_credit = substr($account,0,2);
                if ($chrg_credit == 42) {
                    substr($account,0,2) = "CH ";
                    substr($account,32,3) = " LAB";
                }
                else {
                    substr($account,0,2) = "CR ";
                    substr($account,32,3) = " LAB";
                }
                #print FH1 "The charge or credit is: $chrg_credit\n";
                #print FH1 "The charge code is: $charge_code\n";
                print FH1 "$account\n";
            }
            @account_list = ();
        }
    }
    $curr_acctn = $acctn;
    push (@account_list, $_);
}
close FH;
close FH1;

sub rtrim() {
    $_ =~ s/\s+$//;
    return;
}

I add some other things to the output and remove trailing spaces. I just cannot figure out how to get rid of the lines with 406 or 408 rev codes.

THANKS

Replies are listed 'Best First'.
Re: Removing element of an array
by kennethk (Abbot) on Apr 14, 2011 at 15:49 UTC
    The easiest way to prevent output of the offending line would be to put a conditional on your output, perhaps changing print FH1 "$account\n"; to
    if ($account !~ /.{21}40[68]/) { print FH1 "$account\n"; }

    There are a number of stylistic critiques I could give if you like. I would in particular point out you are using prototypes incorrectly - see Prototypes in perlsub.
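
On the prototype point, here is a minimal sketch of what a trimming sub might look like without the empty prototype. The original `sub rtrim()` declares that it takes no arguments, and the `&rtrim($_)` call form circumvents the prototype check; it only has an effect because the body edits the global `$_`. The name `rtrim_copy` is hypothetical, introduced here purely for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns a
# right-trimmed copy of its argument instead of editing the global $_.
sub rtrim_copy {
    my ($text) = @_;
    $text =~ s/\s+$//;    # strip trailing whitespace, newline included
    return $text;
}

my $line = "4201034600051 1212104069140 001   \n";
print '[', rtrim_copy($line), "]\n";
```

Inside the read loop this would presumably be called as `$_ = rtrim_copy($_);`, which makes the data flow explicit at the call site.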

      Hey thanks that regex works great!
Re: Removing element of an array
by tospo (Hermit) on Apr 14, 2011 at 15:43 UTC
    You can use regular expressions for this. You are using one in your rtrim sub but maybe you haven't explored yet what they can do. For example:
    my $account = '12340612121';
    print "it matches 406\n" if $account=~/406/;
    will identify the pattern "406" in the long string and print "it matches 406"
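
Note that an unanchored /406/ matches anywhere in the line, including inside an account number. A possible alternative, sketched under the assumption that the rev code always occupies the three digits at offset 6 of the second field (as in the sample lines), is to extract that field and compare it directly. The helper name `rev_code` and the `%reject` hash are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: pull the 3-digit rev code out of the second
# whitespace-separated field, assuming it always sits at offset 6.
sub rev_code {
    my ($line) = @_;
    my @fields = split ' ', $line;
    return substr($fields[1], 6, 3);
}

# Rev codes whose lines should be dropped.
my %reject = map { $_ => 1 } qw(406 408);

for my $line ('4201034600051 1212104069140 001',
              '4201034600051 1212104039139 001') {
    my $code = rev_code($line);
    print "$line => rev code $code", ($reject{$code} ? " (skip)" : ""), "\n";
}
```

A hash lookup like this also makes it easy to add further rev codes later without touching a regex.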
Re: Removing element of an array
by davido (Cardinal) on Apr 14, 2011 at 15:58 UTC

    It seems to me the problem is a little simpler than your code makes it. Wouldn't this do it?

    use strict;
    use warnings;

    while( <> ) {
        chomp;
        # Test for validity of input, and capture significant digits.
        # If input seems invalid, die.
        unless( m/^(?:2|42)  # Match but don't capture your prefix.
                  (\d{11})\s # Match and capture left-grouping of digits.
                  (\d{13})\s # Match and capture mid-grouping of digits.
                  (\d{3})/x  # Match and capture right-grouping of digits.
        ) {
            die "Improperly formatted input data:\n\tLine: $.\t$_\n";
        }
        my( @columns ) = ( $1, $2, $3 ); # Retain the captures.
        # Look for "reject" codes.
        next if $columns[1] =~ m/^\d{6}(?:406|408)\d{4}$/;
        # Print a nicely formatted result.
        print "CH $columns[0] $columns[1] $columns[2] LAB\n";
    }

    I didn't test the regexps, so you may need to adjust the quantifiers, but they look right to me. From the command line type 'scriptname infile.data >outfile.data', where 'infile.data' is the filename of your input, and 'outfile.data' is the filename of your output. You could deal with opening and closing filehandles in-script, but it's probably not necessary unless you don't want to use redirection at the OS level.


    Dave

Re: Removing element of an array
by merlininthewood (Initiate) on Apr 14, 2011 at 15:59 UTC

    Putting:

     next if /^\d+?\s\d*(?:406|408)/;

    straight after 'while (<FH>) {' should do the trick, if I have understood your requirements correctly. If the position of the 406/408 matters, you could replace the \d* with \d{6} or whatever fits. Hope this helps.

    Merlin
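
To illustrate the difference between \d* and \d{6} mentioned above, here is a small sketch comparing the two patterns. The sub names `matches_loose` and `matches_strict` are hypothetical, and the second test line is made-up data (not from the sample file) in which "406" occurs after a rev code of 403:

```perl
use strict;
use warnings;

# Loose form from the reply: 406/408 anywhere in the second field.
sub matches_loose  { return $_[0] =~ /^\d+?\s\d*(?:406|408)/   ? 1 : 0 }

# Positional form: 406/408 only directly after the first six digits.
sub matches_strict { return $_[0] =~ /^\d+?\s\d{6}(?:406|408)/ ? 1 : 0 }

# The second line is invented for illustration: its rev code is 403,
# but "406" happens to occur later in the same field.
for my $line ('4201034600051 1212104069140 001',
              '4201034000375 1212104034061 001') {
    printf "%s  loose=%d strict=%d\n",
        $line, matches_loose($line), matches_strict($line);
}
```

If such lines can occur in real data, the positional form is probably the safer choice.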