in reply to Simplifying my script...

A few observations:

Regarding reading that file within the loop each time, do this analysis. Is the main source data (that you are reading in via STDIN small enough to fit in memory? Is the charge database (that you're reading in from the .hdb file small enough to fit into memory? Depending on the answers to those questions, there are four different cases (2x2), and three different implementation strategies:

Both too large to fit in memory, especially concurrently

You have two options. The first is to load this information into a real database, then use the database to do the comparisons. You can use the free databases without too much pain; this particular application is an extremely straightforward join.

The other option is to work on the two files in pieces. Using the dedicated command-line tool sort(1), you could order each file by source number (which is what I think you are using $ano and $anum for). Then you can load just the calls from a particular source as well as just the charge for that source. This should cut back the amount of data that needs to be kept in core.

Charge code list is small (the input list can be large or small

This is the easy case. Just read the charge code list into memory before going through the input list.

Charge code list is large, input list is small

Another fairly easy case, although it requires you to turn the previous solution on its head: read in the input list once at the beginning of processing, then chew through the long input list.

Cleanup

This is assuming that the charge database is small:

#!/usr/bin/perl -w require 5.006; # i'm using "open my $fh ..." construct use strict; # ================================================= # constants # put magic constants at top, so people can easily change them my $CHARGE_CODES_FILE = "mnta2/bmp/table/smsc/chg_indicator.hdb"; my $INPUT_TEMPLATE = "A12 A4 A20 A20 A2 A2 A2 A2 A12 A4 A24 A24"; my $SOURCE_IX = 2; my $DEST_IX = 3; # ================================================= # first, load up the charge database. my %charge_codes; { open my $fh, $CHARGE_CODE_FILE or die "$0: opening '$CHARGE_CODE_FILE': $!"; while ( <$fh> ) { next if /^#/; # skip comments s/\s+$/; # remove end of line chars # get fields from line, clean them up. my ( $src, $dest, $code ) = split /,/; $src =~ s/^0+//; $dest =~ s/^0+//; $charge_codes{$src}{$dest} = $code; } } # ------------------------------------------------- # process standard input, accumulate charges. my %charges_against; while ( my $input = <> ) { # extract source and destination values my @fields = unpack $INPUT_TEMPLATE, $input; my $src = $fields[ $SOURCE_IX ]; my $dest = $fields[ $DEST_IX ]; # remove leading zeros $src =~ s/^0+//; $dest =~ s/^0+//; # see if there are any charge codes associated with this pair. my $code = $charge_codes{$src}{$dest}; if ( defined $code ) { ++$charges_against{$code}; } } # ------------------------------------------------- # output the results foreach my $code ( sort keys %charges_against ) { my $count = $charges_against{$code}; print "Total calls for Billing Code [$code] : $count \n"; } # note that skip_zero is no longer necessary.

I had a lot of extra commented crap in the code to deal with situations that are likely rare but impossible to rule out without knowing the exact data we'd see. I've moved it below this cut to let me keep the code compact enough to display without a cut...

In the loop that reads in the charge codes:

# the original code indicates there can never # be more than one charge code associated with # any given $src and $dest pair; if there are, # we would have to use an array ref here and # a foreach loop in the main processing. # only one charge code: $charge_codes{$src}{$dest} = $code; # # for multiple charge codes: # push @{ $charge_codes{$src}{$dest} }, $code;

Then, in the main processing:

# # if you know that there is no charge code 0 or # # empty string, you could compress that like so: # if ( my $code = $charge_codes{$src}{$dest} ) # { # ++$charges_against{$code}; # } # # if you need multiple codes per src/dest pair: # my $charge_code_aref = $charge_codes{$src}{$dest} # or next; # # increment the number of charges against each code # foreach my $code ( @$charge_code_aref ) # { # ++$charges_against{$code}; # }

And regarding my comment about building that template in a more useful fashion... I had stuff like this in my cleaned-up version of the script, but it got in the way of most of the other points. Here's what I had:

# construct the template in a more self-documenting way. # this is a list of name => width values. you can add # more info here if you like; just adjust the loops # that pull information out of it. having all this # information in one place means that you only have to # change one place... my @INPUT_COLUMN_INFO = ( unknown1 => { width => 12 }, unknown2 => { width => 4 }, source => { width => 20 }, dest => { width => 20 }, ... ); # you can also use this structure to do clever things # such as building a hash out of each input line to # extract values by name instead of by numeric index. # ================================================= # derived constants my @INPUT_NAMES; # names in original order my $INPUT_TEMPLATE; # build up an 'unpack' template my %INPUT_POS; # map from name to index for ( my $i = 0; $i < @INPUT_COLUMN_INFO; $i += 2 ) { my ( $name, $info ) = @INPUT_COLUMN_INFO[$i, $i+1]; push @INPUT_NAMES, $name; $INPUT_TEMPLATE .= "A" . $info->{width}; $INPUT_POS{$name} = @INPUT_NAMES-1; } # ================================================= # use during processing while (<>) { # construct hash using slice my %input; @input{@INPUT_NAMES} = unpack $INPUT_TEMPLATE, $_; # now access info from the hash my $src = $input{source}; my $dest = $input{dest}; # ... }

The exact method you use to extract information from the central structure doesn't matter as much as noticing the advantages you get from having all the format-related stuff in one place (especially when you consider that it might be in a database somewhere...)