bh_perl has asked for the wisdom of the Perl Monks concerning the following question:

Hi,
Actually, my script will be read two standard input and captured anum and bnum values. After that, gets ano, bno and bcode from chg_indicator.hdb files.
If anum startswith ano and bnum startswith bno then return the bcode and do some calculation based on the bcode. Finally, print the calculation report.
But, I think my scipt is not intelligent enough, could somebody give some ideas on that ?
It always reading chg_indicator.hdb files for each anum and bnum entries....
#!/usr/bin/perl -w my $template = "A12 A4 A20 A20 A2 A2 A2 A2 A12 A4 A24 A24"; my %bcodedb = (); while (<>) { my $input = $_; my @a = unpack($template, $input); my $anum = substr($a[2], skip_zero($a[2])); my $bnum = substr($a[3], skip_zero($a[3])); open (F, "cat mnta2/bmp/table/smsc/chg_indicator.hdb |"); while (<F>) { chomp; next if $_ =~ /^#/; my ($ano, $bno, $bcode) = split(/,/); if ($anum =~ /$ano/ && $bnum =~ /$bno/) { $bcode->{ count }++; if ( defined $bcodedb{$bcode} ) { next; } else { $bcodedb{$bcode} = "BCODE"; } last; } } close (F); } while ( my ($l, $k) = sort each(%bcodedb) ) { my $a = $l->{ count } || 0; print "Total calls for Billing Code [$l] : $a \n"; } sub skip_zero { my ($a) = @_; my $j = 0; foreach my $i (split //, $a) { return $j if ($i > 0); $j++; } }

Replies are listed 'Best First'.
Re: Simplifying my script...
by Fletch (Bishop) on Apr 24, 2004 at 21:06 UTC

    Opening a pipe from cat to read a single file is just wrong. Just open the file for reading. perldoc perlopentut, perldoc -f open.

Re: Simplifying my script...
by matija (Priest) on Apr 24, 2004 at 21:40 UTC
    Not only are you reading the file F by piping cat (what on earth posessed you to do that?), but you're reading it once for every line of the input file.

    You should just read it once, do the split, and store each line as a row in an array of arrays. That should speed your script up considerably.

      Not only are you reading the file F by piping cat, but you're reading it once for every line of the input file.

      Wouldn't this be the way to save on memory? If it's a large file then wouldn't you read it line by line? I've been told with smaller files it's ok to read the whole thing and process it but otherwise you want to read it line by line.

      Am I thinking about this from the wrong angle?

        I think you are. If the file is small enough to keep in memory, keep it in memory.

        If it's too big to keep in memory, you should put the data into the database, and mark the appropriate fields as keys. Then the database will know how to skip to exactly the right record much faster than you could do it by reading every record.

        Remember, if the file is big, it takes a long time to read - and you're reading it each time you go through that loop.

Re: Simplifying my script...
by tkil (Monk) on Apr 25, 2004 at 20:53 UTC

    A few observations:

    • Never use variables to name other variables (see perlreftut).
    • Use descriptive variable names. Instead of $ano and $bno (as well as $anum and $bnum!), how about $from and $to? Or $src and $dest?
    • Never use $a or $b by themselves (because use strict won't catch them — they're special-cased for use in sort BLOCK constructs).
    • Take repetitive work out of the loop (as others have already pointed out, you are reading in that bcodedb file for each incoming line — unless you have a good reason to avoid this, such as memory size).
    • Use regexes in preference to manipulating strings character-by-character.
    • Use equality when you mean equality; use regexes only when you really want regex matching.
    • Either use $_ directly, or assign it to a (well-named!) variable at the time you read it in.
    • As others have pointed out, your use of cat is painful and most likely unnecessary (unless you have something very strange going on with permissions or file types or named pipes, etc; but the odds are that you just want to open the file for read directly, not open another process to read that file then read from that process).
    • Consider constructing the template more cleverly (especially if you will ever need any other bits of information out of it).

    Regarding reading that file within the loop each time, do this analysis. Is the main source data (that you are reading in via STDIN small enough to fit in memory? Is the charge database (that you're reading in from the .hdb file small enough to fit into memory? Depending on the answers to those questions, there are four different cases (2x2), and three different implementation strategies:

    Cleanup

    This is assuming that the charge database is small:

    #!/usr/bin/perl -w require 5.006; # i'm using "open my $fh ..." construct use strict; # ================================================= # constants # put magic constants at top, so people can easily change them my $CHARGE_CODES_FILE = "mnta2/bmp/table/smsc/chg_indicator.hdb"; my $INPUT_TEMPLATE = "A12 A4 A20 A20 A2 A2 A2 A2 A12 A4 A24 A24"; my $SOURCE_IX = 2; my $DEST_IX = 3; # ================================================= # first, load up the charge database. my %charge_codes; { open my $fh, $CHARGE_CODE_FILE or die "$0: opening '$CHARGE_CODE_FILE': $!"; while ( <$fh> ) { next if /^#/; # skip comments s/\s+$/; # remove end of line chars # get fields from line, clean them up. my ( $src, $dest, $code ) = split /,/; $src =~ s/^0+//; $dest =~ s/^0+//; $charge_codes{$src}{$dest} = $code; } } # ------------------------------------------------- # process standard input, accumulate charges. my %charges_against; while ( my $input = <> ) { # extract source and destination values my @fields = unpack $INPUT_TEMPLATE, $input; my $src = $fields[ $SOURCE_IX ]; my $dest = $fields[ $DEST_IX ]; # remove leading zeros $src =~ s/^0+//; $dest =~ s/^0+//; # see if there are any charge codes associated with this pair. my $code = $charge_codes{$src}{$dest}; if ( defined $code ) { ++$charges_against{$code}; } } # ------------------------------------------------- # output the results foreach my $code ( sort keys %charges_against ) { my $count = $charges_against{$code}; print "Total calls for Billing Code [$code] : $count \n"; } # note that skip_zero is no longer necessary.

    I had a lot of extra commented crap in the code to deal with situations that are likely rare but impossible to rule out without knowing the exact data we'd see. I've moved it below this cut to let me keep the code compact enough to display without a cut...

    And regarding my comment about building that template in a more useful fashion... I had stuff like this in my cleaned-up version of the script, but it got in the way of most of the other points. Here's what I had: