Counting XML Blocks in XML file and Calculating Total Amount

harishnuti has asked for the wisdom of the Perl Monks concerning the following question:


Re: Counting XML Blocks in XML file and Calculating Total Amount
by Narveson (Chaplain) on Sep 26, 2008 at 04:47 UTC

The best advice is to use one of the many XML parsing modules.

That said, your approach does work for the sample data you have provided. I can't resist offering you your same approach with a couple of minor improvements in the hope that you'll find them helpful.

print "This will calculate the total amount and To no of transactions \n";
my ($cnt, $totalamt) = (0, 0);   # start at zero so the final prints never see undef

# precompiled regex
my $PAYMENT_AMOUNT_PATTERN = qr{
    <payment \s+ amount>  # opening tag
    (                     # begin capture
        \d+[.]?\d*        # digits with optional decimal point
    )                     # end capture
    </payment \s+ amount> # closing tag
}msx;

LINE:
while (<DATA>) {
    # capture to lexical variable
    my ($payment_amount) = /$PAYMENT_AMOUNT_PATTERN/;
    next LINE if not defined $payment_amount;
                
    $cnt++;
    $totalamt += $payment_amount;
    print;
}
print "The total amount found is $totalamt \n";
print "Total Transactions are    $cnt      \n";

Re^2: Counting XML Blocks in XML file and Calculating Total Amount

by harishnuti (Beadle) on Sep 26, 2008 at 06:26 UTC


Re: Counting XML Blocks in XML file and Calculating Total Amount
by AZed (Monk) on Sep 26, 2008 at 05:33 UTC

First off, although I've got a lot of criticisms here, this isn't bad code for a one-shot script that runs on data in a known format and will never be used again. Because you parse line by line, you won't have the memory issues you might have if you used an XML parser. Your speed should be decent too: the second regex is unnecessary, but it only runs on a match from the first one, and you have to run at least one regex per line anyway.

That said, you should always have

use strict;
use warnings;

at the top of your code. This would have caught that you never did a my $cnt, which is sloppy, though it doesn't affect the outcome here.
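As a minimal sketch of what strict catches here, the snippet below compiles the same increment with and without a declaration. The helper name strict_error is made up for this illustration, and the string eval is only there so the compile error can be captured instead of aborting the script.

```perl
use strict;
use warnings;

# Hypothetical helper: compile and run a snippet under "use strict" inside a
# string eval, returning the error text (empty string if it compiled and ran).
sub strict_error {
    my ($snippet) = @_;
    my $ok = eval "use strict; use warnings; $snippet; 1";
    return $ok ? '' : $@;
}

# $cnt is never declared, so strict refuses to compile the snippet.
my $undeclared = strict_error('$cnt++');

# Declaring it with my satisfies strict.
my $declared = strict_error('my $cnt = 0; $cnt++');

print "undeclared: $undeclared";
print "declared:   ", ($declared eq '' ? "ok\n" : $declared);
```

The first call should report a 'Global symbol "$cnt" requires explicit package name' error, while the second compiles cleanly.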

You've also used two regexps where one would do, and I don't think you understand the options you used on them. In particular, you used:

while($_=~m/\<(.+?)\>(.+?)\<\/\1\>/sig)

The 'sig' at the end includes two options that the rest of the line can't possibly make use of. First, 's' makes '.' match a newline, which only matters when the string contains internal newlines; that can't happen here because you're parsing one line at a time using the default newline splitting and chomping each line immediately. Second, you've used 'g' to match repeatedly, which would actually be a good idea... except that you're only storing the result of the first match in a scalar, effectively discarding any subsequent 'g' matches.
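To make the effect of those two options concrete, here is a small sketch using the same pattern on made-up strings (the sample data and variable names are assumptions for illustration, not taken from the original script):

```perl
use strict;
use warnings;

my $line = '<a>1</a><b>2</b>';

# Without /g, a list-context match returns only the captures of the first match.
my @first = $line =~ m/<(.+?)>(.+?)<\/\1>/;

# With /g in list context, the captures of every match are returned.
my @all = $line =~ m/<(.+?)>(.+?)<\/\1>/g;

# /s only lets "." match a newline, so it matters only for multi-line strings.
my $multi     = "<a>1\n2</a>";
my $without_s = $multi =~ m/<(.+?)>(.+?)<\/\1>/  ? 1 : 0;
my $with_s    = $multi =~ m/<(.+?)>(.+?)<\/\1>/s ? 1 : 0;

print "first: @first\n";
print "all:   @all\n";
print "multi-line match without /s: $without_s, with /s: $with_s\n";
```

On a chomped single line there is no newline for '.' to trip over, so the pattern behaves the same with or without 's'.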

Next up, you're repeatedly creating and destroying my $value in your while loop for no apparent reason. You could shave a little time off that loop by just using $totalamount += $2. Keep in mind that loops are the first place you should check for waste, because that's usually where the majority of the code time gets spent.

Next, you're making a few assumptions about your input that may come back to bite you: first, that all your closing tags will have newlines after them, second, that 'payment amount' will always have a value, and third, that even if 'payment amount' has a value, that the value will always be a valid amount. If you are absolutely certain that whatever is generating that XML will always do it in exactly that format, those may not be bad assumptions, and they simplify your code slightly, but it's not that much harder to handle a few deviations without resorting to a full parser. We'll just run with the additional assumption that the XML will at least be well-formed for now.
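One way to tolerate those three deviations without a full parser might look like the sketch below. The tag name payment_amount, the helper name tally_amounts, and the rule that invalid values are skipped rather than counted are all assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: given a list of lines, return (count, total) of the
# well-formed amounts found in them.
sub tally_amounts {
    my @lines = @_;
    my ($count, $total) = (0, 0);
    for my $line (@lines) {
        # /g in a while loop visits every tag pair on the line, so several
        # tags on one line (no newline after the closing tag) are handled.
        while ($line =~ m{<payment_amount>(.*?)</payment_amount>}g) {
            my $value = $1;
            # Skip empty or non-numeric values rather than adding garbage.
            next unless $value =~ /\A\d+(?:\.\d+)?\z/;
            $count++;
            $total += $value;
        }
    }
    return ($count, $total);
}

my @sample = (
    '<payment_amount>100.50</payment_amount><payment_amount>20</payment_amount>',
    '<payment_amount></payment_amount>',
    '<payment_amount>12abc</payment_amount>',
);
my ($count, $total) = tally_amounts(@sample);
print "count=$count total=$total\n";
```

Whether a transaction with a bad amount should still be counted is a policy decision; this sketch simply leaves it out of both numbers.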

Finally, your printing of the totals may sometimes look a little strange, since you're working with money, which typically has two digits after the decimal point. Even if the actual numeric value is "123.4", you probably want to print "123.40", and for that you need sprintf. So here's my version after the readmore, including test data that includes values with garbage, empty values, and with tags all on one line. If you feed that same __DATA__ segment to your original code, you'll see the difference immediately, I think.

Read more... (3 kB)
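The collapsed code is not reproduced here. Separately from it, the sprintf point on its own can be sketched as follows (format_amount is a made-up helper name for this illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: format a numeric amount with exactly two decimals.
sub format_amount {
    my ($amount) = @_;
    return sprintf('%.2f', $amount);
}

print "The total amount found is ", format_amount(123.4), "\n";
```

The '%.2f' format pads or rounds to two decimal places, so 123.4 comes out as 123.40.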


Re^2: Counting XML Blocks in XML file and Calculating Total Amount

by harishnuti (Beadle) on Sep 26, 2008 at 06:31 UTC


Re: Counting XML Blocks in XML file and Calculating Total Amount
by pjotrik (Friar) on Sep 26, 2008 at 12:31 UTC

#!/usr/bin/perl

use strict;
use warnings;

use XML::Twig;

print "This will calculate the total amount and To no of transactions \n";

my $totalamt = 0;
my $cnt = 0;

my $twig = XML::Twig->new( twig_handlers => {
                b => \&b_handler,
                payment_amount => \&amt_handler
        });
$twig->parse(do {local $/; <DATA>});

print "The total amount found is $totalamt \n";
print "Total Transactions are    $cnt      \n";

$twig->purge();

sub b_handler {
        my ($t, $b_elem) = @_;
        $cnt++;
        $b_elem->delete();
}

sub amt_handler {
        my ($t, $amt_elem) = @_;
        # no int() here: it would truncate 1437865.95 and drop the fraction
        $totalamt += $amt_elem->text();
}

__DATA__
<a>
        <b>
                <client_account_cred>68789790390909090489</client_account_cred>
                <Payment_UTR_No>MTRIN10909890896</Payment_UTR_No>
                <payment_amount>700000</payment_amount>
        </b>
        <b>
                <client_account_cred>9033753053985392INR</client_account_cred>
                <Payment_UTR_No>938573895735154</Payment_UTR_No>
                <payment_amount>1222706</payment_amount>
        </b>
        <b>
                <client_account_cred>9284723472047222INR</client_account_cred>
                <Payment_UTR_No>RP JLLKL7687</Payment_UTR_No>
                <payment_amount>1437865.95</payment_amount>
        </b>
</a>

Re: Counting XML Blocks in XML file and Calculating Total Amount
by Jenda (Abbot) on Sep 26, 2008 at 19:59 UTC

pjotrik is right, the example was not XML. If it were, this is a solution using XML::Rules (AZed, neither this solution nor XML::Twig has any memory problems whatsoever with this. Not all XML parsers construct a maze of objects before you can get your hands on the data.)

#!/usr/bin/perl
use strict;
use warnings;

use XML::Rules;

print "This will calculate the total amount and To no of transactions \n";

my $totalamt = 0;
my $cnt = 0;

my $parser = XML::Rules->new( stripspaces => 7,
    rules => {
        _default => '',
        b => sub {++$cnt;return},
        payment_amount => sub {$totalamt+=$_[1]->{_content};return},
    });
$parser->parse(\*DATA);

print "The total amount found is $totalamt \n";
print "Total Transactions are    $cnt      \n";

__DATA__
<a>
    <b>
            <client_account_cred>68789790390909090489</client_account_cred>
            <Payment_UTR_No>MTRIN10909890896</Payment_UTR_No>
            <payment_amount>700000</payment_amount>
    </b>
    <b>
            <client_account_cred>9033753053985392INR</client_account_cred>
            <Payment_UTR_No>938573895735154</Payment_UTR_No>
            <payment_amount>1222706</payment_amount>
    </b>
    <b>
            <client_account_cred>9284723472047222INR</client_account_cred>
            <Payment_UTR_No>RP JLLKL7687</Payment_UTR_No>
            <payment_amount>1437865.95</payment_amount>
    </b>
</a>

#!/usr/bin/perl
use strict;
use warnings;

use XML::Rules;

print "This will calculate the total amount and To no of transactions \n";


my $parser = XML::Rules->new( stripspaces => 7,
    start_rules => {
        a => sub {
            $_[4]->{pad}{count}=0;
            $_[4]->{pad}{total}=0;
            1;
        }
    },
    rules => {
        _default => '',
        b => sub {++ $_[4]->{pad}{count}; return},
        payment_amount => sub {$_[4]->{pad}{total} += $_[1]->{_content};return},
        a => sub {return %{$_[4]->{pad}}}
    });
my $data = $parser->parse(\*DATA);

print "The total amount found is $data->{total}\n";
print "Total Transactions are    $data->{count}\n";

__DATA__
<a>
    <b>
...

Jenda
Support Denmark!
Defend the free world!
