First off, let me say that although I've got a lot of criticisms in here, this is not bad code for a one-shot script run on data in a known format, if the script will never be used again. Because you parse line by line, you won't hit the memory issues you might with an XML parser that builds the whole document tree in memory. Your speed should also be decent: although you've got an unnecessary second regex, it only runs on a match from the first one, and you have to run at least one regex per line anyway.

That said, you should always have

```perl
use strict;
use warnings;
```

at the top of your code. Under strict, Perl would have refused to compile your script because you never declared $cnt with my. That's sloppy, even though it doesn't affect the outcome here.
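To see what strict does with an undeclared variable, here is a minimal sketch that compiles a tiny snippet at run time with a string eval. The variable names $compiled and $error are just for illustration; the exact wording of the error can differ slightly between Perl versions.

```perl
use strict;
use warnings;

# Compile a snippet at run time with a string eval. Because 'use strict'
# is in force, the undeclared $cnt is a compile-time error: the eval
# returns undef and the message lands in $@ instead of the code running.
my $compiled = eval q{ use strict; $cnt = 0; 1 };
my $error    = $@;

print defined $compiled ? "compiled without complaint\n"
                        : "strict caught it: $error";
```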

You've also used two regexps where one would do, and I don't think you understand the options you used on them. In particular, you used:

```perl
while($_=~m/\<(.+?)\>(.+?)\<\/\1\>/sig)
```

The 'sig' at the end includes two options that the rest of the line can't possibly make use of. First, 's' only changes whether '.' may match a newline, and a newline can never appear here, because you're reading one line at a time with the default line separator and chomping it off immediately. Second, you've used 'g' to match repeatedly, which would actually be a good idea... except that you're only storing the result of the first match in a scalar, so any further 'g' matches are effectively discarded.
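A minimal illustration of both modifiers, using made-up strings rather than your real input: /s makes a difference only when the string contains a newline, and /g only pays off if you keep every match instead of a single scalar.

```perl
use strict;
use warnings;

# /s only changes whether '.' may match "\n". A line read with the default
# $/ and then chomped contains no "\n", so /s has nothing to do there.
my $multi     = "a\nb";
my $without_s = ($multi =~ /a.b/)  ? 1 : 0;   # '.' will not match "\n"
my $with_s    = ($multi =~ /a.b/s) ? 1 : 0;   # with /s, '.' matches "\n"
print "without /s: $without_s, with /s: $with_s\n";

# Hypothetical line holding two amounts.
my $line = '<payment amount>10</payment amount><payment amount>20</payment amount>';

# In list context, /g returns the captures from every match on the line...
my @all = $line =~ m{<payment amount>(\d+)</payment amount>}g;

# ...but assigning to a single scalar keeps only the first and drops the rest.
my ($first) = $line =~ m{<payment amount>(\d+)</payment amount>}g;

print "all: @all; first only: $first\n";
```

This prints `without /s: 0, with /s: 1` followed by `all: 10 20; first only: 10`.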

Next up, you're creating and destroying my $value on every pass through your while loop for no apparent reason. You could shave a little time off that loop by just using $totalamount += $2. Keep in mind that loops are the first place to check for waste, because that's usually where most of a program's run time is spent.
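Here is a sketch of that single-regex loop adding $2 straight to the total. The input string is made up, and the test on $1 is only an assumption about how you might pick out the amount tag; your original code isn't reproduced here.

```perl
use strict;
use warnings;

# Hypothetical input line; in the real script this would come from the file.
my $text = '<payment amount>700000</payment amount>'
         . '<payment amount>1222706</payment amount>';

my $totalamount = 0;

# One regex: $1 is the tag name, $2 the contents. $2 is added to the
# running total directly, with no 'my $value' created on each pass.
while ($text =~ m{<(.+?)>(.+?)</\1>}g) {
    $totalamount += $2 if $1 eq 'payment amount';   # assumed tag check
}

print "Total: $totalamount\n";
```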

Next, you're making a few assumptions about your input that may come back to bite you: first, that every closing tag will be followed by a newline (that is, one element per line); second, that 'payment amount' will always have a value; and third, that any value present will always be a valid amount. If you are absolutely certain that whatever generates that XML will always produce exactly that format, those may not be bad assumptions, and they simplify your code slightly, but it's not much harder to handle a few deviations without resorting to a full parser. For now, we'll just add the assumption that the XML will at least be well-formed.
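One way to relax those assumptions, sketched here on a made-up line rather than your real data: capture whatever sits between the tags, even if it's empty, then validate it separately so bad values can be reported instead of silently ignored. The "two decimal places" rule is an assumption about what a valid amount looks like.

```perl
use strict;
use warnings;

# Hypothetical sample: one valid amount, one empty, one garbage, all on one line.
my $line = '<payment amount>12.50</payment amount><payment amount></payment amount>'
         . '<payment amount>oops</payment amount>';

my ($total, $bad) = (0, 0);

while ($line =~ m{<payment amount>(.*?)</payment amount>}gi) {
    # Copy $1 because the successful validation match below resets it.
    my $raw = $1;
    if ($raw =~ /^\d+(?:\.\d{2})?$/) {   # assumed format: digits, optional .NN
        $total += $raw;
    } else {
        $bad++;
        warn "skipping invalid amount '$raw'\n";
    }
}

printf "total %.2f, skipped %d\n", $total, $bad;
```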

Finally, your printed totals may sometimes look a little strange. Since you're working with money, which usually has two digits after the decimal point, you probably want to print "123.40" even when the actual numeric value is "123.4", and for that you need sprintf. So here's my version, with test data that includes garbage values, empty values, and tags all on one line. If you feed that same __DATA__ section to your original code, I think you'll see the difference immediately.

```perl
#!/usr/bin/perl
use strict;
use warnings;

print "This will calculate the total amount and number of transactions.\n";

my $totalamt = 0;
my $cnt      = 0;
my @amounts  = ();

while (<DATA>) {
    chomp;
    # In list context, /g returns every amount on the line, so several
    # <payment amount> elements on one line are all counted. Only whole
    # numbers, or numbers with exactly two decimals, are accepted, so empty
    # and garbage values are skipped.
    @amounts = /\<payment amount\>(\d+(?:\.\d{2})?)\<\/payment amount\>/gi;
    foreach my $amt (@amounts) {
        $cnt++;
        $totalamt += $amt;
    }
}

print sprintf("The total amount found is \$%.2f\n", $totalamt);
print "Total no. of transactions: $cnt\n";

__DATA__
<a>
<b>
<client account cred>68789790390909090489</client account cred>
<Payment UTR No>MTRIN10909890896</Payment UTR No>
<payment amount>700000</payment amount>
</b>
<b>
<client account cred>9033753053985392INR</client account cred>
<Payment UTR No>938573895735154</Payment UTR No>
<payment amount>1222706</payment amount>
</b>
<b>
<client account cred>9284723472047222INR</client account cred>
<Payment UTR No>RP JLLKL7687</Payment UTR No>
<payment amount></payment amount>
</b>
<b>
<client account cred>9284723472047222INR</client account cred>
<Payment UTR No>RP JLLKL7687</Payment UTR No>
<payment amount>1437865.95</payment amount>
</b>
<b>
<client account cred>9284723472047222INR</client account cred>
<Payment UTR No>RP JLLKL7687</Payment UTR No>
<payment amount>this is garbage</payment amount>
</b>
<b><client account cred>9033753053985392INR</client account cred><Payment UTR No>938573895735154</Payment UTR No><payment amount>1222706</payment amount></b><b><client account cred>9284723472047222INR</client account cred><Payment UTR No>RP JLLKL7687</Payment UTR No><payment amount>1437865.95</payment amount></b>
</a>
```
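The money-formatting point on its own, as a small sketch with an arbitrary value: plain interpolation prints the shortest form of the number, while "%.2f" always gives two digits after the decimal point.

```perl
use strict;
use warnings;

my $total = 123.4;

# Plain interpolation prints the shortest representation: 123.4
print "$total\n";

# %.2f always gives exactly two digits after the decimal point: 123.40
my $money = sprintf('%.2f', $total);
print "\$$money\n";

# printf does the same formatting and printing in a single call.
printf "\$%.2f\n", $total;
```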

In reply to Re: Counting XML Blocks in XML file and Calculating Total Amount by AZed
in thread Counting XML Blocks in XML file and Calculating Total Amount by harishnuti
