The best advice is to use one of the many XML parsing modules.
That said, your approach does work for the sample data you have provided. I can't resist offering you your same approach with a couple of minor improvements in the hope that you'll find them helpful.
print "This will calculate the total amount and To no of transactions\n";
my ($cnt, $totalamt) = (0, 0); # start at zero so the totals print cleanly even with no matches
# precompiled regex
my $PAYMENT_AMOUNT_PATTERN = qr{
<payment \s+ amount> # opening tag
( # begin capture
\d+[.]?\d* # digits with optional decimal point
) # end capture
</payment \s+ amount> # closing tag
}msx;
LINE:
while (<DATA>) {
# capture to lexical variable
my ($payment_amount) = /$PAYMENT_AMOUNT_PATTERN/;
next LINE if not defined $payment_amount;
$cnt++;
$totalamt += $payment_amount;
print;
}
print "The total amount found is $totalamt \n";
print "Total Transactions are $cnt \n";
Thanks for the suggestion. It's a good lesson in the use of regular expressions, and I appreciate your time on this.
First off, let me say that although I've got a lot of criticisms in here, it's not bad code for a one-shot script to be used on data in a known format if the script will never be used again. Because you parse line by line, you won't have the memory issues you might have if you used an XML parser. Your speed should be decent too: you've got an unnecessary second regex, but it only runs on a match from the first one, and you have to run at least one regex per line anyway.
That said, you should always have
use strict;
use warnings;
at the top of your code. This would have caught that you never did a my $cnt, which is sloppy, though it doesn't affect the outcome here.
You've also used two regexps where one would do, and I don't think you understand the options you used on them. In particular, you used:
while($_=~m/\<(.+?)\>(.+?)\<\/\1\>/sig)
The 'sig' at the end includes two options that the rest of the line can't possibly make use of. First, 's' lets '.' match newlines inside the string, but that can't happen here because you're parsing one line at a time using the default newline splitting, and chomping the newlines off immediately. Second, you've used 'g' to match repeatedly, which would actually be a good idea, except that you're only storing the result of the first match in a scalar, effectively discarding any subsequent 'g' matches.
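To make those two modifiers concrete, here is a small sketch. The sample strings and variable names are made up for illustration and are not taken from the original script.

```perl
use strict;
use warnings;

# Without /s, "." does not match a newline; with /s it does.
my $two_lines = "<a>1\n2</a>";
my $without_s = $two_lines =~ m/<a>(.+?)<\/a>/  ? 1 : 0;
my $with_s    = $two_lines =~ m/<a>(.+?)<\/a>/s ? 1 : 0;

# With /g in list context, the captures of every match are returned at once.
my $one_line = "<a>1</a><b>2</b>";
my @captures = $one_line =~ m/<(.+?)>(.+?)<\/\1>/g;

print "without /s: $without_s, with /s: $with_s\n";
print "all captures: @captures\n";
```

On input that has already been split into lines and chomped, the two patterns in the first half behave identically, which is why 's' buys nothing there.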
Next up, you're repeatedly creating and destroying my $value in your while loop for no apparent reason. You could shave a little time off that loop by just using $totalamount += $2. Keep in mind that loops are the first place you should check for waste, because that's usually where most of the run time gets spent.
Next, you're making a few assumptions about your input that may come back to bite you: first, that all your closing tags will have newlines after them, second, that 'payment amount' will always have a value, and third, that even if 'payment amount' has a value, that the value will always be a valid amount. If you are absolutely certain that whatever is generating that XML will always do it in exactly that format, those may not be bad assumptions, and they simplify your code slightly, but it's not that much harder to handle a few deviations without resorting to a full parser. We'll just run with the additional assumption that the XML will at least be well-formed for now.
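One possible way to tolerate those three deviations without a full parser is sketched below. This is an illustration only, not the version referred to elsewhere in this reply: the helper name sum_payments and the sample lines are hypothetical, and the tag is assumed to be spelled 'payment amount' as in the discussion above.

```perl
use strict;
use warnings;

# Hypothetical helper: counts and sums the valid amounts found in a list of lines.
sub sum_payments {
    my @lines = @_;
    my ($count, $total, $skipped) = (0, 0, 0);
    for my $line (@lines) {
        # /g in a while loop visits every tag pair on the line,
        # so several tags on one line are all seen
        while ($line =~ m{<payment \s+ amount>(.*?)</payment \s+ amount>}gx) {
            my $value = $1;
            # accept only plain decimal amounts such as 700000 or 1437865.95
            if ($value =~ m{\A \s* (\d+ (?: [.] \d+ )?) \s* \z}x) {
                $count++;
                $total += $1;
            }
            else {
                $skipped++;    # empty or malformed value
            }
        }
    }
    return ($count, $total, $skipped);
}

my ($count, $total, $skipped) = sum_payments(
    '<payment amount>700000</payment amount>',
    '<payment amount>12.50</payment amount><payment amount>7.25</payment amount>',
    '<payment amount></payment amount>',
    '<payment amount>12abc</payment amount>',
);
print "count=$count total=$total skipped=$skipped\n";
```

Counting the skipped values separately is a design choice here: it lets you report how much of the input was rejected instead of silently ignoring it.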
Finally, your printed totals may sometimes look a little strange. You're working with money, which typically has two digits after the decimal point, so even if the actual numeric value is "123.4", you probably want to print "123.40", and for that you need sprintf.
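A minimal sketch of that formatting, using a made-up total:

```perl
use strict;
use warnings;

my $totalamt  = 123.4;                      # example value, not from the real data
my $formatted = sprintf '%.2f', $totalamt;  # always two digits after the decimal point
print "The total amount found is $formatted\n";
```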
So here's my version after the readmore, including test data that includes values with garbage, empty values, and with tags all on one line. If you feed that same __DATA__ segment to your original code, you'll see the difference immediately, I think.
Let me say a big thanks for spending so much time looking into my problem and replying in depth. I am really thankful.
I have taken all the suggestions on board, and this has been good learning for me.
Others have given you advice on how to improve your code if you do it the regex way. Here's how you could use existing XML processing tools if your input were valid XML (which it isn't in your example).
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;
print "This will calculate the total amount and To no of transactions\n";
my $totalamt = 0;
my $cnt = 0;
my $twig = XML::Twig->new( twig_handlers => {
b => \&b_handler,
payment_amount => \&amt_handler
});
$twig->parse(do {local $/; <DATA>});
print "The total amount found is $totalamt \n";
print "Total Transactions are $cnt \n";
$twig->purge();
sub b_handler {
my ($t, $b_elem) = @_;
$cnt++;
$b_elem->delete();
}
sub amt_handler {
my ($t, $amt_elem) = @_;
$totalamt += $amt_elem->text(); # no int(): it would drop the fraction from amounts like 1437865.95
}
__DATA__
<a>
<b>
<client_account_cred>68789790390909090489</client_account_cred>
<Payment_UTR_No>MTRIN10909890896</Payment_UTR_No>
<payment_amount>700000</payment_amount>
</b>
<b>
<client_account_cred>9033753053985392INR</client_account_cred>
<Payment_UTR_No>938573895735154</Payment_UTR_No>
<payment_amount>1222706</payment_amount>
</b>
<b>
<client_account_cred>9284723472047222INR</client_account_cred>
<Payment_UTR_No>RP JLLKL7687</Payment_UTR_No>
<payment_amount>1437865.95</payment_amount>
</b>
</a>
pjotrik is right, the example was not XML. If it were, this is a solution using XML::Rules (AZed, neither this solution nor XML::Twig has any memory problems whatsoever with this. Not all XML parsers construct a maze of objects before you can get your hands on the data.)
#!/usr/bin/perl
use strict;
use warnings;
use XML::Rules;
print "This will calculate the total amount and To no of transactions\n";
my $totalamt = 0;
my $cnt = 0;
my $parser = XML::Rules->new( stripspaces => 7,
rules => {
_default => '',
b => sub {++$cnt;return},
payment_amount => sub {$totalamt+=$_[1]->{_content};return},
});
$parser->parse(\*DATA);
print "The total amount found is $totalamt \n";
print "Total Transactions are $cnt \n";
__DATA__
<a>
<b>
<client_account_cred>68789790390909090489</client_account_cred>
<Payment_UTR_No>MTRIN10909890896</Payment_UTR_No>
<payment_amount>700000</payment_amount>
</b>
<b>
<client_account_cred>9033753053985392INR</client_account_cred>
<Payment_UTR_No>938573895735154</Payment_UTR_No>
<payment_amount>1222706</payment_amount>
</b>
<b>
<client_account_cred>9284723472047222INR</client_account_cred>
<Payment_UTR_No>RP JLLKL7687</Payment_UTR_No>
<payment_amount>1437865.95</payment_amount>
</b>
</a>
or if you do not want to use file scoped lexicals (and want the parser to be able to give you the right answer even if you call it several times):
#!/usr/bin/perl
use strict;
use warnings;
use XML::Rules;
print "This will calculate the total amount and To no of transactions\n";
my $parser = XML::Rules->new( stripspaces => 7,
start_rules => {
a => sub {
$_[4]->{pad}{count}=0;
$_[4]->{pad}{total}=0;
1;
}
},
rules => {
_default => '',
b => sub {++ $_[4]->{pad}{count}; return},
payment_amount => sub {$_[4]->{pad}{total} += $_[1]->{_content};return},
a => sub {return %{$_[4]->{pad}}}
});
my $data = $parser->parse(\*DATA);
print "The total amount found is $data->{total}\n";
print "Total Transactions are $data->{count}\n";
__DATA__
<a>
<b>
...