in reply to XML Tags Stripping & Calculating checksum on it

Most of your processing time goes into file I/O, i.e. (as you indicate) the line-by-line parsing, plus writing an external file and generating the CRC on it.

Is this necessary? Couldn't you just read the transactions one by one from the file and run the CRC calculation on each XML transaction in memory? As in (untested):

#!/usr/bin/perl -w
use strict;
use warnings;
use String::CRC32;   ### for instance

my $XmlData = <<XML_DATA;
<Transaction>
<MessageCode>100</MessageCode>
<ToAccountNo>12989898900</ToAccountNo>
</Transaction>
<Transaction>
<MessageCode>200</MessageCode>
<ToAccountNo>24536485582</ToAccountNo>
</Transaction>
XML_DATA

open my $XmlFile, '<', \$XmlData or die "Can't open XML file: $!";
$/ = "</Transaction>";   ### one trans at-a-time
while ( <$XmlFile> ) {
    my $crc = crc32($_);
    ### & do whatever needs TBD with the $crc ...
}

Allan Dystrup

Re^2: XML Tags Stripping & Calculating checksum on it
by harishnuti (Beadle) on Jul 16, 2008 at 08:43 UTC

    Thanks, but our requirement is odd: all the data has to be put on one line and the checksum calculated over that. I can't help it; our client uses a Java routine that works this way, so I have to use the same method for the checksums to match. We did propose a file-based checksum instead of a string-based one to the client, but with no luck.
Re^2: XML Tags Stripping & Calculating checksum on it
by harishnuti (Beadle) on Jul 16, 2008 at 13:29 UTC

    First of all, a big thanks for putting effort into my query.

    My purpose in calculating the checksum this way is to match the checksum from my client, who computes it in the same fashion in Java. I am trying to improve the performance of my script while keeping the final checksum the same.

    If I calculate an incremental checksum, how can I get a final checksum that our client can match?
      If you use e.g. the String::CRC32 module, just use the incremental form $crc = crc32($additionalString, $crc), starting with $crc = 0. In the end, $crc is a 32-bit unsigned integer value that can be compared. You may need to convert it to hex notation or similar first (Edit: I mean before the comparison, if you have to compare against some string format): $clear = sprintf("%X", $crc); or $clear = uc(unpack("H8", pack("N", $crc))); (the sprintf form drops leading zeros; use "%08X" to get the same eight digits as the pack/unpack form).
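
To make the incremental form described above concrete, here is a minimal sketch. It assumes String::CRC32 is installed from CPAN; the helper name crc_of_chunks and the sample values are made up for illustration. It feeds the data chunk by chunk and checks that the running CRC ends up equal to the CRC of the same data joined into one string, which is the property needed to match a checksum taken over "all data on one line".

```perl
#!/usr/bin/perl
use strict;
use warnings;
use String::CRC32;    # CPAN module; exports crc32()

# Hypothetical helper: fold a list of chunks into one running CRC.
sub crc_of_chunks {
    my @chunks = @_;
    my $crc = 0;                             # start value for the incremental form
    $crc = crc32( $_, $crc ) for @chunks;    # pass each chunk plus the CRC so far
    return $crc;
}

# Illustrative data only: tag-stripped values from two transactions.
my @chunks = ( '100', '12989898900', '200', '24536485582' );

my $incremental = crc_of_chunks(@chunks);
my $whole       = crc32( join '', @chunks );    # same data as one string

# Zero-padded, upper-case hex, for comparing against a string checksum.
my $hex_sprintf = sprintf '%08X', $incremental;
my $hex_pack    = uc unpack 'H8', pack 'N', $incremental;

print "incremental: $hex_sprintf\n";
print "whole:       ", sprintf( '%08X', $whole ), "\n";
print "match\n" if $incremental == $whole;
```

Whether the hex string has to be upper case or zero-padded depends on what the Java side prints, so adjust the format to whatever the client actually sends.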
      my $data = do { local( $/ ) ; <$XmlFile> } ;   # Slurp file
      # Zap header elements together with their content (\2 is the tag name; group 1 is the leading whitespace)
      $data =~ s!(^\s*)?\<(TotalAmount|NoOfRecords|TotalBatch|CurrentBatch|EODTransactionDate|BankFileSeqNo|TotAmount)\>(.+?)\<\/\2\>\n?!!mig;
      $data =~ s!(^\s*)?</?.*?>\n?!!mig;             # Zap xml tags
      my $crc = crc32($data);                        # CRC remaining data (e.g. crc32 from String::CRC32)
      This should be fast, but in general regex'ing XML/HTML is fragile and NOT recommended.
      (Implicit assumptions about the data content are one of the traps, which may or may not apply in your case.)
      You should probably use XML::Twig instead!
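
For comparison, a rough sketch of the XML::Twig route suggested above. It assumes XML::Twig and String::CRC32 are installed from CPAN, that the transactions sit under a single root element (called Batch here, a made-up name), and that the checksum should cover only the text of each Transaction's child elements. All of that needs checking against the real file and the client's Java routine.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;        # CPAN module
use String::CRC32;    # CPAN module; exports crc32()

# Illustrative document; the <Batch> root and header value are assumptions.
my $xml = <<'XML';
<Batch>
<NoOfRecords>2</NoOfRecords>
<Transaction>
<MessageCode>100</MessageCode>
<ToAccountNo>12989898900</ToAccountNo>
</Transaction>
<Transaction>
<MessageCode>200</MessageCode>
<ToAccountNo>24536485582</ToAccountNo>
</Transaction>
</Batch>
XML

my $crc = 0;    # running CRC across all transactions

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once for each completely parsed <Transaction>.
        Transaction => sub {
            my ( $t, $elt ) = @_;
            # Text of the child elements only, so whitespace between tags is left out.
            my $text = join '', map { $_->text } $elt->children('#ELT');
            $crc = crc32( $text, $crc );
            $t->purge;    # release what has been parsed so far
        },
    },
);
$twig->parse($xml);

printf "%08X\n", $crc;
```

Header elements such as NoOfRecords never reach the handler, so they are left out without needing a separate regex. For a real file, parsefile would replace parse.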