in reply to Memory problems parsing XML
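open my $in, '<', $filepath or die "Cannot open $filepath: $!";
my $Block = '';
while (<$in>) {
    # A new article starts: hand the finished block to the parser
    if (/<ppsarticle>/ and length $Block) {
        Parse_XML($Block);
        $Block = '';
    }
    $Block .= $_;
}
Parse_XML($Block) if length $Block; # Don't forget last block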
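use GDBM_File; # my favorite, but there are others
# GDBM_WRCREAT opens the file read/write and creates it if it is missing.
# Other DBM modules such as DB_File take Fcntl flags (O_RDWR|O_CREAT) instead.
tie %md5, 'GDBM_File', '/tmp/md5.tmp', GDBM_WRCREAT, 0600
    or die "Cannot tie /tmp/md5.tmp: $!";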
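use Digest::MD5 qw(md5);

open my $in,  '<', $filepath    or die "Cannot open $filepath: $!";
open my $out, '>', $outfilepath or die "Cannot open $outfilepath: $!";
my $XML = '';
while (<$in>) {
    $XML .= $_;
    # Cut every complete article out of the buffer (match on $XML, not $_)
    while ($XML =~ s/<ppsarticle>(.*?)<\/ppsarticle>//s) {
        my $Article = $1;
        my $MD5SUM  = md5($Article);
        $md5{$MD5SUM} and next; # seen before, skip it
        $md5{$MD5SUM} = 1;
        print $out $Article;
    }
}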
A final note on $md5{$md5}: Perl can handle this, but the chances that you or another developer will sooner or later mix the two up are close to 100%. Try to avoid using the same name for scalars, arrays and hashes. You'll save yourself a lot of trouble and hours of searching for typos (it gets even worse when both $X{1} and $X->{1} are valid).
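To make the naming problem concrete, here is a small sketch. The variables and their contents are made up for illustration; the point is only that three unrelated variables can share the name X and all three lookups compile cleanly under strict.

```perl
use strict;
use warnings;

# Three unrelated variables that happen to share a name (all hypothetical):
my %X = (1 => 'hash entry');        # the hash %X
my $X = { 1 => 'hashref entry' };   # the scalar $X, holding a hash reference
my @X = ('zero', 'array entry');    # the array @X

print $X{1},   "\n";   # element of the hash %X
print $X->{1}, "\n";   # element of the hash that the scalar $X points to
print $X[1],   "\n";   # element of the array @X
```

A single missing or extra "->" silently picks a different variable, and neither strict nor warnings will complain.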
(You still need to work a bit on the code samples, closing files, for example, and actually test them against your data.)
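One way to try the duplicate filter before running it on the real file is a small self-contained script. This is only a sketch under assumptions: the sample articles are invented, the filehandles read from and write to in-memory strings, and a plain hash (%seen) stands in for the tied DBM hash.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Hypothetical sample data: the first and third articles are identical.
my $input = join '',
    "<ppsarticle>first</ppsarticle>\n",
    "<ppsarticle>second\nspans two lines</ppsarticle>\n",
    "<ppsarticle>first</ppsarticle>\n";

my $output = '';
my %seen;   # plain in-memory hash; tie it to a DBM file for large inputs

# In-memory filehandles, so the test needs no files on disk
open my $in,  '<', \$input  or die "Cannot open input: $!";
open my $out, '>', \$output or die "Cannot open output: $!";

my $buffer = '';
while (my $line = <$in>) {
    $buffer .= $line;
    # Remove each complete article from the buffer as soon as it is available
    while ($buffer =~ s/<ppsarticle>(.*?)<\/ppsarticle>//s) {
        my $article = $1;
        next if $seen{ md5($article) }++;   # skip articles seen before
        print {$out} $article, "\n";
    }
}

close $in;
close $out or die "Cannot close output: $!";

print $output;
```

With this sample input the duplicate "first" article is written only once. Swapping the in-memory strings for real paths and %seen for the tied hash gives the version to run against your data.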