PerlMonks  

eval not catching XML Parser die

by agoth (Chaplain)
on Jun 24, 2002 at 10:42 UTC ( [id://176733]=perlquestion )

agoth has asked for the wisdom of the Perl Monks concerning the following question:

Hi, the script below parses a number of XML files in a directory, extracts their contents, and inserts them into a MySQL database.

What's happening is that when a file with an unrecognised encoding comes in, XML::Parser bombs out as expected. The rest of the files are processed fine and the DB inserts work, but the script ends with a nice messy segfault, though no core dump.

Am I not handling my dies correctly, or is it something else? For now I've 'handled' it by using LibXML instead!!

Versions: Perl 5.6.1, XML::Simple 1.08, XML::Parser 2.30, DBI 1.14, and DBD::mysql 2.0419.

    use strict;
    use DBI;
    use XML::Simple;
    use Data::Dumper;

    my $dbh = DBI->connect( 'DBI:mysql:emailb:127.0.0.1', 'user', 'pass',
                            { RaiseError => 1, PrintError => 1 } );
    die "db connect failed $dbh" unless (ref $dbh eq 'DBI::db');

    my $xdir = '/var/tmp/xmlfeed/';
    opendir( DIRH, $xdir) or die "cant open $xdir $!";
    my @files = grep { /xml$/ } readdir DIRH;
    closedir DIRH;

    print "Total files : ", scalar @files, "\n";
    my $inc = 0;

    FILES : for (@files) {
        my $xmlfile = $xdir . $_;
        $inc++;
        print "Processing file number : $inc : $xmlfile \n";

        my $struct;
        my $parser = new XML::Simple( forcearray => 1 );
        eval { $struct = $parser->XMLin( $xmlfile ); };
        if ($@) {
            print "XML file : $xmlfile invalid ", $@;
            next;
        };
        undef $@;

        my $sth = $dbh->prepare(' insert into category_name values ( ?, ? )' );
        for ( @{ $struct->{'Story'} } ) {
            my $hashref = $_;
            eval {
                $sth->execute( @{ $hashref->{'cat_id'} } );
                $sth->finish;
            };
            if ($@) {
                print "category name insert failed : $@ : skipping \n",
                      Dumper( $hashref ), "\n------------\n";
                next;
            }
            undef $@;
        }
        $struct = undef;
    }
    $dbh->disconnect;

Re: eval not catching XML Parser die
by Matts (Deacon) on Jun 24, 2002 at 14:07 UTC
    Segfaults are almost always a conflict of some sort or another. XML::Parser is known to be pretty reliable almost everywhere, except for a few known segfaults when running under mod_perl with older versions of Apache. Make sure you're running the latest versions of all the modules (it doesn't look like you are), and you should be fine. Otherwise, there's usually very little you can do about it in your Perl code.
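    To follow that advice you first need to know which versions the failing perl actually loads. A minimal sketch, assuming only that you run it with the same perl binary as the problem script; the module list is simply the one from the original post.

```perl
use strict;
use warnings;

# Modules named in the original post; a failed load is reported, not fatal.
my @modules = qw(XML::Simple XML::Parser DBI DBD::mysql);
my %version;

for my $mod (@modules) {
    # Turn "XML::Simple" into "XML/Simple.pm" so require() can load it
    # from a name held in a variable.
    (my $file = "$mod.pm") =~ s{::}{/}g;
    if (eval { require $file; 1 }) {
        my $v = $mod->VERSION;    # undef if the module sets no $VERSION
        $version{$mod} = defined $v ? $v : 'unknown';
    }
    else {
        $version{$mod} = 'not installed';
    }
}

printf "Perl %vd\n", $^V;
printf "%-12s %s\n", $_, $version{$_} for @modules;
```

    Each require is wrapped in its own eval, so a missing module is reported as "not installed" instead of aborting the whole report.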
Re: eval not catching XML Parser die
by joealba (Hermit) on Jun 24, 2002 at 14:00 UTC
    Try commenting out your undef lines and see what happens. Your code looks correct, but you may be getting some segfaults from the undefs. It looks like your scoping will handle clean-up rather nicely anyway.

    It's worth a shot. :) BTW, what platform are you running on? And how many files are you processing? Does it bomb if you run it on only 1 or 2 files?

    Update: Does it bomb if you comment out your execute statement? Try to see if it's an XML::Parser problem or a DBI problem.
      Even if I comment out all the DBI calls and all the undef lines (which, in hindsight, I should have done before my original post), the script still segfaults.

      It doesn't matter whether there is 1 file or 135 (in my case). If there is one with an invalid encoding anywhere in the list, it always fails after the last one.

      Oh, PS: Red Hat 2.4.7-10 on i686.
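    That the `undef $@` lines are redundant can be checked with a small self-contained sketch. `parse_or_die` is a hypothetical stand-in for `XMLin()`, so the example runs without XML::Parser installed. It also shows the limit of the technique: eval only traps Perl-level `die`s. A segfault kills the perl process itself, so no arrangement of eval and $@ in the script can catch the crash described above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $parser->XMLin(): it dies on a "bad" file,
# the way XML::Parser croaks on an encoding it does not recognise.
sub parse_or_die {
    my ($name) = @_;
    die "unknown encoding in $name\n" if $name =~ /bad/;
    return { file => $name };
}

my @results;
FILES: for my $file (qw(a.xml bad.xml c.xml)) {
    my $struct;    # lexical: released at the end of each iteration anyway
    # The block only returns 1 if it runs to completion; on a die, eval
    # returns undef and puts the message in $@.
    my $ok = eval { $struct = parse_or_die($file); 1 };
    if (!$ok) {
        chomp(my $err = $@ || 'unknown error');
        push @results, "skipped $file: $err";
        next FILES;
    }
    push @results, "parsed $struct->{file}";
}

# A successful eval resets $@ to the empty string by itself,
# so an explicit "undef $@" is not needed.
eval { 1 };
my $cleared = ($@ eq '');

print "$_\n" for @results;
```

    Testing eval's return value, rather than testing $@ afterwards, also protects against $@ being cleared between the eval and the check, for example by a DESTROY method that runs its own eval.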

Re: eval not catching XML Parser die
by grantm (Parson) on Jun 24, 2002 at 14:47 UTC

    I really can't see anything wrong with your error-handling code either. One possibility is that your database doesn't handle UTF-8-encoded data (sorry, I'm not a MySQL guy). Can you narrow it down to a smaller test case that fails? Feel free to email me some code and test data at grantm(at)cpan.org.

    You may be interested in some feedback on your code (if not, please ignore the following).

    1. A quick and dirty way to get a list of files is:

    my @files = <$xdir/*xml>;

    (This is a little different from your code in that elements in @files now include the pathname too).

    2. You call $dbh->prepare() inside the for loop; it would be more efficient to do that once before the loop.

    3. You call $sth->finish() after execute. That's tidy, but you almost never need to call finish(). It's mostly there so that if you 'last' out of a fetch loop, you can call finish to tell DBI it can discard any unfetched rows.
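    Point 1 can be illustrated with a small sketch. It builds a throwaway directory with File::Temp (an assumption made purely for the demo; the file names are hypothetical) and lists it both ways. `glob("$xdir/*xml")` is the function form of `<$xdir/*xml>`.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical feed directory, created only for this demonstration.
my $xdir = tempdir(CLEANUP => 1);
for my $name (qw(one.xml two.xml notes.txt)) {
    open my $fh, '>', "$xdir/$name" or die "can't create $name: $!";
    close $fh;
}

# readdir returns bare names, so the directory has to be prepended later.
opendir my $dh, $xdir or die "can't open $xdir: $!";
my @bare = sort { $a cmp $b } grep { /xml$/ } readdir $dh;
closedir $dh;

# glob returns names that already include the directory part.
my @full = sort { $a cmp $b } glob("$xdir/*xml");

print "readdir: @bare\n";
print "glob:    @full\n";
```

    Since the glob results already carry the directory, the `$xdir . $_` concatenation in the original loop would no longer be needed.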

Re: eval not catching XML Parser die
by agoth (Chaplain) on Jun 24, 2002 at 15:18 UTC
    Oh dear, oh dear,

    Sorry, my debugging and problem-searching ability seems to have deserted me on this one.

    The underlying problem can be reproduced with a much simpler script.

    It segfaults on Linux under Perl 5.6.1.

    It doesn't segfault on Solaris with Perl 5.00503.

    use strict;
    use XML::Simple;

    for ('./news-arabic-business.xml') {
        my $xmlfile = $_;
        my $struct;
        my $parser = new XML::Simple( forcearray => 1 );
        eval {
            $struct = $parser->XMLin( $xmlfile );
            die "wibble " unless ref $struct eq 'HASH';
        };
        if ($@) { print "$@ \n"; next; };
    }

    where the XML file only needs to contain:

    <?xml version="1.0" encoding="Windows-1256"?>

    On Solaris, the installed modules are also older versions.

      Here's my reduced test case:

      use strict;
      use XML::Simple;

      my $xml = q(<?xml version="1.0" encoding="Windows-1256"?> <stuff>nope</stuff> );
      my $ref = eval { XMLin( $xml ) };
      print $@ if($@);
      print "About to exit\n";
      #exit(0);

      Notice the 'exit(0)' line is commented out. If you remove the '#', the segfault goes away (Linux, Perl 5.6.1).

Approved by rob_au