bcnagle has asked for the wisdom of the Perl Monks concerning the following question:
Hello oh great monks. I am in serious need of assistance. I have been researching for 2 weeks now and have narrowed my search to XML::Twig but still not understanding this for some reason. I have been coding in C# and PHP for the rest of my project however for this one instance, I must use a perl script on a linux box to parse a very large (almost 1gig) xml document to get all attributes to load into a database once per month. The xml file contains this bit of information:
<interface> <index Generated="20110714102016"> <file path="required/path/to/other.doc" Product_ID="111" Updated= +"20110713144032" Quality="qualitylevel" Supplier_id="x" Prod_ID="varc +har id name" Catid="0011" On_Market="0" Model_Name="varchar model nam +e" Product_View="11223" HighPic="photo/location.jpg" HighPicSize="112 +2" HighPicWidth="123" HighPicHeight="123" Date_Added="20050910000000" +/> <EAN_UPCS> <EAN_UPC Value="0088698005668"/> <EAN_UPC Value="0886980056684"/> </EAN_UPCS> <Country_Markets> <Country_Market Value="NL"/> <Country_Market Value="BE"/> </Country_Markets>
##repeat file with different attribute strings a million times.
Here is the thing. I only need the attributes parsed, single <file> at a time then flushed, and only in the <file> section skipping the rest.
Can somebody help me with this or at minimum have any hashed guideline for this particular request, or link to code where somebody has done something similar and explain it to me?
Thank you very much for all of your help.
Thank you for the tutorial, I think I'm almost there now and understand the majority of it, one thing is I'm getting an error code and believe something isn't installed. A little more research and I'll post another update. Thanks again.
# !/usr/local/bin/perl -w BEGIN { my $base_module_dir = (-d '/home/bcnagle/perl' ? '/home/bcnagle/pe +rl' : ( getpwuid($>) )[7] . '/perl/'); unshift @INC, map { $base_module_dir . $_ } @INC; } use strict; use XML::Twig; use DBI; my $path; my $Product_ID; my $Updated; my $Quality; my $Supplier_id; my $Prod_ID; my $Catid; my $On_Market; my $Model_Name; my $Product_View; my $HighPic; my $HighPicSize; my $HighPicWidth; my $HighPicHeight; my $Date_Added; my $dbh= connect_to_db(); my $insert= $dbh->prepare( "INSERT INTO files (path, Product_ID, Upda +ted, Quality, Supplier_id, Prod_ID, Catid, On_Market, Model_Name, Pro +duct_View, HighPic, HighPicSize, HighPicWidth, HighPicHeight, Date_Ad +ded) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?);"); my $twig = new XML::Twig( twig_handlers => {indexfile => \$file} ); $twig->parsefile( "/file.xml" ); $twig->flush; # flush the end of t +he twig $dbh->disconnect(); exit; sub connect_to_db { my $driver = "mysql"; my $dsn = "DBI:$driver:database=database;"; my $dbh = DBI->connect($dsn, 'uname', 'pword', {AutoCommit=>1}); my $drh = DBI->install_driver($driver); return( $dbh); } sub indexfile { my($twig, $file) = @_; $path = $file->att( 'path'); $Product_ID = $file->att( 'Product_ID'); $Updated = $file->att( 'Updated'); $Quality = $file->att( 'Quality'); $Supplier_id = $file->att( 'Supplier_id'); $Prod_ID = $file->att( 'Prod_ID'); $Catid = $file->att( 'Catid'); $On_Market = $file->att( 'On_Market'); $Model_Name = $file->att( 'Model_Name'); $Product_View = $file->att( 'Product_View'); $HighPic = $file->att( 'HighPic'); $HighPicSize = $file->att( 'HighPicSize'); $HighPicWidth = $file->att( 'HighPicWidth'); $HighPicHeight = $file->att( 'HighPicHeight'); $Date_Added = $file->att( 'Date_Added'); $insert->bind_param( 1, $path); $insert->bind_param( 2, $Product_ID); $insert->bind_param( 3, $Updated); $insert->bind_param( 4, $Quality); $insert->bind_param( 5, $Supplier_id); $insert->bind_param( 6, $Prod_ID); $insert->bind_param( 7, $Catid); $insert->bind_param( 8, $On_Market); $insert->bind_param( 9, $Model_Name); $insert->bind_param( 10, $Product_View); $insert->bind_param( 11, $HighPic); $insert->bind_param( 12, $HighPicSize); $insert->bind_param( 13, $HighPicWidth); $insert->bind_param( 14, $HighPicHeight); $insert->bind_param( 15, $Date_Added); $insert->execute(); $twig->purge; }
errors out with:
weaken is only available with the XS version of Scalar::Util at /home/bcnagle/perl/usr/lib/perl5/site_perl/5.8.8/XML/Twig.pm line 117 BEGIN failed--compilation aborted at /home/bcnagle/perl/usr/lib/perl5/site_perl/5.8.8/XML/Twig.pm line 172. Compilation failed in require at /home/bcnagle/public_html/xmlparse.pl line 9. BEGIN failed--compilation aborted at /home/bcnagle/public_html/xmlparse.pl line 9.
Starting new thread since the XML::Twig isn't working on my server, I decided to go to normal XML::Parser with SAX. Thank you all for your help, you have been very informative.
|
|---|