Hello, oh great monks. I am in serious need of assistance. I have been researching for two weeks and have narrowed my search to XML::Twig, but for some reason I still don't understand it. The rest of my project is written in C# and PHP; for this one task, however, I must use a Perl script on a Linux box to parse a very large (almost 1 GB) XML document once per month and load all of its attributes into a database. The XML file contains this bit of information:

<interface>
  <index Generated="20110714102016">
    <file path="required/path/to/other.doc" Product_ID="111" Updated="20110713144032" Quality="qualitylevel" Supplier_id="x" Prod_ID="varchar id name" Catid="0011" On_Market="0" Model_Name="varchar model name" Product_View="11223" HighPic="photo/location.jpg" HighPicSize="1122" HighPicWidth="123" HighPicHeight="123" Date_Added="20050910000000"/>
    <EAN_UPCS>
      <EAN_UPC Value="0088698005668"/>
      <EAN_UPC Value="0886980056684"/>
    </EAN_UPCS>
    <Country_Markets>
      <Country_Market Value="NL"/>
      <Country_Market Value="BE"/>
    </Country_Markets>

##repeat file with different attribute strings a million times.

Here is the thing: I only need the attributes of each <file> element. I want to parse a single <file> at a time, flush it, and skip everything outside the <file> elements.
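For anyone with the same goal, a minimal sketch of that requirement might look like the following. It assumes XML::Twig is installed and uses its `twig_roots` option so that only <file> elements are built; the sample XML is a shortened, hypothetical version of the excerpt above, and the `@rows` array merely stands in for the database insert.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample input, shortened from the excerpt in the post.
my $xml = <<'XML';
<interface>
  <index Generated="20110714102016">
    <file path="required/path/to/other.doc" Product_ID="111" Catid="0011"/>
    <EAN_UPCS><EAN_UPC Value="0088698005668"/></EAN_UPCS>
    <file path="another/path.doc" Product_ID="112" Catid="0012"/>
  </index>
</interface>
XML

my @rows;    # stands in for the database insert

my $twig = XML::Twig->new(
    # twig_roots builds only the listed elements; the rest is skipped.
    twig_roots => {
        file => sub {
            my ($t, $file) = @_;
            push @rows, { %{ $file->atts || {} } };    # copy the attribute hash
            $t->purge;                                 # free what was parsed so far
        },
    },
);

$twig->parse($xml);    # for a file on disk, parsefile($filename) is the counterpart

printf "%s => %s\n", $_->{Product_ID}, $_->{path} for @rows;
```

Because each <file> is purged as soon as its handler returns, memory use should stay roughly constant regardless of the size of the document.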

Can somebody help me with this, or at least give me a rough guideline for this kind of request, or link to code where somebody has done something similar and explain it to me?

Thank you very much for all of your help.

Update

Thank you for the tutorial. I think I'm almost there now and understand most of it. One thing: I'm getting an error and believe something isn't installed. A little more research and I'll post another update. Thanks again.

Update II

The code is below, followed by the error. I'm trying to fix the error right now, but so far no luck.
#!/usr/local/bin/perl -w
# Prepend a private module directory to every existing @INC entry.
BEGIN {
    my $base_module_dir = (-d '/home/bcnagle/perl' ? '/home/bcnagle/perl' : ( getpwuid($>) )[7] . '/perl/');
    unshift @INC, map { $base_module_dir . $_ } @INC;
}

use strict;
use XML::Twig;
use DBI;

my $path;
my $Product_ID;
my $Updated;
my $Quality;
my $Supplier_id;
my $Prod_ID;
my $Catid;
my $On_Market;
my $Model_Name;
my $Product_View;
my $HighPic;
my $HighPicSize;
my $HighPicWidth;
my $HighPicHeight;
my $Date_Added;

my $dbh = connect_to_db();

# One placeholder per <file> attribute.
my $insert = $dbh->prepare(
    "INSERT INTO files (path, Product_ID, Updated, Quality, Supplier_id, Prod_ID, Catid, On_Market, Model_Name, Product_View, HighPic, HighPicSize, HighPicWidth, HighPicHeight, Date_Added) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?);"
);

# Call indexfile() each time a <file> element has been fully parsed.
my $twig = XML::Twig->new( twig_handlers => { file => \&indexfile } );
$twig->parsefile( "/file.xml" );
$twig->purge;    # discard what is left of the twig (flush would print it to STDOUT)
$dbh->disconnect();
exit;

sub connect_to_db {
    my $driver = "mysql";
    my $dsn = "DBI:$driver:database=database;";
    my $dbh = DBI->connect($dsn, 'uname', 'pword', {AutoCommit=>1});
    my $drh = DBI->install_driver($driver);
    return( $dbh);
}

sub indexfile {
    my($twig, $file) = @_;

    # Read every attribute of the current <file> element.
    $path = $file->att( 'path');
    $Product_ID = $file->att( 'Product_ID');
    $Updated = $file->att( 'Updated');
    $Quality = $file->att( 'Quality');
    $Supplier_id = $file->att( 'Supplier_id');
    $Prod_ID = $file->att( 'Prod_ID');
    $Catid = $file->att( 'Catid');
    $On_Market = $file->att( 'On_Market');
    $Model_Name = $file->att( 'Model_Name');
    $Product_View = $file->att( 'Product_View');
    $HighPic = $file->att( 'HighPic');
    $HighPicSize = $file->att( 'HighPicSize');
    $HighPicWidth = $file->att( 'HighPicWidth');
    $HighPicHeight = $file->att( 'HighPicHeight');
    $Date_Added = $file->att( 'Date_Added');

    # Bind the values to the placeholders in column order.
    $insert->bind_param( 1, $path);
    $insert->bind_param( 2, $Product_ID);
    $insert->bind_param( 3, $Updated);
    $insert->bind_param( 4, $Quality);
    $insert->bind_param( 5, $Supplier_id);
    $insert->bind_param( 6, $Prod_ID);
    $insert->bind_param( 7, $Catid);
    $insert->bind_param( 8, $On_Market);
    $insert->bind_param( 9, $Model_Name);
    $insert->bind_param( 10, $Product_View);
    $insert->bind_param( 11, $HighPic);
    $insert->bind_param( 12, $HighPicSize);
    $insert->bind_param( 13, $HighPicWidth);
    $insert->bind_param( 14, $HighPicHeight);
    $insert->bind_param( 15, $Date_Added);
    $insert->execute();

    $twig->purge;    # free the memory used by elements parsed so far
}

errors out with:

weaken is only available with the XS version of Scalar::Util at /home/bcnagle/perl/usr/lib/perl5/site_perl/5.8.8/XML/Twig.pm line 117
BEGIN failed--compilation aborted at /home/bcnagle/perl/usr/lib/perl5/site_perl/5.8.8/XML/Twig.pm line 172.
Compilation failed in require at /home/bcnagle/public_html/xmlparse.pl line 9.
BEGIN failed--compilation aborted at /home/bcnagle/public_html/xmlparse.pl line 9.
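One way to narrow this kind of error down is to check, outside of XML::Twig, whether the Scalar::Util that Perl actually loads provides a working `weaken`. The snippet below is only a diagnostic sketch using core Perl; it reports which copy of Scalar::Util was found, which may help show whether a module directory added to @INC is supplying a copy without the compiled (XS) part. That cause is an assumption, not something the error message confirms.

```perl
use strict;
use warnings;

# Try to load Scalar::Util and actually weaken a reference.
my $ok = eval {
    require Scalar::Util;
    my $strong = {};
    my $weak   = $strong;
    Scalar::Util::weaken($weak);
    1;
};
my $error = $@;

# Report the version and the file the module was loaded from, if any.
my $version = eval { Scalar::Util->VERSION };
$version = 'unknown' unless defined $version;
my $loaded_from = defined $INC{'Scalar/Util.pm'} ? $INC{'Scalar/Util.pm'} : 'not loaded';

print "Scalar::Util version: $version\n";
print "Loaded from: $loaded_from\n";
print $ok ? "weaken works\n" : "weaken is unavailable: $error\n";
```

If the reported path points into a private module directory, comparing it with the copy that ships with the system Perl would be a reasonable next step.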

Update III

I'm starting a new thread: since XML::Twig isn't working on my server, I decided to switch to plain XML::Parser with SAX. Thank you all for your help; you have been very informative.
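As a rough idea of what that approach could look like, here is a minimal sketch that uses XML::Parser's own event handlers (SAX-like in style, though not the Perl SAX API itself). It assumes XML::Parser is installed; the sample XML is a shortened, hypothetical version of the real document, and `@rows` stands in for the database insert.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical sample input, shortened from the excerpt in the post.
my $xml = <<'XML';
<interface>
  <index Generated="20110714102016">
    <file path="required/path/to/other.doc" Product_ID="111" Catid="0011"/>
    <EAN_UPCS><EAN_UPC Value="0088698005668"/></EAN_UPCS>
    <file path="another/path.doc" Product_ID="112" Catid="0012"/>
  </index>
</interface>
XML

my @rows;    # stands in for the database insert

# Start handlers receive the parser object, the element name,
# and then the attributes as a flat name/value list.
sub handle_start {
    my ($expat, $element, %attr) = @_;
    return unless $element eq 'file';    # skip everything but <file>
    push @rows, {%attr};
}

my $parser = XML::Parser->new( Handlers => { Start => \&handle_start } );
$parser->parse($xml);    # for a file on disk, parsefile($filename) is the counterpart

printf "%s => %s\n", $_->{Product_ID}, $_->{path} for @rows;
```

Since no tree is built and only the start-tag events are inspected, nothing needs to be purged or flushed between <file> elements.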


In reply to XML::Twig to mysql totally lost by bcnagle
