
Walking thru XML

by Anonymous Monk
on Nov 20, 2003 at 22:02 UTC (#308725, perlquestion)

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Is there a way to walk through an XML file? I want to parse an XML file line-by-line, taking action depending on the tag found. I'm making a simple backup script that will back up filesystems as well as {My | Postgre}SQL databases. The config file would look like this:
    <config>
      <logprefix>/var/log/sbackup</logprefix>
      <storage>/home/backups</storage>
      <item name = "SomeDatabase" type="MySQLDB">
        <dbName>SomeDatabase</dbName>
        <dbUser>SomeUser</dbuser>
        <dbPass>password</dbPass>
      </item>
      <item name = "SomeFilesFiles" type="Filesystem">
        <path>/home/globalherald</path>
      </item>
    </config>
I tried using XML::Simple, dumping the contents into a hash structure, and walking through that, but I've done something wrong. Is what I'm trying to do possible? Here's the script:
    #!/usr/bin/perl
    # Backing up the server.  Looks for a file
    # called sbackup.xml in /etc which specifies, which databases
    # and which directories to back up.
    #
    # TODO:
    #   Add tar/database dump logs to logs
    #   Add postgresql dumper
    #   Add tar/database dump options to XML

    use strict;
    use warnings;
    use XML::Simple;
    use IO::File;

    use vars qw($XMLConfig $logprefix $fh);

    sub LogMsg {
        my $consecho = 1;
        my $deadly = shift;
        my $message = shift;
        my $timestamp = localtime(time);
        open LogFile, ">>$logprefix/sbackup.log" || die "Can't open logfile!";
        print LogFile "$timestamp: $message\n";
        if ($consecho == 1) {
            print "$timestamp: $message\n";
        }
        return;
    }

    sub ParseConfig {
        $fh = new IO::File('/etc/sbackup.xml') or LogMsg(1, "Can't open sbackup.xml!");
        $XMLConfig = XMLin($fh);
        return;
    }

    sub BackupMysqlDatabase {
        my $item = shift;
        my $dbname = shift;
        my $dbuser = shift;
        my $dbpass = shift;
        my $command = "mysqldump --quick --add-locks --add-drop-table -a -e -F -K -u $dbuser -p $dbpass $dbname";
        #my $result = system($command);
        print $command;
        my $result = 0;
        if ($result == 0) {
            LogMsg(0, "Backup of $item, database $dbname successful!");
        } else {
            LogMsg(0, "Backup of $item, database $item failed!");
        }
        return;
    }

    sub BackupFS {
        my $item = shift;
        my $frompath = shift;
        my $storage = shift;
        my $command = "tar cf $storage/$item.tar $frompath";
        #my $result = system($command);
        print $command;
        my $result = 0;
        if ($result == 0) {
            LogMsg(0, "Backup of $item, pathname $frompath successful!");
        } else {
            LogMsg(0, "Backup of $item, pathname $frompath failed!");
        }
        return;
    }

    my $localstorage;
    my %currentFS = ("item", "", "source", "") ;
    my %currentDB = ("item", "", "dbname", "", "dbuser", "", "dbpass", "");
    my $currentitem;

    ParseConfig();

    foreach my $element ($XMLConfig) {
        print "The element is $element!\n";
        my %somehash = {$element};
        my @hashkeys = keys %somehash;
        print "Keys are: @hashkeys\n";
        if ($element eq "Logprefix") {
            $logprefix = $XMLConfig->logprefix;
        }
        if ($element eq "Storage") {
            $localstorage = $XMLConfig->storage;
        }
        if ($element eq "Item") {
            if ($element->{type} eq "MySQLDB") {
                foreach my $element2 ($element) {
                    if ($element2 eq "DbName") {
                        $currentDB{item} = $element->{item};
                        $currentDB{dbname} = $element2->{DbName};
                    }
                    if ($element2 eq "DbUser") {
                        $currentDB{dbuser} = $element2->{DbUser};
                    }
                    if ($element2 eq "DbPass") {
                        $currentDB{dbpass} = $element2->{DbPass};
                    }
                }
                BackupMySQLDatabase($currentDB{item}, $currentDB{dbname}, $currentDB{dbuser}, $currentDB{dbpass});
            }
            if ($element->{type} eq "Filesystem") {
                foreach my $element2 ($element) {
                    if ($element2 eq "Path") {
                        $currentFS{item} = $element->{Item};
                        $currentFS{path} = $element2->{Path};
                    }
                }
                BackupFS($currentFS{item}, $currentFS{path}, $localstorage);
            }
        }
    }
    close ($fh);
Thanks everybody!

Replies are listed 'Best First'.
Re: Walking thru XML
by arturo (Vicar) on Nov 20, 2003 at 23:48 UTC

    The two classic XML-handling strategies are "Tree-based", such as XML::Simple and, in a more heavyweight and full-featured fashion, XML::DOM, and "Stream" or "event-based", such as SAX, which is sort of defined for Java primarily, although it's not surprising that XML::SAX exists for Perl. The tree-based strategy loads a whole XML document into memory, which allows for some neat tricks. The stream-based strategy deals with elements as they are encountered -- SAX turns various parts of an XML file into events (e.g. "here's a start element", "here are some characters", and so forth). Your question makes it sound as if what you want is a stream-based API, and you say you want to process the file "line-by-line," but your example suggests otherwise.

    Your goal seems to be to take the individual <config> elements and turn them into hashes or objects. That's not a "line-by-line" strategy, that's "little trees" or, as one might call them, twigs ... (blatant plug for XML::Twig here). Your example suggests a half-way strategy: you want to grab each config element and its subelements and deal with that chunk, processing them one at a time. You could load up everything into one master tree, then "walk" through the tree selecting each config element in turn. If you have a lot of things to process, though, that could get expensive memory-wise. If it's not a problem then feel free to stick with XML::Simple.

    Now, with respect to your actual goal here, XML::Simple can do a perfectly fine job, although I find it a little bit hard to use (probably because I haven't fully internalized how it turns elements and their attributes into data structures -- forgive me, grantm -- I know this behavior is configurable =). With a little study and care, you could certainly make better use of it than what follows as an example.

    I do know enough to point out that you're using it incorrectly, though. $XMLConfig is a reference to a complex data structure. Assuming you have some element wrapping a bunch of config elements similar to the one you have posted above, it will be a reference to a hash that has a key called config, whose value is a reference to an array of other things, which are in turn quite complex themselves. Each of those "other things" (the elements of the array reference) corresponds to a config element and its contents in your file. So the basic outer processing loop would look like this:

        foreach my $config ( @{ $XMLConfig->{config} } ) {
            my $logprefix = $config->{logprefix};
            #etc ...
        }

    Finishing that up is left as an exercise for the reader =) If you want to get a better handle on what the data structure looks like at any point, use Data::Dumper to print out the structure for you.
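
    To make the Data::Dumper suggestion concrete, here is a minimal sketch. It does not call XMLin at all: the $XMLConfig hash is a hand-written stand-in for what a parser might return for a config like the one posted (an assumption, not real parser output), so the script runs with core Perl only. Sortkeys and Indent are real Data::Dumper settings; the data is illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Hand-written stand-in for a parsed config (assumed shape, not
# actual XMLin() output).
my $XMLConfig = {
    logprefix => '/var/log/sbackup',
    storage   => '/home/backups',
    item      => {
        SomeDatabase   => { type => 'MySQLDB',    dbName => 'SomeDatabase' },
        SomeFilesFiles => { type => 'Filesystem', path   => '/home/globalherald' },
    },
};

$Data::Dumper::Sortkeys = 1;    # stable key order, easier to compare dumps
$Data::Dumper::Indent   = 1;    # compact indentation

# Dumper() returns the structure as a string of Perl code.
my $dump = Dumper($XMLConfig);
print $dump;

# ref() reports what kind of reference sits at each level, which tells
# you whether to dereference with %{ ... } or @{ ... }.
print "top level is a ", ref($XMLConfig), " reference\n";
print "item is a ", ref( $XMLConfig->{item} ), " reference\n";
```

    Dumping the real result of XMLin the same way shows whether a given key holds a hash reference or an array reference, which is what decides how the processing loop has to dereference it.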

    As an aside, I know your code is skeletal, but note that system() returns the command's exit status, not its output. To capture output you could use backticks or qx//, but let me suggest that you pipe mysqldump's output to a file and then deal appropriately with the file.
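
    A small sketch of the three options just mentioned, under some assumptions: the running perl binary ($^X) stands in for mysqldump so the script works anywhere, the temporary file comes from the core File::Temp module, and the list form of a piped open is assumed to be available (it is on Unix-like systems).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in command: the current perl printing one line.  A real script
# would build the mysqldump argument list here instead.
my @cmd = ( $^X, '-e', 'print "dump data\n"' );

# 1. system() returns the wait status, not the output.  The exit code
#    is in the high byte; 0 means success.
my $status = system(@cmd);
my $exit   = $status >> 8;

# 2. qx// (backticks) returns whatever the command wrote to STDOUT.
my $output = qx{"$^X" -e "print 42"};

# 3. Read the command through a pipe and copy its output to a file,
#    so a large dump never has to fit in memory.
my ( $out_fh, $out_name ) = tempfile( UNLINK => 1 );
open( my $pipe, '-|', @cmd ) or die "Can't run command: $!";
print {$out_fh} $_ while <$pipe>;
close($pipe)   or warn "command failed, wait status $?\n";
close($out_fh) or die "Can't close $out_name: $!";

print "exit code: $exit, captured: $output, file size: ", -s $out_name, "\n";
```

    The list forms of system and open bypass the shell, which matters once user names, passwords or paths from a config file end up in the command.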

    Finally, let me give you a start on how you might use XML::Twig for this job. The basic framework might look like this:

        #!/usr/bin/perl
        use strict;
        use XML::Twig;

        # create a new Twig object that will call the "config"
        # subroutine once it's seen a complete "config" element
        my $twig = XML::Twig->new( twig_handlers => { 'config' => \&config } );
        $twig->parsefile("configs.xml");

        sub config {
            my ( $t, $config ) = @_;    # $config is a config element
            # first_child() selects by tag name; child() expects a
            # numeric offset as its first argument
            my $logprefix = $config->first_child("logprefix")->text;
            my @items = $config->children("item");
            foreach my $item (@items) {
                my $name = $item->att('name');
                my $type = $item->att('type');
                # and so forth
            }
        }

    YMMV, of course, but I find the twiggish way of doing it easier to understand. HTH!

    If not P, what? Q maybe?
    "Sidney Morgenbesser"

Re: Walking thru XML
by princepawn (Parson) on Nov 20, 2003 at 22:21 UTC

    I used XML::TreeBuilder's traverse method to do that.

    Actually, to be honest, I used HTML::TreeBuilder, but the APIs for the XML and HTML treebuilders are very much the same. And I would certainly reach for XML::TreeBuilder if I had an XML task to do.

    PApp::SQL and CGI::Application rock the house

Re: Walking thru XML
by mirod (Canon) on Nov 21, 2003 at 11:37 UTC

    A few comments:

    • XML does not have the concept of lines, it has the concept of elements, so try not to use a line metaphor when you think of XML data; a tree works better.
    • then I will have to concur with arturo (and not only with his usage of XML::Twig ;--) : if you want to use XML::Simple, load the config data and dump it using Data::Dumper or YAML, it will make it much easier for you to figure out what's going on (alternatively you can use the debugger by running perl -d and use x $XMLConfig),
    • finally, is there any specific reason why you want to use XML here? I find YAML to be easier to read and to edit than XML. After fixing the typo in your XML, perl -MYAML -MXML::Simple -e 'print Dump XMLin "config.xml"' will give you this:
          --- #YAML:1.0
          item:
            SomeDatabase:
              dbName: SomeDatabase
              dbPass: password
              dbuser: SomeUser
              type: MySQLDB
            SomeFilesFiles:
              path: '/home/globalherald'
              type: Filesystem
          logprefix: '/var/log/sbackup'
          storage: '/home/backups'
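
    Given a structure shaped like the dump above, "walking" the config can be sketched as a loop over the item hash with one handler per type. This is only an illustration under stated assumptions: $config is typed in by hand from that dump rather than produced by XMLin, the handlers just build a description of the work instead of running anything, and the .sql file name is made up. If the file held a single item, XML::Simple may return a differently shaped structure, so check with a dump first.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hand-copied from the dump above; a real script would get this
# from XMLin().
my $config = {
    logprefix => '/var/log/sbackup',
    storage   => '/home/backups',
    item      => {
        SomeDatabase => {
            type   => 'MySQLDB',
            dbName => 'SomeDatabase',
            dbuser => 'SomeUser',
            dbPass => 'password',
        },
        SomeFilesFiles => {
            type => 'Filesystem',
            path => '/home/globalherald',
        },
    },
};

# Dispatch table: item type => handler.  The handlers are hypothetical
# and only describe the backup they would perform.
my %handler = (
    MySQLDB => sub {
        my ( $name, $item, $storage ) = @_;
        return "dump database $item->{dbName} as $item->{dbuser} to $storage/$name.sql";
    },
    Filesystem => sub {
        my ( $name, $item, $storage ) = @_;
        return "tar $item->{path} into $storage/$name.tar";
    },
);

my @plan;
for my $name ( sort keys %{ $config->{item} } ) {
    my $item = $config->{item}{$name};
    my $do   = $handler{ $item->{type} };
    unless ($do) {
        warn "unknown type '$item->{type}' for item $name\n";
        next;
    }
    push @plan, $do->( $name, $item, $config->{storage} );
}

print "$_\n" for @plan;
```

    Adding a PostgreSQL backup then means adding one more entry to %handler rather than another nested if block.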
Re: Walking thru XML
by Jaap (Curate) on Nov 20, 2003 at 22:22 UTC
