in reply to updating big XML files

I think a database (relational or XML) is the right approach; I don't see any other way to get random access to a large XML file.

Still, if what you have is a list of changes that you want to apply to the large XML file (I'm not sure I understand how your validation works), you could use XML::Twig to parse the file piece by piece, detect the elements you want to change, make the changes, and flush each processed piece.
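
A minimal sketch of that parse/change/flush loop, under some loud assumptions: the element names (record, price), the id attribute, and the shape of the change list are all hypothetical, since I don't know what your data looks like. Only XML::Twig's twig_handlers, att, first_child, set_text, parsefile and flush are real API.

```perl
use strict;
use warnings;
use XML::Twig;

# Stream $in_file to $out_fh, replacing the <price> text of every <record>
# whose id attribute is a key of %$new_price.
# "record", "price" and "id" are made-up names for illustration only.
sub update_prices {
    my ($in_file, $out_fh, $new_price) = @_;

    my $twig = XML::Twig->new(
        twig_handlers => {
            # Called each time a <record> element has been fully parsed.
            record => sub {
                my ($t, $record) = @_;
                my $id = $record->att('id');
                if (defined $id && exists $new_price->{$id}) {
                    my $price = $record->first_child('price');
                    $price->set_text($new_price->{$id}) if $price;
                }
                # Print everything parsed so far and free it from memory.
                $t->flush($out_fh);
            },
        },
    );

    $twig->parsefile($in_file);
    $twig->flush($out_fh);    # print whatever follows the last <record>
    return;
}

# Run only when called as a script: perl update.pl in.xml out.xml
unless (caller) {
    my ($in_file, $out_file) = @ARGV;
    die "usage: $0 in.xml out.xml\n" unless defined $out_file;
    open my $out, '>', $out_file or die "Cannot write $out_file: $!";
    update_prices($in_file, $out, { 42 => '19.99' });    # hypothetical change list
    close $out or die "Cannot close $out_file: $!";
}
```

Because each record is flushed as soon as it has been handled, memory use should stay roughly proportional to one record rather than to the whole file. The price is that you only ever see the file in document order, so the change list has to be available up front (here as a hash).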

Re^2: updating big XML files
by Anonymous Monk on Jul 18, 2008 at 21:09 UTC
    AnyData::Format::XML

    This module allows you to create, search, modify and/or convert XML data and files by treating them as databases without having to actually create separate database files. The data can be accessed via a multi-dimensional tied hash using AnyData.pm or via DBI and SQL commands using DBD::AnyData.pm. See those modules for complete details of usage.

    The module is built on top of Michel Rodriguez's excellent XML::Twig which means that the AnyData interfaces can now include information from DTDs, be smarter about inferring data structure, reduce memory consumption on huge files, and provide access to many powerful features of XML::Twig and XML::Parser on which it is based.

    Importing options allow you to import/access/modify XML of almost any length or complexity. This includes the ability to access different subtrees as separate or joined databases.

      Thanks! I had heard of XML::Twig but was unfamiliar with the AnyData module. I will definitely give it a try.
Re^2: updating big XML files
by dHarry (Abbot) on Jul 21, 2008 at 13:51 UTC

    Thanks for the comments! An RDBMS it will be. I will definitely give XML::Twig a try. It has been on my "to do" list for some time.

    As for the validation, it ranges from simple (e.g. a number or date within a certain range) to complex (whether the right ascension/declination of a satellite orbiting some celestial body at a specific time is “correct”). It cannot be captured by XML Schema or any other schema language. We have lots of IDL code and third-party libraries to validate the data.