Woodchuck has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I have several XML files which I want to concatenate into a single file to use as a database with DBD::AnyData. Here is an excerpt from one of them:
. . .
<NbSitesInChainN>4</NbSitesInChainN>
<NbSlicesBetweenStabilization>4</NbSlicesBetweenStabilization>
<NbWarmUpCycles>1000000</NbWarmUpCycles>
<PotentialEnergy>
  <Error1>0.26640880E-03</Error1>
  <Tau1>1.12518850E+00</Tau1>
  <Value>-0.25547426</Value>
</PotentialEnergy>
<PotentialSpecificHeat>
  <Bias>-4.43921585E-05</Bias>
  <Error1>0.53285588E-02</Error1>
  <Error2>0.57902443E-02</Error2>
  <Tau1>5.00000000E-01</Tau1>
  <Tau2>5.69579788E-01</Tau2>
  <Value>0.32688055</Value>
</PotentialSpecificHeat>
. . .
The thing is, the required format is really rigid (see AnyData::Format::XML), and there's talk of DTDs, but I know nothing about them. I'm free to change my XML or to use tools other than AnyData; it's really the database features I'm interested in. Any tips? Thanks in advance.

Replies are listed 'Best First'.
Re: XML file as database
by jZed (Prior) on Sep 07, 2006 at 18:46 UTC
    If you want to access data as a relational database, then the data needs to be arranged into records which contain a set number of fields each of which contains a single value. That is the basis of the "rigidity" of DBD::AnyData's XML format. The sample that you show does not appear to be divided into records (unless you are omitting some enclosing tag or unless you have one record per file) and some of the fields you show (e.g. PotentialSpecificHeat) contain multiple values. You need to conceptually reduce your data to fields, records, and values before you can treat it like a database regardless of whether you use AnyData or something else. If you can tell us how you'd do that reduction perhaps we can make some suggestions.
      Yes, it's the nesting that's the problem. There is one such XML file per record (if my understanding of what a record is is correct); I just wanted to point out that there is nesting involved. What about col_map? Can I assign a unique tag to the nested elements in there? -Chuck
        Again, before I can answer that, I need to know how you want to conceptually handle the nesting. The "correct" way to put nested elements into a relational database would be to fold out additional lookup tables but that would get to be pretty complicated. The simplest (and "incorrect" i.e. non-relational) way is to serialize the data e.g. to join all of the nested fields into a single field with e.g. a semicolon separator. My guess is that your best option will be instead to "promote" the nested fields i.e. to go from this
        <PotentialSpecificHeat> <Bias>-4.43921585E-05</Bias> <Error1>0.53285588E-02</Error1> <Error2>0.57902443E-02</Error2> <Tau1>5.00000000E-01</Tau1> <Tau2>5.69579788E-01</Tau2> <Value>0.32688055</Value> </PotentialSpecificHeat>
        to this (eliminate PotentialSpecificHeat as an enclosing tag and promote its nested elements to unnested tags with a PSH prefix to show their origin)
        <PSH_Bias>-4.43921585E-05</PSH_Bias> <PSH_Error1>0.53285588E-02</PSH_Error1> <PSH_Error2>0.57902443E-02</PSH_Error2> <PSH_Tau1>5.00000000E-01</PSH_Tau1> <PSH_Tau2>5.69579788E-01</PSH_Tau2> <PSH_Value>0.32688055</PSH_Value>
        If that kind of thing would work for you, then you can use Perl to combine your files into one file and pre-process the nested tags into unnested, prefixed tags. You could then use DBD::AnyData on the result.
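        A minimal sketch of that pre-processing step, using only core Perl. The prefixes PE and PSH, the wrapper tags <Records> and <Record>, and the script name combine.pl are assumptions made for illustration. It also assumes each file holds only simple tags without attributes and at most one level of nesting; for anything messier, a real XML parser such as XML::Twig or XML::LibXML would be safer than regexes.

```perl
use strict;
use warnings;

# Hypothetical mapping: enclosing tag => prefix given to its promoted children.
my %prefix_for = (
    PotentialEnergy       => 'PE',
    PotentialSpecificHeat => 'PSH',
);

# Rename every tag in a fragment, e.g. <Bias> becomes <PSH_Bias>.
sub add_prefix {
    my ($fragment, $prefix) = @_;
    $fragment =~ s{<(/?)(\w+)>}{<$1${prefix}_$2>}g;
    return $fragment;
}

# Drop each enclosing tag listed in %prefix_for and promote its children.
# Assumes no attributes and only one level of nesting.
sub promote_nested {
    my ($xml) = @_;
    for my $parent (sort keys %prefix_for) {
        my $prefix = $prefix_for{$parent};
        $xml =~ s{<\Q$parent\E>\s*(.*?)\s*</\Q$parent\E>}
                 {add_prefix($1, $prefix)}gse;
    }
    return $xml;
}

# Wrap each input (one record per file) in a hypothetical <Record> element
# and all of them in a hypothetical <Records> root.
sub combine_records {
    my @records = map { "<Record>\n" . promote_nested($_) . "\n</Record>\n" } @_;
    return "<Records>\n" . join('', @records) . "</Records>\n";
}

# Usage (file names are placeholders):
#   perl combine.pl run1.xml run2.xml > all.xml
if (@ARGV) {
    local $/;    # slurp mode: read each file in one go
    my @contents;
    for my $file (@ARGV) {
        open my $fh, '<', $file or die "Cannot open $file: $!";
        push @contents, scalar <$fh>;
        close $fh;
    }
    print combine_records(@contents);
}
```

        The prefix also matters because Error1, Tau1 and Value occur under both PotentialEnergy and PotentialSpecificHeat, so without it the promoted tags would collide. If your files carry an XML declaration or their own root element, strip those before wrapping; the exact record-tag option to hand to DBD::AnyData is described in AnyData::Format::XML.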
Re: XML file as database
by gellyfish (Monsignor) on Sep 07, 2006 at 18:41 UTC

    You might want to look at Berkeley DB XML which has a Perl API. In principle it allows you to store arbitrary XML data. The drawback might be that it doesn't support a DBD interface at present, but this might not matter for your application.

    /J\

Re: XML file as database
by rsriram (Hermit) on Sep 08, 2006 at 06:28 UTC

    DBD::AnyData is the right module if you want to use an XML file instead of a database. You can also explore XML::Database. About the DTD: DTD stands for Document Type Definition. It is a plain-text file that specifies which elements may appear in your XML file, in what order, and how they nest. An XML instance that conforms to its DTD is called valid, which is a stricter condition than being well-formed.

    For example, if you want to specify that NbSitesInChainN, NbSlicesBetweenStabilization, NbWarmUpCycles and PotentialEnergy must appear in that sequence, and that PotentialEnergy is the parent element of Error1, Tau1 and Value, a DTD can state both. Validating parsers can then check your XML against the DTD.

    I have previously tried using an XML file as a backend instead of a database, but I ran into quite a lot of problems as the volume of data grew, especially with operations like searching and sorting. Writing the scripts also took considerable time.