in reply to XML RDBMS import

jamesd256:

Perhaps your problem is that you're running populate_tables in both runs, giving you some duplicate records. Maybe it would work if you removed the populate_tables call from the first pass? On re-reading the node, it appears that I'm mistaken and that you've already separated out the distinct records. However, does your XML split script perhaps include all "dependent" records (as in a foreign-key relationship)? If so, perhaps one of the foreign-key tables is the one that ends up with duplicates.

Alternatively, perhaps you could examine the schema that XML::RDB generated, use XPath queries to split the XML into one file per table, and then process the smaller files individually on a per-table basis?
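To make the per-table idea concrete, here is a rough sketch in Python rather than Perl, using only the standard library. The element names, the sample document, and the TABLE_PATHS mapping are all hypothetical; in practice the paths would come from whatever tables XML::RDB generated for your data. It uses the limited XPath subset that ElementTree supports and drops exact duplicates while collecting each table's rows.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical sample; a real file would be parsed with ET.parse(path).getroot().
SAMPLE = """<library>
  <book id="1"><title>Dune</title><publisher>Ace</publisher></book>
  <book id="2"><title>Emma</title><publisher>Penguin</publisher></book>
  <book id="3"><title>Neuromancer</title><publisher>Ace</publisher></book>
</library>"""

# Hypothetical mapping: one XPath expression per target table.
TABLE_PATHS = {
    "book": "./book",
    "publisher": "./book/publisher",
}

def split_tables(root, table_paths):
    """Return {table: wrapper element holding the distinct matching nodes}."""
    tables = {}
    for table, path in table_paths.items():
        holder = ET.Element(table + "_rows")
        seen = set()
        for node in root.findall(path):
            clone = copy.deepcopy(node)
            clone.tail = None            # trailing whitespace is layout, not data
            key = ET.tostring(clone)     # serialized form identifies duplicates
            if key not in seen:
                seen.add(key)
                holder.append(clone)
        tables[table] = holder
    return tables

tables = split_tables(ET.fromstring(SAMPLE), TABLE_PATHS)
for name, holder in tables.items():
    # Each wrapper could be written to its own file and imported separately.
    print(name, ET.tostring(holder, encoding="unicode"))
```

Each wrapper element holds the rows for one table, so the shared values (the publishers here) appear once no matter how many records refer to them.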

...roboticus

Update: added CPAN link, added italicised section after re-reading OP.

Re^2: XML RDBMS import
by Anonymous Monk on May 19, 2010 at 13:34 UTC

    Yes, I think you are on the right track.

    I think the issue with splitting the XML is that XML::RDB tries to extract shared values into a one-to-many relationship, to avoid redundancy. This is great, but it means the pieces of a split file are no longer independent of each other.

    If I'm right, then during the second data import, it will try to create the shared values as foreign key entries, but without checking to see if they already exist.

    I was hoping someone would suggest a way to get it to check for pre-existing foreign-key entries, or some such.

    Looks like I might have to go down a more manual route, which is a shame as XML::RDB looks pretty elegant otherwise.
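
For what it's worth, the "check for pre-existing entries" step of a manual route might look something like the sketch below. This is not XML::RDB's behaviour or API; it is a plain look-up-before-insert pattern, written in Python with the standard sqlite3 module, and the table and column names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one shared-value table and one table that refers to it.
conn.execute(
    "CREATE TABLE publisher (id INTEGER PRIMARY KEY, name TEXT UNIQUE)"
)
conn.execute(
    "CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, "
    "publisher_id INTEGER REFERENCES publisher(id))"
)

def publisher_id(conn, name):
    """Return the id for `name`, inserting a row only if none exists yet."""
    row = conn.execute(
        "SELECT id FROM publisher WHERE name = ?", (name,)
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute("INSERT INTO publisher (name) VALUES (?)", (name,))
    return cur.lastrowid

def import_batch(conn, rows):
    """Import (title, publisher) pairs, reusing existing publisher rows."""
    for title, publisher in rows:
        conn.execute(
            "INSERT INTO book (title, publisher_id) VALUES (?, ?)",
            (title, publisher_id(conn, publisher)),
        )
    conn.commit()

# Two separate passes, as with a split file; "Ace" appears in both.
import_batch(conn, [("Dune", "Ace"), ("Emma", "Penguin")])
import_batch(conn, [("Neuromancer", "Ace")])
```

The second pass finds the existing "Ace" row instead of creating a new one, which is the check the split import appears to be missing.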