Heidegger has asked for the wisdom of the Perl Monks concerning the following question:

Holy Monks,

At the place where I work we have a bunch of XML documents, and now the time has come to put them into a repository that acts like a searchable directory with a web interface. I want to put together a quick demo in Perl before the managers ask me to do it in Java ;-)

I know that a native XML database is still a rarity. It's more common to keep the XML documents as they are and index them with a relational database. Can someone suggest Perl modules for this kind of application?

Thank you very much.

Re: Native XML Repository + Perl
by mirod (Canon) on Mar 27, 2003 at 08:29 UTC

    I don't think you will find anything that will work out-of-the-box, but here are a couple of leads:

    • DBD::AnyData will let you treat each XML file as a table. You will have to specify how to map the XML to a table structure and it works one document at a time, but it is a start.

    • As far as XML DBs go, you can have a look at DB XML, which is based on BerkeleyDB (and developed by Sleepycat). It stores XML documents and indexes pre-computed XPath queries. It might be what you are looking for in terms of the underlying repository. Look at John Merrells' blog for up-to-date information on it. It ships with a Perl interface, written by Paul Marquess.
Re: Native XML Repository + Perl
by robartes (Priest) on Mar 27, 2003 at 07:58 UTC
    For relational database access, there can be only one: DBI with the DBD module for your database. For handling the XML, depending on what you want to do with it, try XML::Simple for simple processing of well-formed documents, and XML::Parser if you need a full-blown XML parser.
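    A minimal sketch of the DBI plus XML::Simple combination: each document is parsed with XMLin, a few fields are copied into a relational table for searching, and the raw XML is stored alongside it. The <doc>/<title>/<author> layout and the docs table are hypothetical, and DBD::SQLite is assumed only so the example runs without a database server; any DBD driver works through the same DBI calls.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;
use XML::Simple qw(XMLin);

# Hypothetical document layout: <doc id="..."><title>...</title><author>...</author></doc>
my @xml_docs = (
    '<doc id="1"><title>Quarterly report</title><author>Smith</author></doc>',
    '<doc id="2"><title>Style guide</title><author>Jones</author></doc>',
);

# Assumption: DBD::SQLite is installed; swap the DSN for your own database
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 1 });

# Relational index over the documents; the XML itself is kept verbatim in "body"
$dbh->do('CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT, author TEXT, body TEXT)');
my $ins = $dbh->prepare('INSERT INTO docs (id, title, author, body) VALUES (?, ?, ?, ?)');

for my $xml (@xml_docs) {
    # XMLin treats a string containing '<' as XML; the root element is folded away,
    # so attributes and simple child elements become keys of the returned hash
    my $doc = XMLin($xml);
    $ins->execute($doc->{id}, $doc->{title}, $doc->{author}, $xml);
}

# Field search: all documents by a given author
my $rows = $dbh->selectall_arrayref(
    'SELECT id, title FROM docs WHERE author = ?', undef, 'Smith');
print "$_->[0]\t$_->[1]\n" for @$rows;
```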

    For searching the database, you can have a look at DBIx::FullTextSearch to index your XML.

    These are just a few suggestions, other monks will undoubtedly have others for you to look at.

    CU
    Robartes-

Re: Native XML Repository + Perl
by grantm (Parson) on Mar 27, 2003 at 09:56 UTC

    If you're happy to leave the XML documents in files, but want a system to help you locate specific documents, you might want to look at the SWISH-E indexing package. It comes with libxml integration so that you have full control over the indexing. For example, you might choose to have the indexing process extract the contents of a particular tag and associate it with a named property in the index. Then you can define a search to locate all documents with a specified value in that property. You can query the index from the command line, and there is also a Perl module API.

Re: Native XML Repository + Perl
by toma (Vicar) on Mar 27, 2003 at 16:36 UTC
    I just did this sort of thing. I used File::Slurp to read the directories of XML files and XML::Twig to parse the XML files and create delimited ASCII tables. The delimited ASCII tables are then loaded into the database with a very short SQL program.
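    A rough sketch of the XML-to-delimited-table step, assuming a hypothetical layout where each file holds <record> elements with an id attribute and <title>/<author> children. It lists the directory with Perl's built-in glob rather than File::Slurp, and uses an XML::Twig handler to turn each record into one tab-delimited line; the actual bulk load is left to whatever loader your database provides.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use XML::Twig;

# Throwaway directory with sample files (hypothetical layout:
# <records> containing <record id="..."><title/><author/></record>)
my $dir = tempdir(CLEANUP => 1);
my %samples = (
    'a.xml' => '<records><record id="1"><title>Intro</title><author>Smith</author></record></records>',
    'b.xml' => '<records><record id="2"><title>Setup</title><author>Jones</author></record>'
             . '<record id="3"><title>Usage</title><author>Smith</author></record></records>',
);
for my $name (keys %samples) {
    open my $fh, '>', "$dir/$name" or die "$dir/$name: $!";
    print {$fh} $samples{$name};
    close $fh;
}

my @rows;    # one tab-delimited line per <record>
my @files = sort { $a cmp $b } glob("$dir/*.xml");
for my $file (@files) {
    my $twig = XML::Twig->new(
        twig_handlers => {
            # Called once each <record> has been completely parsed
            record => sub {
                my ($t, $rec) = @_;
                push @rows, join("\t",
                    $rec->att('id'),
                    $rec->first_child_text('title'),
                    $rec->first_child_text('author'),
                    $file);
                $t->purge;    # free what has been parsed so far to keep memory flat
            },
        },
    );
    $twig->parsefile($file);
}

# Write the table for a bulk loader (e.g. MySQL's LOAD DATA INFILE)
open my $out, '>', "$dir/records.tsv" or die "$dir/records.tsv: $!";
print {$out} "$_\n" for @rows;
close $out;
print "$_\n" for @rows;
```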

    I made a mod_perl web application to query the database using CGI::Application, HTML::Template, and DBI. The resulting code is simple and brief, and it separates program logic from the HTML. The big win with this separation is that the Perl code becomes so much shorter that it is much easier to read. The HTML is also easier to read, since it isn't spread across the Perl source code.
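    A minimal sketch of that CGI::Application + HTML::Template + DBI split, assuming a hypothetical DocSearch application, a docs table in an in-memory SQLite database, and an inline template (in a real application the template would live in its own file). The run mode contains only the query logic; all of the markup sits in the template.

```perl
#!/usr/bin/perl
package DocSearch;    # hypothetical application name
use strict;
use warnings;
use base 'CGI::Application';
use HTML::Template;

# Inline template for the sketch; normally a separate .tmpl file
my $TEMPLATE = <<'END_TMPL';
<html><body>
<h1>Results for "<TMPL_VAR NAME=q ESCAPE=HTML>"</h1>
<ul>
<TMPL_LOOP NAME=rows><li><TMPL_VAR NAME=id>: <TMPL_VAR NAME=title ESCAPE=HTML></li>
</TMPL_LOOP></ul>
</body></html>
END_TMPL

sub setup {
    my $self = shift;
    $self->start_mode('search');
    $self->run_modes(search => \&search);
}

# Run mode: program logic only, no HTML
sub search {
    my $self = shift;
    my $q    = $self->query->param('q') // '';
    my $dbh  = $self->param('dbh');
    # Slice => {} returns a list of hashrefs, which is what TMPL_LOOP expects
    my $rows = $dbh->selectall_arrayref(
        'SELECT id, title FROM docs WHERE title LIKE ? ORDER BY id',
        { Slice => {} }, "%$q%");
    my $tmpl = HTML::Template->new(scalarref => \$TEMPLATE);
    $tmpl->param(q => $q, rows => $rows);
    return $tmpl->output;
}

package main;
use DBI;
use CGI;

# Assumption: DBD::SQLite is installed; hypothetical sample data
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '', { RaiseError => 1 });
$dbh->do('CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT)');
$dbh->do('INSERT INTO docs (id, title) VALUES (?, ?)', undef, @$_)
    for [1, 'Quarterly report'], [2, 'Style guide'], [3, 'Annual report'];

# Simulate a request for ?q=report and capture the page instead of printing it
$ENV{CGI_APP_RETURN_ONLY} = 1;
my $app = DocSearch->new(
    QUERY  => CGI->new({ q => 'report' }),
    PARAMS => { dbh => $dbh },
);
my $page = $app->run;
print $page;
```

    Under mod_perl or plain CGI you would drop the CGI_APP_RETURN_ONLY line and the hand-built CGI object; CGI::Application then reads the request itself and prints the page.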

    The way that I used XML::Twig is similar to the way that XML::Filter::Dispatcher works. Since XML::Filter::Dispatcher uses SAX, and SAX is a standards-based approach, the javaheads may like it better. I compared the two approaches and found them to have similar speed in a different application. XML::Twig, however, has the advantage that it is easy to tune the code to provide a speed/memory tradeoff.
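    One knob XML::Twig offers for that speed/memory tradeoff is twig_roots: only the listed elements are built into the tree, and everything else is parsed but never stored. A small sketch, assuming hypothetical <record> elements whose large <body> content is not needed for indexing:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical document: only <title> matters, <body> may be large
my $xml = '<records>'
        . '<record id="1"><title>Intro</title><body>long text ...</body></record>'
        . '<record id="2"><title>Setup</title><body>more long text ...</body></record>'
        . '</records>';

my @titles;
my $twig = XML::Twig->new(
    # Only <title> elements are built as twig elements; <body> content
    # is parsed but discarded, which keeps memory use low
    twig_roots => {
        title => sub {
            my ($t, $title) = @_;
            push @titles, $title->text;
            $t->purge;    # drop the element once its text has been captured
        },
    },
);
$twig->parse($xml);
print "$_\n" for @titles;
```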

    It should work perfectly the first time! - toma

      You can also read my comments on the comparison. The highlight is that I managed to get the code to run twice as fast (Barry could probably do the same with XML::Filter::Dispatcher too, though ;--)

Re: Native XML Repository + Perl
by zby (Vicar) on Mar 27, 2003 at 10:47 UTC
Re: Native XML Repository + Perl
by Anonymous Monk on Mar 27, 2003 at 18:39 UTC
    You could try eXist.
    http://www.exist-db.org/

    It's Java, and comes with its own web server and web interface, but you can operate the XML::DB portion from a CPAN module. Installation takes no time.

    http://ftp.rucus.ru.ac.za/pub/perl/CPAN/authors/id/G/GS/GSEAMAN/XML-DB.readme

    Then you can perform fielded searches via XPath.
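    The sketch below does not talk to eXist or use XML::DB; it only illustrates what a fielded XPath search looks like, using XML::LibXML against a small in-memory document as a local stand-in. The <docs>/<doc> layout is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical collection, merged into one document for the sketch
my $xml = <<'END_XML';
<docs>
  <doc id="1"><title>Quarterly report</title><author>Smith</author></doc>
  <doc id="2"><title>Style guide</title><author>Jones</author></doc>
  <doc id="3"><title>Annual report</title><author>Smith</author></doc>
</docs>
END_XML

my $dom = XML::LibXML->load_xml(string => $xml);

# Fielded search: the predicate restricts matching to the <author> field,
# and the trailing step selects which field to return
my @hits = map { $_->textContent }
           $dom->findnodes('//doc[author = "Smith"]/title');
print "$_\n" for @hits;
```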