in reply to Caching Entities with XML::LibXML

I think the answer is "sort of." You are doing a validate() or is_valid() call, right? It's a minor PITA, but if that's the case you can use a system path to a local copy of the same DTD. I have done this, but it's been a while and I can't reach my old code tree right now. See XML::LibXML::Dtd for a bit more. HTML::DTD might make it a little less painful.

Re^2: Caching Entities with XML::LibXML
by ikegami (Patriarch) on Feb 24, 2010 at 22:36 UTC

    I think the answer is "sort of." You are doing a validate() or is_valid() call, right?

    No, just a simple parse. Specifying validation => 0 doesn't stop the DTD from being fetched: the parser needs the DTD to know that &nbsp; is character U+00A0. I don't see how to tell the parser to use a preconstructed XML::LibXML::Dtd object.

    On the other hand, HTML::DTD does provide a handy source of DTDs for one's ext_ent_handler. (... or not. It doesn't provide xhtml-lat1.ent, which is required by xhtml1-strict.dtd.)
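    A minimal sketch of the ext_ent_handler route, assuming the needed DTD and .ent files (including xhtml-lat1.ent) have already been saved into one local directory. make_local_dtd_handler and the directory path are hypothetical; the only XML::LibXML behaviour relied on is that the handler is called with the system ID and public ID, and that its return value is parsed as the entity's content. The block uses only core modules, so the XML::LibXML wiring is shown as a comment.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Spec;

# Hypothetical helper: returns a callback for XML::LibXML's ext_ent_handler
# option. Any external entity (DTD or .ent file) is served from $dir, keyed
# on the last path segment of its system ID. Nothing is fetched over the net.
sub make_local_dtd_handler {
    my ($dir) = @_;
    return sub {
        my ($sys_id, $pub_id) = @_;      # system ID (URI) first, then public ID
        my $name = basename($sys_id);    # e.g. "xhtml-lat1.ent"
        my $path = File::Spec->catfile($dir, $name);
        open(my $fh, '<', $path)
            or die "No local copy of $sys_id (looked for $path): $!\n";
        local $/;                        # slurp the whole file
        my $content = <$fh>;
        close($fh);
        return $content;                 # parsed as the entity's content
    };
}

# Usage (requires XML::LibXML; the path is a placeholder):
#   my $parser = XML::LibXML->new(
#       ext_ent_handler => make_local_dtd_handler('/usr/local/share/dtd/xhtml1'));
#   my $doc = $parser->parse_string($xhtml);
```

    Keying on the file name alone keeps the lookup trivial, but two different URLs ending in the same name would collide; a hash from full system ID to local file would avoid that.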

    I just noticed something called "XML catalogs" in the Parser documentation. It sounds like a simple solution, and possibly one that allows reuse of the compiled DTDs.

      Well, this is very interesting. Please update the OP or thread with your final solution, as it were.

        • XML::LibXML doesn't cache the parsed DTDs it finds via catalogs or any other means (as far as I can tell). Nothing can be done about this.
        • Catalogs can tell the parser where to find any number of DTDs.
        • DTDs named in catalogs can be stored locally.
        • The location of a DTD can be relative to the catalog or an absolute URL.
        • XML::LibXML parses the catalog once per process.
        • XML::LibXML loads the DTDs it finds in catalogs on demand.
        • The last two points mean a catalog can list DTDs that are rarely, if ever, used at little cost.
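        A sketch of the kind of catalog those points describe, assuming OASIS XML Catalogs syntax and DTDs stored next to the catalog file, so each relative uri resolves against the catalog's own location. write_catalog and the paths are hypothetical. The XML::LibXML::Parser documentation describes $parser->load_catalog($file) for using such a file, and notes that catalogs are not consulted when an ext_ent_handler is set, so this and the handler approach are alternatives.

```perl
use strict;
use warnings;

# Escape the characters that would break a double-quoted XML attribute.
sub xml_attr {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/"/&quot;/g;
    $s =~ s/</&lt;/g;
    return $s;
}

# Hypothetical helper: writes an OASIS XML catalog mapping remote system IDs
# to local files. Relative uri values resolve against the catalog's location.
sub write_catalog {
    my ($catalog_path, %map) = @_;       # %map: system ID => local file name
    open(my $fh, '>', $catalog_path)
        or die "Cannot write $catalog_path: $!\n";
    print $fh qq{<?xml version="1.0"?>\n};
    print $fh qq{<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">\n};
    for my $sys_id (sort keys %map) {
        printf $fh qq{  <system systemId="%s" uri="%s"/>\n},
            xml_attr($sys_id), xml_attr($map{$sys_id});
    }
    print $fh qq{</catalog>\n};
    close($fh) or die "Cannot close $catalog_path: $!\n";
    return $catalog_path;
}

# Usage (requires XML::LibXML; paths are placeholders):
#   write_catalog('/usr/local/share/dtd/catalog.xml',
#       'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd' => 'xhtml1-strict.dtd',
#       'http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent'    => 'xhtml-lat1.ent');
#   my $parser = XML::LibXML->new;
#   $parser->load_catalog('/usr/local/share/dtd/catalog.xml');
```

        Because the DTDs are only loaded when a document actually refers to them, listing extra entries in such a file costs little beyond the one-time catalog parse.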

        I'm going to create XML::Catalogs (common code) and XML::Catalogs::HTML (installs and loads a catalog of HTML DTDs). All you'll need to do to prevent the download of HTML DTDs will be:

        use XML::Catalogs::HTML -libxml;

        Why a module rather than a system-wide catalog?
        • for users unable to alter their system configuration,
        • for users unaware of the need to alter their system configuration,
        • for the simplicity of installing a Perl package, and
        • for integration with Perl's dependency system.

        XML::Catalogs and XML::Catalogs::HTML are now on CPAN.

        (There are POD errors, I misspelled "dependency", and the description of the module's purpose could be improved. Let me know if you have comments or if you want more features.)