
I think the answer is "sort of." You are doing a validate() or is_valid() call, right?

No, just a simple parse. Specifying validation => 0 doesn't stop the parser from loading the DTD. It needs the DTD to know that &nbsp; is character U+00A0. I don't see how to tell the parser to use a preconstructed XML::LibXML::Dtd object.

On the other hand, HTML::DTD does provide a handy source of DTDs for one's ext_ent_handler. (... or not. It doesn't provide xhtml-lat1.ent, which xhtml1-strict.dtd requires.)
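For the ext_ent_handler route, here is a minimal sketch. It assumes you already have a local copy of each DTD; the %dtd_source table and its one-line stub DTD are hypothetical stand-ins for real files, such as the ones HTML::DTD installs. The handler is called with the system ID and public ID and returns the DTD text. The hash saves repeated lookups, but XML::LibXML still re-parses the returned DTD text on every parse.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical store of DTD text keyed by system ID. A stub DTD declaring
# only the entity the sample uses stands in for the real xhtml1-strict.dtd.
my %dtd_source = (
    'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd' =>
        qq{<!ENTITY nbsp "&#160;">\n},
);

my %cache;        # system ID => DTD text already looked up once
my $fetches = 0;  # number of cache misses (i.e. real lookups)

my $parser = XML::LibXML->new(
    validation      => 0,
    load_ext_dtd    => 1,   # entity declarations live in the external DTD
    expand_entities => 1,   # ext_ent_handler is used when this is on
    no_network      => 1,   # never fall back to downloading
    ext_ent_handler => sub {
        my ($system_id, $public_id) = @_;
        # Look the DTD up once; later parses get the cached string.
        return $cache{$system_id} //= do {
            $fetches++;
            $dtd_source{$system_id} // '';
        };
    },
);

my $xml = <<'XML';
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><body><p>a&nbsp;b</p></body></html>
XML

my @texts;
for (1 .. 2) {
    my $doc = $parser->parse_string($xml);
    push @texts, $doc->documentElement->textContent;
}
printf "%d parses, %d lookup(s)\n", scalar(@texts), $fetches;
```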

I just noticed something called "XML catalogs" in the Parser documentation. It sounds like a simple solution, and one that might allow reuse of the compiled DTDs.

Re^3: Caching Entities with XML::LibXML
by Your Mother (Archbishop) on Feb 24, 2010 at 22:46 UTC

    Well, this is very interesting. Please update the OP or thread with your final solution, as it were.

      • XML::LibXML doesn't cache the parsed DTDs it finds via catalogs or any other means (as far as I can tell). Nothing can be done about this.
      • Catalogs can be used to tell where to find a number of DTDs.
      • DTDs named in catalogs can be stored locally.
      • DTD locations can be given relative to the catalog or as absolute URLs.
      • XML::LibXML parses the catalog once per process.
      • XML::LibXML loads the DTDs it finds in catalogs on demand.
      • The last two points mean a catalog can cheaply include DTDs that are rarely, if ever, used.
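      A rough sketch of the catalog approach these points describe follows. It writes an OASIS XML catalog that maps the XHTML 1.0 Strict public ID to a DTD stored next to the catalog, using a relative location. It loads the catalog once with load_catalog, which affects the whole process, and then parses with no_network so that an unmapped DTD fails instead of being downloaded. The scratch directory and the one-entity stub DTD are assumptions for illustration. A real catalog would point at complete local copies of the W3C DTDs, including xhtml-lat1.ent.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw( tempdir );
use XML::LibXML;

# Hypothetical layout: catalog and DTD side by side in a scratch directory.
my $dir = tempdir( CLEANUP => 1 );

sub write_file {
    my ($path, $text) = @_;
    open my $fh, '>', $path or die "Can't write $path: $!";
    print {$fh} $text;
    close $fh or die "Can't close $path: $!";
}

# Stub DTD declaring only the entity the sample document uses.
write_file( File::Spec->catfile($dir, 'xhtml1-strict.dtd'),
    qq{<!ENTITY nbsp "&#160;">\n} );

# The relative uri below is resolved against the catalog's own location.
my $catalog = File::Spec->catfile($dir, 'catalog.xml');
write_file( $catalog, <<'CATALOG' );
<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//W3C//DTD XHTML 1.0 Strict//EN" uri="xhtml1-strict.dtd"/>
</catalog>
CATALOG

# Catalogs are global to libxml2: load once, before parsing.
XML::LibXML->load_catalog($catalog);

my $parser = XML::LibXML->new(
    validation      => 0,
    load_ext_dtd    => 1,
    expand_entities => 1,
    no_network      => 1,   # unmapped DTDs fail rather than download
);

my $doc = $parser->parse_string(<<'XML');
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><body><p>a&nbsp;b</p></body></html>
XML

my $text = $doc->documentElement->textContent;
print length($text), " characters parsed\n";
```

      The DTD is only read when a document actually names that public ID, so adding more entries to the catalog costs little until they are used.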

      I'm going to create XML::Catalogs (common code) and XML::Catalogs::HTML (installs and loads a catalog of HTML DTDs). All you'll need to do to prevent the download of HTML DTDs will be:

      use XML::Catalogs::HTML -libxml;

      Packaging this as Perl modules is meant for users unable to alter their system configuration and for users unaware of the need to alter it. It also brings the simplicity of installing a Perl package and integration with Perl's dependency system.

      XML::Catalogs and XML::Catalogs::HTML are now on CPAN.

      (I got pod errors, I misspelled "dependency", and I could improve the description of the purpose of the module. Let me know if you have comments or if you want more features.)