in reply to Improving script that uses xmllint to validate

I found out why the script was running so slowly.

There is actually no problem with the XML validation part. The problem is in the XML generation module, which depends on a hash that typically contains 1M entries.

Every record needs to be validated before it is pushed to an external module, something like:
    for record ( list of records )
    do
        create xml file with 1 record of data
        validate the above xml file
        if validated
        then
            push record to success file
        else
            push record to failure file
        fi
    done
    use success file to generate final xml file
So every time an XML file with 1 record of data is generated ( for validation ), a hash with 1M entries is created again. That is the reason the script performs so badly.
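
If the generation code can be called from within the same process as the loop, one way to avoid the repeated construction is to build the hash once and pass a reference to it into each per-record call. This is only a minimal sketch under that assumption: build_lookup, record_to_xml and is_valid are hypothetical stand-ins for the real hash construction, the real generation module and the real xmllint call, and the hash here is shrunk to 1000 entries.

```perl
use strict;
use warnings;

my $builds = 0;    # counts how often the expensive hash is constructed

# Hypothetical stand-in for the real 1M-entry hash construction.
sub build_lookup {
    $builds++;
    my %lookup = map { ( "key$_" => "value$_" ) } 1 .. 1000;
    return \%lookup;
}

# Hypothetical generator: turns one record into a one-record XML string,
# reading from the hash that is passed in instead of rebuilding it.
sub record_to_xml {
    my ( $record, $lookup ) = @_;
    my $value = exists $lookup->{$record} ? $lookup->{$record} : '';
    return "<records><record id=\"$record\">$value</record></records>";
}

# Hypothetical validator; the real script would run xmllint here.
sub is_valid {
    my ($xml) = @_;
    return $xml =~ m{<record id="[^"]+">[^<]+</record>} ? 1 : 0;
}

# Sort records into success and failure lists, building the hash only once.
sub split_records {
    my (@records) = @_;
    my $lookup = build_lookup();    # built once, outside the loop
    my ( @success, @failure );
    for my $record (@records) {
        my $xml = record_to_xml( $record, $lookup );
        if ( is_valid($xml) ) { push @success, $record }
        else                  { push @failure, $record }
    }
    return ( \@success, \@failure );
}

my ( $ok, $bad ) = split_records( 'key1', 'key2', 'nosuchkey' );
print "success: @$ok\nfailure: @$bad\nhash builds: $builds\n";
```

The counter is there only to show that the construction runs once per script run instead of once per record. This does not help if the generation module has to be started as a separate process for every record.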

Now the question is - is there any way to 'pin' the created hash structure to memory so that any process making use of the data can refer to it using some 'memory namespace'.

For XSD parsing, this kind of file caching is possible using File::Cache. Is there anything similar available for a hash?

If I could pin the hash data structure in memory instead of re-creating it every time, the performance of my script would improve enormously.

Thanks in advance ! :)

Re^2: Improving script that uses xmllint to validate
by Anonymous Monk on Dec 27, 2008 at 18:25 UTC
    Now the question is - is there any way to 'pin' the created hash structure to memory so that any process making use of the data can refer to it using some 'memory namespace'.
    What does that mean? Maybe I'm ignorant, but isn't that how every program works?
      A hash gets created in memory and stays there only as long as the instance ( process ) that built it is resident in memory.

      What I am looking for is:
        Process 'A' is spawned
        Build hash data structure with 1M entries
        Assign a namespace ( example name : h-m1 ) to the above
        Now process 'A' dies off, but h-m1 should still be resident in memory, that is, it has to stay resident even after the process which created it terminates.
      Now when another instance of process 'A' is spawned, without having to reconstruct the hash data structure with 1M entries, it should simply do a lookup using the namespace 'h-m1' ( or something like that )

      I am trying to cut the time taken to construct the hash of 1M entries, which would be a big win for my script.

      Is that possible? Many thanks in advance :)
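
      A process's memory does not outlive the process, so the closest simple approximation is to serialize the hash to disk once and have later processes load it instead of rebuilding it. Below is a minimal sketch using the core Storable module. It assumes the hash does not change between runs; build_hash and the file name h-m1.storable are hypothetical, and the hash is shrunk to 1000 entries.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempdir);

# Hypothetical stand-in for the expensive 1M-entry construction.
sub build_hash {
    my %h = map { ( "key$_" => $_ * 2 ) } 1 .. 1000;
    return \%h;
}

# Return the hash, loading the serialized copy when one exists and
# building (then saving) it only when it does not.
sub load_or_build {
    my ($cache_file) = @_;
    return retrieve($cache_file) if -e $cache_file;
    my $hash = build_hash();
    store( $hash, $cache_file );
    return $hash;
}

# A temporary directory keeps the sketch self-contained; a real script
# would use a fixed path so that later runs can find the file.
my $dir        = tempdir( CLEANUP => 1 );
my $cache_file = "$dir/h-m1.storable";    # the file name plays the role of the 'namespace'

my $first  = load_or_build($cache_file);    # builds the hash and writes the file
my $second = load_or_build($cache_file);    # reads the file, no rebuild
print "entries: ", scalar( keys %$second ), "\n";
```

      Loading 1M entries this way still takes time, so whether it beats the rebuild depends on how expensive the construction is. Other options are a hash tied to a DBM file, where only the entries actually looked up are read from disk, or a long-running process that builds the hash once and validates records as they arrive.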