Re: Improving script that uses xmllint to validate

I found out why the script was running so slowly.

Actually, the XML validation part is fine. The problem was in the XML generation module, which depends on a hash that typically contains 1M entries.

Every record needs to be validated before it is pushed to an external module, something like this:

for record ( list of records )
do
  create xml file with 1 record of data
  validate the above xml file
  if validated
  then
    push record to success file
  else
    push record to failure file
  fi
done

use success file to generate final xml file

So every single-record XML file generated for validation rebuilds the 1M-entry hash, and that is why the script performs so badly.
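One way to stop paying that cost per record is to build the hash once and hand the same table to the generator for every record, instead of letting each single-record generation rebuild it. Below is a minimal Python sketch, assuming the generator can be changed to accept a prebuilt table. `build_lookup`, `record_to_xml`, `xmllint_validator` and `partition_records` are hypothetical names, and validation pipes each document to `xmllint --noout --schema` on stdin.

```python
import subprocess
from typing import Callable, Dict, Iterable, List, Tuple


def build_lookup(n: int) -> Dict[str, str]:
    # Hypothetical stand-in for the expensive ~1M-entry hash the XML generator needs.
    return {f"key{i}": f"value{i}" for i in range(n)}


def record_to_xml(record: str, lookup: Dict[str, str]) -> str:
    # Hypothetical single-record generator: it only reads the prebuilt table.
    return f"<record><id>{record}</id><value>{lookup.get(record, '')}</value></record>"


def xmllint_validator(xsd_path: str) -> Callable[[str], bool]:
    # Returns a validator that pipes an XML string to xmllint on stdin ("-").
    def validate(xml: str) -> bool:
        result = subprocess.run(
            ["xmllint", "--noout", "--schema", xsd_path, "-"],
            input=xml, text=True, capture_output=True,
        )
        return result.returncode == 0  # non-zero exit status means invalid
    return validate


def partition_records(
    records: Iterable[str],
    lookup: Dict[str, str],
    validate: Callable[[str], bool],
) -> Tuple[List[str], List[str]]:
    success: List[str] = []
    failure: List[str] = []
    for record in records:
        xml = record_to_xml(record, lookup)  # same table reused on every iteration
        (success if validate(xml) else failure).append(record)
    return success, failure
```

Usage would be along the lines of `lookup = build_lookup(1_000_000)` once, then `partition_records(records, lookup, xmllint_validator("schema.xsd"))`, where `schema.xsd` is a placeholder. Each record still costs one `xmllint` call, but the hash is constructed only once per run.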

Now the question is: is there any way to 'pin' the hash in memory, so that any process that needs the data can refer to it through some kind of 'memory namespace'?

For XSD parsing, this kind of file caching is possible with File::Cache. Is there anything similar available for a hash?

If I could pin the hash in memory instead of re-creating it every time, the script's performance would improve enormously.
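The File::Cache idea can be applied to the hash itself: build it once, save a snapshot to a file, and have later runs load the snapshot instead of recomputing it. Below is a minimal Python sketch using `pickle` (in Perl, Storable's `store`/`retrieve` serve a similar purpose). `load_or_build` is a hypothetical name and the cache path is an assumption. Loading still reads the whole hash into memory, so this pays off only if building the hash is much more expensive than deserializing it.

```python
import os
import pickle
from typing import Callable, Dict


def load_or_build(cache_path: str, builder: Callable[[], Dict[str, str]]) -> Dict[str, str]:
    """Return the table from a pickle snapshot, building and saving it only on a cache miss."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)  # cache hit: deserialize instead of rebuilding
    table = builder()  # cache miss: pay the construction cost once
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(table, f, protocol=pickle.HIGHEST_PROTOCOL)
    # Rename into place so readers do not pick up a half-written file
    # (the rename is atomic on POSIX filesystems).
    os.replace(tmp_path, cache_path)
    return table
```

The snapshot goes stale when the data behind the hash changes, so the cache file would need to be deleted or regenerated whenever that happens.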

Thanks in advance! :)


Re^2: Improving script that uses xmllint to validate by Anonymous Monk on Dec 27, 2008 at 18:25 UTC
"Now the question is - is there any way to 'pin' the created hash structure to memory so that any process making use of the data can refer to it using some 'memory namespace'."

What does that mean? Maybe I'm ignorant, but isn't that how every program works?
Re^3: Improving script that uses xmllint to validate by matrixmadhan (Beadle) on Dec 28, 2008 at 12:14 UTC
The hash is created in memory and stays there only as long as the instance (process) that built it is resident in memory. What I am looking for is this:

Process 'A' is spawned
Build hash data structure with 1 M entries
Assign a namespace ( example name : h-m1 ) to the above
Now process 'A' dies off, but h-m1 should still be resident in memory,
that is, it has to stay resident even after the process which created it terminates.

Now when another instance of process 'A' is spawned, it should not have to reconstruct the 1M-entry hash; it should simply do a lookup using the namespace 'h-m1' (or something like that).

I am trying to cut the time taken to construct the 1M-entry hash, which would be a big win for my script. Is that possible?

Many thanks in advance :)
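Ordinary heap memory is released when its process exits, so a common substitute for a named hash that outlives its creator is a disk-backed key/value file that later processes open by name and query without rebuilding anything. Below is a minimal Python sketch using the standard `dbm` module, with the file name standing in for the 'h-m1' namespace (Perl's DB_File, which ties a hash to a Berkeley DB file, plays a similar role). `build_hash` and `open_or_build` are hypothetical names. The sketch does not guard against two first runs building the store at the same time.

```python
import dbm
from typing import Callable, Dict


def build_hash(n: int) -> Dict[str, str]:
    # Hypothetical stand-in for the expensive 1M-entry construction.
    return {f"key{i}": f"value{i}" for i in range(n)}


def open_or_build(name: str, builder: Callable[[], Dict[str, str]]):
    """Open the store called `name` read-only, building it first only if it does not exist yet."""
    try:
        return dbm.open(name, "r")  # later runs: just open the existing file
    except dbm.error:
        with dbm.open(name, "n") as db:  # first run: create a fresh store and fill it
            for key, value in builder().items():
                db[key] = value
        return dbm.open(name, "r")
```

Lookups such as `db["key42"]` read from the file on demand and return `bytes`, so a new process can start answering queries without loading or rebuilding all 1M entries first.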