deibyz has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,

I have a problem with a data integrity check on a potentially huge data structure.

Let me try to explain. I have an AoH like this:

@AoH = (
    {
        Identifier1 => 'Anumber', IdNumber1 => 1100001,
        Identifier2 => 'Bnumber', IdNumber2 => 1000000,
        Identifier3 => 'Cnumber', IdNumber3 => 2222222,
    },
    # [... More ...]
);

Each element can have 1, 2 or 3 different identifiers, out of a total of about 10 identifier types (Anumber up to Jnumber).
I have to make a consistency check to see whether every distinct identifier set (which can be incomplete) is coherent with the rest.

For example, this is wrong (abbreviated syntax):

Anumber = 1, Bnumber = 2, Cnumber = 3
Anumber = 1, Bnumber = 4, Cnumber = 3   # Bnumber is different

This is wrong too:

Anumber = 1, Bnumber = 2, Cnumber = 3
Cnumber = 3, Dnumber = 4, Enumber = 5
Enumber = 5, Anumber = 2, <undef>       # harder to find: Anumber changes

I've tried some approaches, such as a separate hash for each identifier type referencing the rest of the data, but all of them end up as huge, unreadable code that takes a lot of memory and resources (I can have more than 100k elements in this array, and it keeps growing).

Is there any "easy" way to do it that I'm missing?

Thanks in advance,
deibyz

Replies are listed 'Best First'.
Re: Complex data structure check
by stvn (Monsignor) on Aug 13, 2004 at 18:04 UTC

    You might want to give Test::Deep a look. It has saved me many hours of tedious coding to check/verify large data structures. The documentation can seem a little daunting at first, but give it some time and you will find it can be very useful.

    -stvn
Re: Complex data structure check
by waswas-fng (Curate) on Aug 13, 2004 at 18:49 UTC
    If the errors flow downhill (i.e. the first entries of Anumber and Bnumber are considered right and any entries later in the load process are wrong), then you can simply create a standalone hash that holds the reference entries. If <X>number is unpopulated, populate it with the first instance; otherwise, if the new entry does not match the populated hash, the record set for that array errors out. Make sense? Let me know if I am misunderstanding your problem.
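One possible reading of this reference-hash idea, as a minimal sketch rather than a definitive implementation: every "type=value" pair points at a shared group hash holding all identifiers known to belong together, and the first value seen for a type wins. The function name find_conflicts and the simplified record layout (identifier type as the hash key, e.g. { Anumber => 1 }, instead of the Identifier1/IdNumber1 pairs) are assumptions made for illustration.

```perl
use strict;
use warnings;

# Assumed record layout: { Anumber => 1, Bnumber => 2, ... } (type => value).
# Returns one message per record that contradicts earlier records.
sub find_conflicts {
    my @records = @_;
    my %group_of;    # "type=value" => hashref (type => value) of linked identifiers
    my @conflicts;

    RECORD:
    for my $i (0 .. $#records) {
        my $rec    = $records[$i];
        my %merged = %$rec;    # will become the combined group for this record
        my %seen;              # groups already merged for this record

        for my $type (sort keys %$rec) {
            my $group = $group_of{"$type=$rec->{$type}"} or next;
            next if $seen{$group}++;
            for my $t (sort keys %$group) {
                if (exists $merged{$t} && $merged{$t} ne $group->{$t}) {
                    push @conflicts,
                        "record $i: conflicting $t values $merged{$t} and $group->{$t}";
                    next RECORD;    # first-seen data wins; skip this record
                }
                $merged{$t} = $group->{$t};
            }
        }

        # No conflict: every identifier in the combined group now shares one hash.
        $group_of{"$_=$merged{$_}"} = \%merged for keys %merged;
    }
    return @conflicts;
}

# The second example from the question: the conflict only shows up transitively.
my @found = find_conflicts(
    { Anumber => 1, Bnumber => 2, Cnumber => 3 },
    { Cnumber => 3, Dnumber => 4, Enumber => 5 },
    { Enumber => 5, Anumber => 2 },
);
print "$_\n" for @found;
```

Because each group holds at most one value per identifier type (about 10 here), the lookup table grows with the number of distinct identifiers rather than with the number of records, and a record that conflicts is reported and then ignored.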


    -Waswas
Re: Complex data structure check
by graff (Chancellor) on Aug 14, 2004 at 03:17 UTC
    Where is this huge data structure coming from? (Are you reading it from a file, is it being generated by some continuous process?) Are there other reasons, besides the integrity check you're doing here, for having to keep it all in memory at one time?

    It sounds like the integrity check does not require all the data to be memory resident at once. If the data come from any sort of stream, then instead of an array of hashes, you could just do a "while(<>)" loop over each hash, handling just one at a time. (Unless I misunderstand the nature of the task, which is possible.)
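The one-record-at-a-time idea could look roughly like the sketch below. It assumes a hypothetical input format of one record per line ("Anumber=1, Bnumber=2, ..."), and the names each_record and check_pairs are made up for illustration. The check shown is deliberately simple: it only remembers which value of each other type was first seen next to a given identifier, so it catches direct contradictions but not ones that are only visible through a chain of records.

```perl
use strict;
use warnings;

# Read one record per line and pass it to $check, so only the current
# record's hash exists at any time. The line format is an assumption.
sub each_record {
    my ($fh, $check) = @_;
    my $line_no = 0;
    while (my $line = <$fh>) {
        $line_no++;
        next unless $line =~ /\S/;                  # skip blank lines
        my %rec = $line =~ /(\w+)\s*=\s*(\w+)/g;    # "Type=value" pairs
        $check->(\%rec, $line_no);
    }
}

my %partner;    # "type=value" => { other type => value first seen with it }
my @errors;

sub check_pairs {
    my ($rec, $line_no) = @_;
    for my $t (sort keys %$rec) {
        my $known = $partner{"$t=$rec->{$t}"} ||= {};
        for my $o (sort keys %$rec) {
            next if $o eq $t;
            $known->{$o} = $rec->{$o} unless exists $known->{$o};
            push @errors,
                "line $line_no: $t=$rec->{$t} seen with $o=$known->{$o}, now $o=$rec->{$o}"
                if $known->{$o} ne $rec->{$o};
        }
    }
}

# An in-memory filehandle stands in for the real file or stream.
my $data = "Anumber=1, Bnumber=2, Cnumber=3\nAnumber=1, Bnumber=4, Cnumber=3\n";
open my $fh, '<', \$data or die "cannot open in-memory handle: $!";
each_record($fh, \&check_pairs);
close $fh;
print "$_\n" for @errors;
```

Note that the lookup table still has to stay in memory; what streaming avoids is holding all of the records themselves at once.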