in reply to Unexpected File Results

In mathematics there is a very useful tool called "Proof by Contradiction". You assume the negation of what you want to prove and then demonstrate that this leads to a logical contradiction. If your chain of logic is correct, then your initial assumption must have been wrong, which demonstrates that its negation (the statement you wanted to prove in the first place) is true. Often this is an easier approach than a direct proof of your original theorem.

There are a couple of possibilities that I see here:

  1. The assumption about a constant physical block-size of the data is in error.
  2. The 'constant block-size' refers to the number of records composing an entry ('every patient will have the following sixteen pieces of information'), but the lengths of the individual records can vary ('fields that do not have data will be entered as a single blank or zero').
I'd bet that further discussion with the User/Designer of the input will be most instructive. ("Why, yes, we said that the block-size was constant. It isn't? Hum, you have uncovered a bug. Don't do anything more with the data until we can check this out.")

----
I Go Back to Sleep, Now.

OGB

Re^2: Unexpected File Results
by Grundle (Scribe) on May 30, 2007 at 15:41 UTC
    I appreciate your mathematical approach, but let me set your mind at ease. For possibility 1, I have checked it using the handy tool called "dd" plus some simple arithmetic. Dividing the total file length by the total number of records in the file gives the block size of each record.

    blocksize = filesize/total_num_records
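A minimal sketch of that division in Python, with one extra check: if the file size is not an exact multiple of the record count, the constant-size assumption is already in trouble. The function name and the record count of 2001 are made up for illustration; only the 550 comes from the dd command in this post.

```python
def block_size(file_size: int, total_records: int) -> int:
    """Return bytes per record; raise if the division leaves a remainder."""
    if total_records <= 0:
        raise ValueError("total_records must be positive")
    size, remainder = divmod(file_size, total_records)
    if remainder:
        # Leftover bytes mean the records cannot all be the same length.
        raise ValueError(f"{remainder} bytes left over; block size is not constant")
    return size


# Hypothetical numbers: 2001 records of 550 bytes each.
example = block_size(550 * 2001, 2001)
```

In practice the file size would come from something like os.path.getsize on the real file. A clean division is necessary but not sufficient, which is why the spot check on an arbitrary record still matters.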

    To further prove this, I know that each block starts with a customer name. So with the handy tool 'dd' I can move to an arbitrary record. If my blocksize is off, then I will not have the name starting the block.

    dd if=filename bs=550 count=1 skip=2000 | od -Ad -c

    The previous command skips the first 2000 blocks and shows exactly one block, the one at zero-based index 2000, as a character dump with decimal offsets (that is what od -Ad -c produces; it is not hex). The name is at the correct location, so we have shown that 1 is not the case.
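The same spot check can be sketched in Python: seek to index * blocksize and read one block. This is only an illustration, not the code from the original question. The demo file, the names in it, and the read_record helper are all invented; the 550 is taken from bs=550 above.

```python
import os
import tempfile

BLOCK_SIZE = 550  # from bs=550 in the dd command


def read_record(path, index, block_size=BLOCK_SIZE):
    """Return the record at zero-based `index`, like dd skip=index count=1."""
    with open(path, "rb") as fh:
        fh.seek(index * block_size)
        data = fh.read(block_size)
    if len(data) != block_size:
        raise EOFError(f"record {index} is short: got {len(data)} bytes")
    return data


# Demo on a made-up file: three records, each starting with a name field
# and padded with nulls out to the full block size.
names = [b"ALICE", b"BOB", b"CAROL"]
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    for name in names:
        tmp.write(name.ljust(BLOCK_SIZE, b"\0"))
    demo_path = tmp.name

third = read_record(demo_path, 2)  # skips two blocks, reads the third
os.remove(demo_path)
```

If the block size were wrong, the bytes at the start of the returned record would not be a name, and the error would grow with the index, so checking a record far into the file is a stronger test than checking one near the start.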

    For statement number 2, let us refer to the file itself. Since it is a binary, database-type file, it is logically broken up into these blocks, which in the database world are also called "rows". Although a row can have empty fields, the system has allocated that space beforehand. The preallocated space still exists in the binary file, even though it is filled with nulls (shown as \0 in the od dump).
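To illustrate why empty fields do not shrink a row, here is a small sketch using Python's struct module. The layout (30-byte name, 20-byte city, 4-byte integer id) is purely hypothetical and is not the layout of the file under discussion.

```python
import struct

# Hypothetical fixed-width row: 30-byte name, 20-byte city, 4-byte id.
# "<" means little-endian with no alignment padding between fields.
RECORD = struct.Struct("<30s20sI")


def pack_row(name, city, ident):
    """Pack one row; struct pads short 's' fields with null bytes."""
    return RECORD.pack(name, city, ident)


def unpack_row(raw):
    """Unpack one row and strip the null padding from the text fields."""
    name, city, ident = RECORD.unpack(raw)
    return name.rstrip(b"\0"), city.rstrip(b"\0"), ident


full = pack_row(b"Smith", b"Boston", 7)
sparse = pack_row(b"Jones", b"", 0)  # the city field is empty
```

Both rows come out at RECORD.size bytes. The empty city in the second row is still there as twenty \0 bytes, which is the same pattern the dump shows for unused space.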

    One more thing I would like to point out is that I find it quite useful to approach the problem from a different angle. Most developers would never think to come at it from the standpoint you have suggested. Thank you for those thoughts.