rorokimdim has asked for the wisdom of the Perl Monks concerning the following question:

Hi guys, I have a binary file with "records" of variable lengths. Each record ends with a '\n'. I am trying to find a good way to update records. To update a record, should I split the file at the record's position, write my record, and merge the split pieces back together? Could you please suggest something? I cannot make the records a fixed size; otherwise updating would be really easy. Thanks a lot, ro

Re: Updating Binary Files With variable record lengths
by BrowserUk (Patriarch) on Aug 30, 2007 at 10:38 UTC
    I have a binary file with "records" of variable lengths. Each record ends with a '\n'.

    This makes no sense. Let's say that your records consist of a number of unsigned longs; it doesn't matter how many. When these are packed to their 4-byte binary representations, there are ~67,000,000 legitimate values where one or more bytes of the four will be "\n". These include the numbers 10, 266, 522, 778, 1034, 1290, 1546, 1802, 2058, 2314, 2560, 2561, 2562, 2563, 2564, 2565, 2566, 2567, 2568 ...
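    A quick check of that arithmetic, sketched in Python rather than Perl purely for brevity (`newline_value_count` and `first_newline_values` are hypothetical helper names). The count uses the complement: a 32-bit value avoids 0x0A only if each of its 4 bytes takes one of the other 255 values, so 2**32 - 255**4 values contain at least one "\n" byte. The brute-force part packs small values and collects those whose bytes include 0x0A.

```python
import struct

NEWLINE = b"\n"  # the byte 0x0A

def newline_value_count() -> int:
    # Complement counting: a value has no 0x0A byte only if all 4 bytes
    # take one of the remaining 255 values.
    return 2**32 - 255**4

def first_newline_values(limit: int) -> list[int]:
    # Brute force: pack each value as a 4-byte unsigned int and look for 0x0A.
    # Byte order does not change which bytes are present, so "<I" is arbitrary.
    result = []
    n = 0
    while len(result) < limit:
        if NEWLINE in struct.pack("<I", n):
            result.append(n)
        n += 1
    return result

print(newline_value_count())     # 66716671
print(first_newline_values(19))
```

    The second call reproduces the list above: the values ending in byte 0x0A (10, 266, 522, ...) interleaved with the run 2560..2568, whose second byte is 0x0A.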

    When you try to read back a record containing one of these values, expecting a "\n" as the delimiter, the IO code will encounter the "\n" embedded within one of your binary values and truncate the record there; the bytes that follow will then be read as though they were the start of the next record.
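    The failure is easy to reproduce. This Python sketch (standing in for Perl's `<FH>` with the default `$/` of "\n") packs two records of unsigned longs, terminates each with "\n", and splits the result on "\n". The record contents and the helper names `write_records`/`read_records_naively` are made up for illustration.

```python
import struct

def write_records(records):
    # The naive format: packed 4-byte unsigned ints, each record ended by "\n".
    return b"".join(struct.pack(f"<{len(r)}I", *r) + b"\n" for r in records)

def read_records_naively(data):
    # Split on "\n" as a line-oriented reader would, dropping the empty
    # piece after the final terminator.
    return data.split(b"\n")[:-1]

records = [(1, 2, 3), (4, 10, 6)]  # 10 packs to b"\x0a\x00\x00\x00"
data = write_records(records)
chunks = read_records_naively(data)
print(len(records), len(chunks))  # 2 3
```

    The second record comes back as two pieces: a truncated 4-byte "record" holding only the value 4, followed by a bogus record made of the leftover bytes.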

    And you will have the same problem with all types of packed binary numbers. Shorts & longs, signed & unsigned, and floating point also.

    So, if you are writing raw binary values to a file and trying to use "\n" as a delimiter--or any other character--you will not be able to read that data back successfully. Period.

    If you are not writing raw binary values to the file, then you are going to have to clarify what you mean by "binary data", because your question as asked simply doesn't make any sense.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Updating Binary Files With variable record lengths
by GrandFather (Saint) on Aug 30, 2007 at 05:24 UTC

    How many edits per file? If it's a one-off, then doing what you suggest is probably optimal. If there are multiple edits per file, but it's otherwise a one-off, then order the edits by file position, run through the source file once, and write an edited copy, applying the edits as you go. If this is an ongoing issue, use a database (DBI and DBD::SQLite is pretty much a no-brainer to get going).
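    The "sort the edits, then stream through the file once" approach might look like this minimal Python sketch. It assumes every record really does end at the first "\n" (see BrowserUk's caveat) and that edits are addressed by 0-based record index; `apply_edits` is a hypothetical name. The edited copy goes to a temporary file in the same directory and is then swapped in with `os.replace`, so a crash mid-rewrite leaves the original intact.

```python
import os
import tempfile

def apply_edits(path, edits):
    """Rewrite a newline-terminated record file, replacing records by index.

    edits: dict of 0-based record index -> replacement bytes (no trailing "\n").
    Indexes past the end of the file are silently ignored.
    """
    pending = sorted(edits.items())  # order the edits by file position
    i = 0  # next pending edit
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, "wb") as dst, open(path, "rb") as src:
            # Iterating a binary file yields chunks ending in b"\n".
            for index, record in enumerate(src):
                if i < len(pending) and pending[i][0] == index:
                    dst.write(pending[i][1] + b"\n")
                    i += 1
                else:
                    dst.write(record)  # unchanged record copied as-is
        os.replace(tmp_path, path)  # swap the edited copy into place
    except BaseException:
        os.unlink(tmp_path)
        raise

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "records.dat")
        with open(p, "wb") as f:
            f.write(b"alpha\nbeta\ngamma\n")
        # Edits given out of order; apply_edits sorts them by position.
        apply_edits(p, {2: b"G", 0: b"a-much-longer-record"})
        with open(p, "rb") as f:
            print(f.read())  # b'a-much-longer-record\nbeta\nG\n'
```

    A single edit is just the one-element case, which is why the split-write-merge plan from the question is fine for a one-off.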


    DWIM is Perl's answer to Gödel
Re: Updating Binary Files With variable record lengths
by runrig (Abbot) on Aug 30, 2007 at 05:02 UTC
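    Tie::File maybe? It presents a file as a Perl array with one element per record, so assigning to an element rewrites that record in the file. I can't say for sure it fits, since I still don't know all the details.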
Re: Updating Binary Files With variable record lengths
by apl (Monsignor) on Aug 30, 2007 at 12:00 UTC
    Slightly off-topic: for variable-length binary files, you should consider having each record start with something fairly unique (such as an SOH, byte 0x01) followed by the length of the record.

    This doesn't answer your question, but it will make it easier to determine that you're processing a valid record. (One organism's record break is another's data.)
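    A minimal Python sketch of that layout, assuming a one-byte SOH (0x01) marker followed by a 4-byte big-endian length; the field sizes and the names `write_record`/`read_records` are illustrative choices, not a standard format. Because the reader trusts the length field, a "\n" (or any other byte) inside the payload is harmless. The SOH check only catches a reader that has lost its place, since 0x01 can still occur inside the data.

```python
import io
import struct

SOH = b"\x01"                 # start-of-header marker
HEADER = struct.Struct(">I")  # assumption: 4-byte big-endian unsigned length

def write_record(stream, payload: bytes) -> None:
    # Marker, then length, then the raw payload bytes.
    stream.write(SOH + HEADER.pack(len(payload)) + payload)

def read_records(stream):
    """Yield payloads; raise ValueError on a bad marker or truncated record."""
    while True:
        marker = stream.read(1)
        if not marker:
            return  # clean end of file
        if marker != SOH:
            raise ValueError(f"expected SOH at offset {stream.tell() - 1}")
        raw_len = stream.read(HEADER.size)
        if len(raw_len) != HEADER.size:
            raise ValueError("truncated length field")
        (length,) = HEADER.unpack(raw_len)
        payload = stream.read(length)
        if len(payload) != length:
            raise ValueError("truncated payload")
        yield payload

buf = io.BytesIO()
for p in (struct.pack("<3I", 4, 10, 6), b"text\nwith newline"):
    write_record(buf, p)
buf.seek(0)
print([len(p) for p in read_records(buf)])  # [12, 17]
```

    Since every record carries its own length, a reader can skip a record by seeking past `length` bytes without parsing it. Updating a record in place is still only possible when the new payload has the same length; otherwise the file has to be rewritten as GrandFather describes.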