in reply to Deciding for a file access method - requesting opinions

It's really hard to imagine the type of processing you are doing on this file.

The phrases "binary to parsed" & "parsed to binary" aren't ones I've ever encountered before.

Do they indicate that you reading information from the file; manipulating it and then re-writing it in place with new values?

And this "modifying the file during a parse / write is guaranteed to lead to inconsistency" indicates that there may be multiple concurrent accessors?


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
I'm with torvalds on this Agile (and TDD) debunked I told'em LLVM was the way to go. But did they listen!
  • Comment on Re: Deciding for a file access method - requesting opinions

Replies are listed 'Best First'.
Re^2: Deciding for a file access method - requesting opinions
by Monk::Thomas (Friar) on Jul 14, 2015 at 21:08 UTC

    Do they indicate that you reading information from the file; manipulating it and then re-writing it in place with new values?

    The user is supposed to convert the binary data into a JSON-representation (either as a file or in memory), modify it as required and then convert JSON back into the binary format.

    And this "modifying the file during a parse / write is guaranteed to lead to inconsistency" indicates that there may be multiple concurrent accessors?

    I'm reading a file. It's always possible someone is doing something stupif(1) and modifying / replacing the file while the tool is parsing it. A big fat "DON'T DO THAT" sticker is probably sufficient, but it would be nice to limit the exposure. Another way could be to calculate the file size / date / md5sum before and after parsing and repeat the parse if it changed unexpectedly.

    (1) Nice typo.

    P.S.: I already talked about the data format in a previous node

      The user is supposed to convert the binary data into a JSON-representation (either as a file or in memory), modify it as required and then convert JSON back into the binary format.

      The user? Aren't you writing a program to do this?

      Or does the program convert binary to JSON; present the JSON for interactive modification; and then when the user's finished; convert their modified JSON back to binary and rewrite the file in-place?

      It seems to me that most of your envisioned potential problems would go away if you split the process into 3 separate processes.

      1. Slurp, parse, spit out JSON to a (protected) file.
      2. Modify JSON; spit out to modified JSON file.
      3. Slurp modified JSON and spit out modified binary.
      4. Rename modified binary over the original.

        What to do if the original has changed in the intervening period is a 'production processes' problem. Ie. Managerial not technological.

      I'm reading a file. It's always possible someone is doing something stupif(1) and modifying / replacing the file while the tool is parsing it. A big fat "DON'T DO THAT" sticker is probably sufficient, but it would be nice to limit the exposure. Another way could be to calculate the file size / date / md5sum before and after parsing and repeat the parse if it changed unexpectedly.

      You seem to be looking for complicated solutions to "shouldn't happen" possibilities.

      A simpler solution would be to move (rename) the file to a 'this user only' permissions directory; or change the permissions on the file to 'this user only'; or investigate use mandatory locking if that's available on your platform.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
      I'm with torvalds on this Agile (and TDD) debunked I told'em LLVM was the way to go. But did they listen!

        I seem to be unable to convey my intention. I already did what you suggested. My library can be used to convert the binary data into a JSON representation (parse) and also to convert a JSON-representation into binary (serialize). These are 2 separate process steps that do not need to be executed back-to-back.

        I'm not trying to figure out how to prevent concurrent access. I'm just not sure whether it's better to access the binary data as a simple file handle, as a scalar or as a memory-backed file handle.

        But I may already have my answer: Since I'm not developing for embedded devices I'm not memory-constrained and should just load the full file into memory. (Option 'b' or 'c') Since I'm already using read, seek, tell, print to access the bits and bytes I should then go for Option 'c'.