Monk::Thomas has asked for the wisdom of the Perl Monks concerning the following question:

Hello fellow monks

I'm having trouble deciding on a file access method. Below I describe the candidates that come to mind, along with the pros and cons I can think of. I'd like your input to pick the most appropriate one.

Requirements

Additional limitation: my parser is driven by a user-provided syntax description. I haven't found a way to describe a packed byte as a single value; its parts are handled as separate values. Therefore parsing/writing requires multiple passes over the same byte: the data stream MUST either be able to return to a previous position, or the current byte's content must be cached for subsequent reads.
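A minimal sketch of the two alternatives named in the last sentence, assuming a hypothetical syntax description in which two 4-bit fields (`version` and `flags`, both made up for illustration) share one byte. The first variant seeks back one byte before each field; the second reads the byte once and caches it. The in-memory handle only keeps the example self-contained.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_CUR SEEK_SET);

# Hypothetical syntax description: two fields packed into ONE byte.
my @fields = (
    { name => 'version', lsb => 4, mask => 0x0F },    # upper nibble
    { name => 'flags',   lsb => 0, mask => 0x0F },    # lower nibble
);

# Variant 1: return to the previous position before each field.
sub parse_with_seek {
    my ($fh) = @_;
    my %out;
    for my $f (@fields) {
        read($fh, my $buf, 1) == 1 or die "short read";
        $out{ $f->{name} } = (ord($buf) >> $f->{lsb}) & $f->{mask};
        seek($fh, -1, SEEK_CUR) or die "seek failed: $!";    # step back onto the byte
    }
    seek($fh, 1, SEEK_CUR) or die "seek failed: $!";         # finally consume it
    return \%out;
}

# Variant 2: read the byte once and cache it for every field.
sub parse_with_cache {
    my ($fh) = @_;
    read($fh, my $buf, 1) == 1 or die "short read";
    my $byte = ord $buf;                                      # cached value
    my %out;
    $out{ $_->{name} } = ($byte >> $_->{lsb}) & $_->{mask} for @fields;
    return \%out;
}

my $data = "\x3A";    # upper nibble 3, lower nibble 10
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
my $r1 = parse_with_seek($fh);
seek($fh, 0, SEEK_SET) or die "seek failed: $!";
my $r2 = parse_with_cache($fh);
printf "version=%d flags=%d\n", @{$r1}{qw(version flags)};
```

Running it prints `version=3 flags=10`. The caching variant has the advantage of not depending on a seekable handle, since `seek` fails on pipes.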

Option a) file handle

open my $fh, '<:raw', $filename or die "Cannot open $filename: $!";

PRO: works perfectly fine when converting from binary to parsed; it is the 'natural' way to handle a file; file size need not affect memory consumption

CON: converting from parsed to binary is better done in memory, because it's easier to build the data back-to-front; the file must be protected against changes during parsing/writing
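To illustrate the file handle approach, here is a sketch that reads one record on demand with `seek`, `read` and `unpack`, so skipped records are never loaded into memory. The fixed 8-byte record layout (`N n n`) is an assumption made for the example, not the actual data format; `tempfile` is only used to create sample data.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);
use File::Temp qw(tempfile);

# Hypothetical layout: fixed-size 8-byte records, each a big-endian
# 32-bit id followed by two big-endian 16-bit values.
my $REC_LEN = 8;
my $REC_FMT = 'N n n';

# Create a small sample file so the sketch is self-contained.
my ($out, $filename) = tempfile(UNLINK => 1);
binmode $out;
print {$out} pack($REC_FMT, $_, $_ * 2, $_ * 3) for 1 .. 5;
close $out or die "close failed: $!";

# Read straight from the file; only the requested record is read.
open my $fh, '<:raw', $filename or die "Cannot open $filename: $!";

sub read_record {
    my ($fh, $index) = @_;
    seek($fh, $index * $REC_LEN, SEEK_SET) or die "seek failed: $!";   # skip earlier records
    my $got = read($fh, my $buf, $REC_LEN);
    die "read failed: $!" unless defined $got;
    return unless $got == $REC_LEN;                                    # EOF or truncated record
    my ($id, $x, $y) = unpack $REC_FMT, $buf;
    return { id => $id, x => $x, y => $y };
}

my $rec = read_record($fh, 3);    # fourth record, id 4
```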

Option b) load data into scalar

my $data = do { local $/; <$fh> };

PRO: drastically reduces the window during which the parser is vulnerable to file changes; it does not matter where in the data values are read or written.

CON: must rely on substr to extract values from the byte stream and/or track the position manually
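A sketch of what relying on substr and tracking the position manually looks like in practice, using a made-up layout (a 16-bit count followed by that many bytes). It also shows the flip side: with 4-argument `substr`, writing at arbitrary offsets, including prepending a header once the payload is known, is straightforward.

```perl
use strict;
use warnings;

# Stand-in for slurped file contents; hypothetical layout:
# a 16-bit big-endian count followed by that many 8-bit values.
my $data = pack 'n C*', 3, 10, 20, 30;

# Reading: extract with substr + unpack and track the position by hand.
my $pos   = 0;
my $count = unpack 'n', substr($data, $pos, 2);
$pos += 2;
my @values;
for (1 .. $count) {
    push @values, unpack 'C', substr($data, $pos, 1);
    $pos += 1;
}

# Building back-to-front: write the payload first, then prepend a header
# whose value (here the payload length, equal to the count for 1-byte
# values) is only known afterwards.
my $record = pack 'C*', @values;
substr($record, 0, 0, pack('n', length $record));

# Patching in place: 4-argument substr overwrites bytes at any offset.
substr($data, 2, 1, pack('C', 99));    # replace the first value
```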

Option c) memory backed file handle

open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";

PRO: keeps the file access methods while also reducing vulnerability to file changes

CON: the file must be read completely into memory, even if only a small number of records are parsed and most are simply skipped
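A sketch showing that the usual `seek`/`read`/`print` calls work unchanged on a scalar opened as a file handle; the two-value buffer is hypothetical. Opening with `'+<'` also permits in-place writes, which land directly in `$data`.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Stand-in for the slurped file contents (hypothetical two-value layout).
my $data = pack 'n n', 0x1234, 0xBEEF;

# The same read/seek calls used on a real file work on the scalar.
open my $in, '<', \$data or die "Cannot open in-memory handle: $!";
seek($in, 2, SEEK_SET) or die "seek failed: $!";    # skip the first value
read($in, my $buf, 2) == 2 or die "short read";
my $second = unpack 'n', $buf;
close $in;

# Opening with '+<' allows in-place writes; they modify $data itself.
open my $rw, '+<', \$data or die "Cannot open in-memory handle: $!";
seek($rw, 0, SEEK_SET) or die "seek failed: $!";
print {$rw} pack('n', 0xCAFE);
close $rw;    # flushes the write into $data
```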

Which option is the most reasonable to you? Is there another option I am not aware of?

Update: crossed out the remarks regarding file access concurrency; they are distracting.

Re: Deciding for a file access method - requesting opinions
by BrowserUk (Patriarch) on Jul 14, 2015 at 20:49 UTC

    It's really hard to imagine the type of processing you are doing on this file.

    The phrases "binary to parsed" & "parsed to binary" aren't ones I've ever encountered before.

    Do they indicate that you are reading information from the file, manipulating it and then rewriting it in place with new values?

    And this "modifying the file during a parse / write is guaranteed to lead to inconsistency" indicates that there may be multiple concurrent accessors?


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
    I'm with torvalds on this Agile (and TDD) debunked I told'em LLVM was the way to go. But did they listen!

      Do they indicate that you are reading information from the file, manipulating it and then rewriting it in place with new values?

      The user is supposed to convert the binary data into a JSON representation (either as a file or in memory), modify it as required and then convert the JSON back into the binary format.

      And this "modifying the file during a parse / write is guaranteed to lead to inconsistency" indicates that there may be multiple concurrent accessors?

      I'm reading a file. It's always possible someone is doing something stupif(1) and modifying / replacing the file while the tool is parsing it. A big fat "DON'T DO THAT" sticker is probably sufficient, but it would be nice to limit the exposure. Another way could be to calculate the file size / date / md5sum before and after parsing and repeat the parse if it changed unexpectedly.
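The before/after check could be sketched as follows. `fingerprint` and `parse_stable` are hypothetical helpers; the parser is passed in as a code reference and the retry limit is arbitrary. `Digest::MD5` and `File::Temp` are core modules. Core `stat` reports mtime in whole seconds, which is why the size and checksum are included as well.

```perl
use strict;
use warnings;
use Digest::MD5;
use File::Temp qw(tempfile);

# Hypothetical helper: fingerprint a file by size, mtime and MD5 checksum.
sub fingerprint {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my @st = stat $fh or die "stat $path failed: $!";
    my $md5 = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return join ':', $st[7], $st[9], $md5;    # size:mtime:md5
}

# Hypothetical wrapper: run $parse->($path) and repeat the parse if the
# file changed in the meantime; give up after $max_tries attempts.
sub parse_stable {
    my ($path, $parse, $max_tries) = @_;
    for my $try (1 .. $max_tries) {
        my $before = fingerprint($path);
        my $result = $parse->($path);
        return $result if fingerprint($path) eq $before;
        warn "$path changed during parse (attempt $try), retrying\n";
    }
    die "$path kept changing; giving up\n";
}

# Usage with a trivial stand-in "parser" that just returns the file size.
my ($tmp, $name) = tempfile(UNLINK => 1);
print {$tmp} "some bytes";
close $tmp or die "close failed: $!";
my $size = parse_stable($name, sub { -s $_[0] }, 3);
```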

      (1) Nice typo.

      P.S.: I already talked about the data format in a previous node

        The user is supposed to convert the binary data into a JSON representation (either as a file or in memory), modify it as required and then convert the JSON back into the binary format.

        The user? Aren't you writing a program to do this?

        Or does the program convert binary to JSON, present the JSON for interactive modification, and then, when the user has finished, convert their modified JSON back to binary and rewrite the file in place?

        It seems to me that most of your envisioned potential problems would go away if you split the process into 3 separate processes.

        1. Slurp, parse, spit out JSON to a (protected) file.
        2. Modify JSON; spit out to modified JSON file.
        3. Slurp modified JSON and spit out modified binary.
        4. Rename modified binary over the original.
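Step 4 might look like this sketch: write the new binary to a temporary file in the same directory as the original, then `rename` it into place. `replace_file` is a hypothetical helper name. On POSIX systems a rename within one filesystem is atomic, so a concurrent reader sees either the old file or the new one, never a half-written mix.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Temp qw(tempfile);

# Hypothetical helper for step 4: write $bytes to a temp file next to $path,
# then rename it over $path.
sub replace_file {
    my ($path, $bytes) = @_;
    my ($tmp, $tmpname) = tempfile('.replace-XXXXXX', DIR => dirname($path));
    binmode $tmp;
    print {$tmp} $bytes or die "write to $tmpname failed: $!";
    close $tmp or die "close of $tmpname failed: $!";
    rename $tmpname, $path or do {
        my $err = $!;
        unlink $tmpname;    # do not leave the temp file behind
        die "rename $tmpname -> $path failed: $err";
    };
    return 1;
}

# Usage: replace a sample file's contents with new binary data.
my ($fh, $orig) = tempfile(UNLINK => 1);
print {$fh} "old contents";
close $fh or die "close failed: $!";
replace_file($orig, "\x01\x02\x03");
```

One caveat: `tempfile` creates the file with mode 0600, so if the original had different permissions you would want to copy them with `chmod` before the rename.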

          What to do if the original has changed in the intervening period is a 'production processes' problem, i.e. managerial, not technological.

        I'm reading a file. It's always possible someone is doing something stupif(1) and modifying / replacing the file while the tool is parsing it. A big fat "DON'T DO THAT" sticker is probably sufficient, but it would be nice to limit the exposure. Another way could be to calculate the file size / date / md5sum before and after parsing and repeat the parse if it changed unexpectedly.

        You seem to be looking for complicated solutions to "shouldn't happen" possibilities.

        A simpler solution would be to move (rename) the file to a directory with 'this user only' permissions; or change the permissions on the file to 'this user only'; or investigate using mandatory locking if that's available on your platform.
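A sketch of two of these suggestions. `chmod 0600` restricts the file to its owner. Mandatory locking is platform-specific, so the example swaps in advisory `flock` instead, which only helps if every other tool touching the file also uses `flock`. The helper names are hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempfile);

# Hypothetical helper: make the file readable/writable by its owner only.
sub restrict_to_owner {
    my ($path) = @_;
    chmod 0600, $path or die "chmod $path failed: $!";
    return 1;
}

# Hypothetical helper: run $code with an exclusive ADVISORY lock held.
# Unlike mandatory locking, this only excludes processes that also flock().
sub with_exclusive_lock {
    my ($path, $code) = @_;
    open my $fh, '+<:raw', $path or die "Cannot open $path: $!";
    flock($fh, LOCK_EX) or die "flock $path failed: $!";
    my @result = $code->($fh);
    close $fh;    # closing the handle releases the lock
    return wantarray ? @result : $result[0];
}

# Usage on a sample file.
my ($tmp, $name) = tempfile(UNLINK => 1);
print {$tmp} "hello";
close $tmp or die "close failed: $!";
restrict_to_owner($name);
my $len = with_exclusive_lock($name, sub { my ($fh) = @_; -s $fh });
```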

