
Yeah, I need random access to the file contents. I will know in advance which lines I need, so I am now thinking that I should try this: 1) Figure out which rows I need. 2) Figure out the number of lines in each file. 3) Build a module that wraps Tie::File, but switches the tied filehandle to the appropriate file for a given row number. 4) Close everything up. Thanks for the help!
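For steps 2 and 3, the part that decides which file a row lives in does not depend on Tie::File at all. A minimal sketch, assuming rows are numbered from 0 across all files in order and that the per-file line counts have already been gathered (locate_row is a hypothetical name, not part of any module):

```perl
use strict;
use warnings;

# Hypothetical helper: given the number of lines in each file (in order),
# map a 0-based global row number to (file index, row within that file).
sub locate_row {
    my ($counts, $row) = @_;
    return if $row < 0;
    for my $i (0 .. $#$counts) {
        return ($i, $row) if $row < $counts->[$i];
        $row -= $counts->[$i];    # skip past this file's rows
    }
    return;    # row is beyond the last file
}
```

With line counts of 3, 2 and 4, locate_row([3, 2, 4], 4) yields file index 1 and local row 1.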

Re: Re: Re: tie multiple files to a single array?
by BrowserUk (Patriarch) on Jul 08, 2003 at 00:09 UTC

    Tie::File is a great module for the purpose for which it was designed, essentially in-place editing of huge files, but the very features that make it so useful for that are likely to get in the way and slow your application down. Because your files are on CD-ROM and therefore read-only, all the clever code in there for caching and deferred writing would be redundant.

    Of course, creating an index of the records only makes sense if you're going to need to use it more than once, and that brings me back to the final point I made in my last post. Deciding which of the many possibilities is the 'best' approach to solving this problem really requires a good description of how the application is going to access the files, and how frequently. These are a few questions I would ask myself before I decided which way to do this.

    • How often will the application run?

      If it will only run once, there is little point in making an index, or in worrying about efficiency.

    • How important is a timely result?

      If the application will run interactively, with a user (or another machine) sitting there waiting for the result, then re-indexing (essentially what Tie::File would have to do) 8GB of data each time it runs would be incredibly wasteful and interminably slow.

    • How random does the random access need to be?

      If you have a list of record numbers that you need to extract, and there are no dependencies between them, then sorting that list and processing the files sequentially, counting and extracting the records as you go, is probably about as efficient as it is likely to get if you only need to do it once.

      Conversely, if you need to access the records in a random order, and especially if there are dependencies and/or the order can vary at runtime, then it would probably make sense to build an index to the records.

      If you're going to process the files more than once, it definitely would.
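    The sort-and-scan approach for a one-off run could look something like this. It is only a sketch: extract_lines is a made-up name, and it assumes record numbers are 0-based and count straight through the files in the order given.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the given global line numbers (0-based,
# counted across all files in order) in a single sequential pass.
sub extract_lines {
    my ($files, $wanted) = @_;
    my @want = sort { $a <=> $b } @$wanted;    # ascending, so one pass suffices
    my %found;
    return \%found unless @want;

    my $n = 0;                                 # global line counter
    FILE: for my $file (@$files) {
        open my $fh, '<', $file or die "Cannot open $file: $!";
        while (my $line = <$fh>) {
            if ($n == $want[0]) {
                chomp $line;
                $found{$n} = $line;
                # Drop this number (and any duplicates of it) from the list.
                shift @want while @want && $want[0] == $n;
                last FILE unless @want;        # nothing left to find
            }
            $n++;
        }
        close $fh;
    }
    return \%found;    # hash ref: line number => line text
}
```

    Each file is read at most once and the scan stops as soon as the last wanted record has been seen, so nothing is indexed or cached along the way.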

    There are also various ways that you could build the index, with the usual trade-offs between size and speed applying.
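    As one illustration of the size end of that trade-off, here is a sketch of an index kept as a packed string of byte offsets, 4 bytes per line, rather than a Perl array of numbers. The names build_index and fetch_line are made up for the example, and the 'N' format assumes every individual file is smaller than 4GB.

```perl
use strict;
use warnings;

# Build a packed string holding the byte offset of the start of each line.
# 'N' is a 32-bit unsigned big-endian integer, so this assumes the file
# is smaller than 4GB.
sub build_index {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    binmode $fh;                      # keep offsets byte-accurate
    my $index = '';
    my $pos   = 0;
    while (<$fh>) {
        $index .= pack 'N', $pos;     # offset where this line began
        $pos = tell $fh;              # offset where the next line begins
    }
    close $fh;
    return $index;
}

# Fetch line $n (0-based) of $file by seeking straight to its offset.
sub fetch_line {
    my ($file, $index, $n) = @_;
    return if $n < 0 || $n >= length($index) / 4;
    my $offset = unpack 'N', substr($index, $n * 4, 4);
    open my $fh, '<', $file or die "Cannot open $file: $!";
    binmode $fh;
    seek $fh, $offset, 0 or die "Cannot seek in $file: $!";
    my $line = <$fh>;
    close $fh;
    chomp $line;
    return $line;
}
```

    The packed string could also be written to disk once and reloaded on later runs, which is where the index starts to pay for itself. Opening the file on every fetch keeps the sketch short; a real application would probably keep the handles open.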

    Without greater insight into the nature of the application it's pointless speculating further, but given the composite size and read-only nature of the files involved, and the need to wrap code around Tie::File to achieve your purpose, I'm pretty sure that there is a better way to go than that.

    Good luck.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller