jalewis2 has asked for the wisdom of the Perl Monks concerning the following question:

This isn't a Perl-specific question, but processing files quickly is a FAQ.

I am currently gunzipping text files to a ramfs and processing them line by line. My thinking was that if I could read the files faster, I could process them faster.

Are there any gotchas to doing this?

Replies are listed 'Best First'.
Re: Using ramfs to process files
by Tanktalus (Canon) on Feb 03, 2005 at 17:51 UTC

    Ignoring Compress::Zlib, which could do this in a pipe-like fashion, you could also just pipe from gunzip:

    open my $fh, "gunzip -c $filename |" or die "can't run gunzip: $!";
    while (<$fh>) {
        # work with one line at a time here
    }
    Uses less memory, too.
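
    For completeness, the Compress::Zlib route mentioned above looks roughly like this, using that module's gzopen/gzreadline stream interface (a sketch; $filename stands in for your gzipped file):

    use Compress::Zlib;

    # gzopen returns a stream handle; $gzerrno carries the zlib error
    my $gz = gzopen($filename, "rb")
        or die "Cannot open $filename: $gzerrno\n";
    my $line;
    while ($gz->gzreadline($line) > 0) {
        # work with one line at a time here, same as with the pipe
    }
    $gz->gzclose;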

      Sometimes when I post, it makes sense in my head.

      My files are gzipped binary files. I have to send them through another program to get them into text format. The issue is that after processing they are very large, so it is easier to break my task into multiple scripts. Keeping the file in memory is cumbersome.

      I haven't had time to write something to process the binary format directly, but that is probably next on the list.

      My question was more about possible problems with using ramfs.

        I'm going to pretend for a second that you're in control of the conversion program, or that it's a well-thought-out program.

        In the ideal world, you'd simply extend the pipe:

        open my $fh, "gunzip -c $filename | convert_to_text - |";

        Thus, gunzip would uncompress as much as it could before blocking on the pipe to convert_to_text, which would read the binary from stdin and write the text to stdout. Each pipe blocks when it is empty or full, respectively, so your script could deal with the text on the way through.
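
        A minimal sketch of that pipeline (convert_to_text is still the hypothetical converter; the read loop is the same one you'd use on the ramfs copy):

        # convert_to_text is hypothetical; substitute your actual conversion program
        open my $fh, "gunzip -c $filename | convert_to_text - |"
            or die "can't start pipeline: $!";
        while (my $line = <$fh>) {
            # deal with the text on the way through, no ramfs needed
        }
        close $fh or warn "pipeline exited with status $?";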

        Back to reality ... in the more common case, I see nothing wrong with ramfs - as long as it works and doesn't corrupt your data, it's the same as any other filesystem. You just have to be careful that you're not wasting RAM on stuff that could make your system faster in other ways. For example, if that RAM were instead used for filesystem caching, you'd get almost the same effect, and, unlike RAM tied up in a ramfs, cache RAM that isn't needed can still be reclaimed for your programs.

Re: Using ramfs to process files
by TomDLux (Vicar) on Feb 03, 2005 at 18:49 UTC

    Instead of using a ramfs to store large blocks of data, why not get rid of the external process running gzip? Instead, use PerlIO::gzip and process the file line by line. According to the POD:

    use PerlIO::gzip;
    open FOO, "<:gzip", "file.gz" or die $!;
    print while <FOO>; # And it will be uncompressed...
    binmode FOO, ":gzip(none)"; # Starts reading deflate stream from here on

    --
    TTTATCGGTCGTTATATAGATGTTTGCA

Re: Using ramfs to process files
by halley (Prior) on Feb 03, 2005 at 18:40 UTC
    Ramdisks (and ramfs) are a mixed blessing.

    Whatever RAM you dedicate to the ramdisk, you are consuming from the pool. Usually, an operating system will use whatever is free in the pool as disk buffers, so you're trading one kind of performance booster for another.

    Also, as bluto brought up, a ramfs dies when the power goes out. Journals won't save a ramdisk.

    However, if you're comparing ramfs against NFS or a journaled filesystem on a laptop hard drive, and you are using the ramfs only for specific applications that need tons of write operations, and you don't mind the lack of safety, I'd say you could see some real benefit.

    --
    [ e d @ h a l l e y . c c ]

Re: Using ramfs to process files
by bluto (Curate) on Feb 03, 2005 at 18:30 UTC
    I'm assuming you have memory to burn and don't care about power failures. You'll need to consider whether your files could eventually grow past the point where they no longer fit on your ramfs, since that may be a pain to rectify when it actually happens (e.g., you may have to add more RAM, fix code, or move to a disk filesystem). Handling this with a disk filesystem is obviously easier, since you can give it a large amount of space ahead of time.

    Also, sometimes you'll see little or no significant benefit over a fast local disk filesystem. RAM filesystems still have to simulate creates, reads, writes, unlinks, etc., and, depending on your access pattern, disk filesystem caching can make a huge difference.

      Yes - tons of RAM, quad CPU. The bottleneck seems to be the disks, which I can't quickly fix.