Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a tough problem and would greatly appreciate any ideas anyone has.

On a research ship at sea we have a system that logs scientific data from a myriad of serial lines and writes each line's data to its own text file in that data type's directory. A new file is created each day, so the layout looks like this:

data/
    sensor1/
        sensor1-date.txt
        sensor1-date.txt
    sensor2/
        sensor2-date.txt
        sensor2-date.txt
    etc.

At the end of a cruise leg, we need to transfer the entire data store to another system and split it into chunks, each of which will fit on a DVD. The total data store can be as large as 50 GB, so we have to write lots of DVDs, all in a short amount of time.

We can write a full DVD in about 15 minutes, so that's no problem. But transferring the data and splitting it into chunks takes many hours, and that is the crux of the problem.
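A quick back-of-the-envelope check supports the point that burning is not the bottleneck. It assumes (neither figure is stated in the post) that "50Gb" means 50 gigabytes and that the discs are single-layer DVDs holding 4.7 GB each:

```python
import math

# Assumptions, not from the post: 50 GB total, single-layer 4.7 GB discs.
TOTAL_BYTES = 50 * 10**9
DVD_BYTES = 4_700_000_000
MINUTES_PER_DVD = 15  # burn time quoted in the post

discs = math.ceil(TOTAL_BYTES / DVD_BYTES)  # partial disc still needs a disc
burn_minutes = discs * MINUTES_PER_DVD
print(f"{discs} discs, about {burn_minutes / 60:.2f} hours of burning")
# prints "11 discs, about 2.75 hours of burning"
```

Under those assumptions the burning fits in an afternoon, so the hours go into copying and splitting, which is what an incremental daily approach would spread across the cruise.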

I want to write a script that can be run daily to incrementally transfer new data to the DVD staging area. This much is easy: I can use rsync. However, I also need it to switch automatically to a new location when the current one nears the size of a DVD. That way, at the end of a cruise, the data is already neatly sorted into DVD-sized sections.
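One way to get the rollover behaviour is to keep a small piece of state between daily runs: which volume is currently being filled and how many bytes it already holds. The Python sketch below (the logic carries over directly to Perl) is a greedy first-fit-in-order assignment, not anything from the post. Assumptions, all hypothetical: the `VolumeState`, `scan` and `assign` names, the 4.4 GB capacity (a margin under 4.7 GB for filesystem overhead), and that the daily run only considers completed days' files, since a file that keeps growing after it is placed would throw off the byte count. Filtering out today's file depends on the real date format, so that step is not shown.

```python
import os
from dataclasses import dataclass, field

# Hypothetical usable capacity: a margin below a single-layer DVD's 4.7 GB.
DVD_CAPACITY = 4_400_000_000


@dataclass
class VolumeState:
    """State carried between daily runs (hypothetical; persist it, e.g. as JSON)."""
    volume: int = 1  # index of the volume currently being filled
    used: int = 0    # bytes already assigned to that volume
    placed: dict = field(default_factory=dict)  # relative path -> volume index


def scan(data_dir):
    """Yield (relative_path, size) for every file under data_dir, in sorted order."""
    for root, dirs, files in os.walk(data_dir):
        dirs.sort()  # sorting in place makes os.walk visit subdirectories in order
        for name in sorted(files):
            path = os.path.join(root, name)
            yield os.path.relpath(path, data_dir), os.path.getsize(path)


def assign(files, state, capacity=DVD_CAPACITY):
    """Assign each not-yet-placed file to a volume, opening a new volume
    whenever the next file would push the current one past capacity.

    Returns a list of (relative_path, volume) for the newly placed files.
    """
    new = []
    for path, size in files:
        if path in state.placed:
            continue  # already staged on an earlier run
        if size > capacity:
            raise ValueError(f"{path} is larger than one volume")
        if state.used + size > capacity:
            state.volume += 1  # current volume is full: start the next one
            state.used = 0
        state.placed[path] = state.volume
        state.used += size
        new.append((path, state.volume))
    return new
```

Each daily run would call `assign(scan("data"), state)`, copy every returned file (with rsync or `shutil.copy2`, say) into a per-volume directory such as a hypothetical `staging/vol01/`, and save `state` for the next day. Because files are never moved once placed, earlier volumes stay frozen and can be burned as soon as they fill.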

I'm just beginning to think about how to do this and would greatly appreciate any wisdom anyone has on how to go about it. Or even if anyone knows of any code that might make this easier.

Many thanks,

Val

Re: Creating incremental images of data
by kvale (Monsignor) on Aug 19, 2004 at 17:57 UTC
    Scientific sensor logs tend to be highly repetitive, and written out as text they contain a lot of redundancy. I would not be surprised if bzip2 could compress these files by 80-90%. Compressing before transfer and burning would save much time and many DVDs.
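    A minimal sketch of per-file compression using Python's standard `bz2` module, plus a helper for measuring the ratio on a sample. The function names are hypothetical, and the real savings depend entirely on the actual logs; the 80-90% above is a guess, not a measurement.

```python
import bz2
import os


def compression_ratio(data, level=9):
    """Return compressed size / original size for bzip2 (data must be non-empty)."""
    return len(bz2.compress(data, compresslevel=level)) / len(data)


def compress_file(path, level=9):
    """Write path + '.bz2' alongside path; return (original_size, compressed_size)."""
    out_path = path + ".bz2"
    with open(path, "rb") as src, bz2.open(out_path, "wb", compresslevel=level) as dst:
        # Stream in 1 MiB blocks so a large daily log never has to fit in memory.
        for block in iter(lambda: src.read(1 << 20), b""):
            dst.write(block)
    return os.path.getsize(path), os.path.getsize(out_path)


if __name__ == "__main__":
    # Made-up sample: one hypothetical temperature reading per second for an hour.
    sample = "".join(
        f"2004-08-19 12:{i // 60:02d}:{i % 60:02d} TEMP 12.{i % 10}\n"
        for i in range(3600)
    ).encode()
    print(f"bzip2 leaves {compression_ratio(sample):.1%} of the original size")
```

    Running `compression_ratio` on a real day's file from each sensor would show quickly whether compression is worth folding into the daily staging step.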

    -Mark

Re: Creating incremental images of data
by Plankton (Vicar) on Aug 19, 2004 at 15:58 UTC
    Maybe you could use DAR (Disk ARchive) for this; it can split an archive into fixed-size slices and make differential backups.

    Plankton: 1% Evil, 99% Hot Gas.
Re: Creating incremental images of data
by revdiablo (Prior) on Aug 19, 2004 at 16:58 UTC
    Or even if anyone knows of any code that might make this easier.

    Most Unix-like systems, Linux included, have a command called split. It would work perfectly in this case.
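    To show what split does with byte-sized pieces, here is a rough Python equivalent (a sketch, not split's actual implementation; `split_stream`, `file_writer` and the part-naming scheme are hypothetical). Because the cut falls at fixed byte offsets, a file can straddle two parts, so the parts have to be concatenated in order before the original (for example, a tar archive of `data/`) can be read again.

```python
import io


def split_stream(src, chunk_size, write_block, block_size=1 << 20):
    """Feed the bytes of binary stream src to write_block(part, data), in order,
    so that part n receives bytes n*chunk_size up to (n+1)*chunk_size - 1.

    Returns the number of parts produced (0 for empty input).
    """
    offset = 0
    while True:
        part, used = divmod(offset, chunk_size)
        # Never read past the end of the current part.
        block = src.read(min(block_size, chunk_size - used))
        if not block:
            break
        write_block(part, block)
        offset += len(block)
    return -(-offset // chunk_size)  # ceiling division


def file_writer(prefix):
    """Return a write_block callback that appends to prefix000, prefix001, ...

    Files are opened in append mode, so remove parts left over from an
    earlier run first.
    """
    def write_block(part, data):
        with open(f"{prefix}{part:03d}", "ab") as f:
            f.write(data)
    return write_block


if __name__ == "__main__":
    # In-memory self-check: 10 bytes split into 4-byte parts.
    parts = {}
    n = split_stream(io.BytesIO(b"abcdefghij"), 4,
                     lambda p, b: parts.setdefault(p, bytearray()).extend(b))
    print(n, [bytes(parts[p]) for p in sorted(parts)])
```

    For real data, something like `split_stream(open("data.tar", "rb"), 4_400_000_000, file_writer("dvd-part-"))` would do the job, where the archive name, prefix and capacity are placeholders. The drawback compared with the rollover approach in the question is that no single disc is readable on its own.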