Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks!
Does anyone know how to go through a directory of *.txt files and check whether the files found are larger than 1MB? If they are, I'd like to cut each one into parts: for example, a 3.5MB file would be cut into three 1MB parts, each of them renamed, while the remaining 0.5MB keeps the current name until it reaches 1MB, and then gets archived again. I just don't know how to start this one.
Thanks for showing me the way in advance!

Replies are listed 'Best First'.
Re: Text File Size
by tirwhan (Abbot) on Jan 31, 2006 at 19:13 UTC

    Not a Perl answer, but the split utility (*NIX) is really handy for this type of thing.

    split -b 1M <filename>

    will split the file the way you describe, appending "aa", "ab", etc. to the resulting file names. Now you just have to find the last of these and rename it to the original filename to get your desired behaviour. You should take care, though: any file descriptors pointing to the original file will be clobbered, so if these are logfiles you probably need to HUP the logging processes.
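    The find-the-last-chunk-and-rename step can be sketched in Perl, too. This is a minimal sketch, assuming split(1) has already run; the chunk files here are stand-ins created purely for illustration:

```perl
use strict;
use warnings;
use File::Copy qw(move);

# Illustrative setup: pretend split(1) already produced these chunks.
for my $suffix (qw(aa ab ac ad)) {
    open my $fh, '>', "big.txt.$suffix" or die $!;
    print $fh 'x';
    close $fh;
}

# split's suffixes sort lexically, so the last chunk is simply the
# last name in sorted order; give it back the original filename.
my @parts = sort glob 'big.txt.??';
move( $parts[-1], 'big.txt' ) or die "rename failed: $!";
```

    Note this overwrites any existing big.txt, which is the intended behaviour here (the original has already been carved up into the chunks).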


    There are ten types of people: those that understand binary and those that don't.
Re: Text File Size
by davido (Cardinal) on Jan 31, 2006 at 19:58 UTC

    You could open the directory with opendir and read it with readdir. Iterate over the names returned by readdir. Use the -f operator to make sure you've got a file. Using the -s operator, you could determine the file size. Set $/ to \1048576. Open the too-big file. Start up a while loop, reading through the file. For each "record", open an output file, print out that record, and close that output file; once the input file is exhausted, move on to the next file in the directory. Use a counter to generate unique splitfile names.

    Update:
    A few hours later I had a moment to put it to code:

    use strict;
    use warnings;

    my $path    = "c:/path/to/files";
    my $maxsize = 1048576;

    opendir my $dirhandle, $path or die $!;
    foreach my $file ( readdir $dirhandle ) {
        next unless -f "$path/$file";    # test the full path, not the bare name
        next unless -s _ > $maxsize;     # reuse the stat buffer from -f
        my $count = 0;
        open my $in, '<', "$path/$file" or die $!;
        {
            local $/ = \$maxsize;        # fixed-size "records"
            while ( my $record = <$in> ) {
                my $outname = "$file.split" . sprintf "%03d", $count++;
                open my $out, '>', "$path/$outname" or die $!;
                print $out $record;
                close $out or die $!;
            }
        }
        close $in;
    }

    Dave

Re: Text File Size
by blazar (Canon) on Feb 01, 2006 at 11:14 UTC
    Does anyone know how to go through a directory of *.txt file(s)

    glob, or if you need to recurse, File::Find (or File::Finder or File::Find::Rule). Or, if you like to reinvent the wheel, opendir & co. (IMHO often cargo-culted, but that's just my HO...)

    check if the files found are larger than 1MB

    -s
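    Putting those first two pieces together, a minimal sketch (it creates two throwaway files in the current directory so the example is self-contained; substitute your own path):

```perl
use strict;
use warnings;

# Illustrative sample files: one well under 1 MiB, one over it.
open my $fh, '>', 'small.txt' or die $!;
print $fh 'x' x 10;
close $fh;
open $fh, '>', 'large.txt' or die $!;
print $fh 'x' x 1_500_000;
close $fh;

# glob the *.txt files; keep only plain files larger than 1 MiB.
# "-s _" reuses the stat buffer filled in by the preceding -f.
my @too_big = grep { -f $_ && -s _ > 1024 * 1024 } glob '*.txt';
print "@too_big\n";
```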

    if they, are cut the file(s) into part(s)

    Check perldoc perlfunc for "Functions for filehandles, files, or directories" and "Input and output functions". In particular I see two possibilities, fundamentally:

    1. use sysread;
    2. put a reference into $/, e.g. $/ = \(1024 * 1024);
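    Option 1 can be sketched like this; 'big.txt' is a sample file created here just for illustration, and a real script would take the name from the directory scan:

```perl
use strict;
use warnings;

my $chunk = 1024 * 1024;    # 1 MiB

# Illustrative setup: a 2.5 MiB sample file.
open my $fh, '>:raw', 'big.txt' or die $!;
print $fh 'x' x ( 2 * $chunk + $chunk / 2 );
close $fh;

# Copy the file out in fixed-size chunks with sysread/syswrite.
open my $in, '<:raw', 'big.txt' or die $!;
my $count = 0;
while ( sysread $in, my $buf, $chunk ) {
    my $part = sprintf 'big.txt.part%03d', $count++;
    open my $out, '>:raw', $part or die "open $part: $!";
    syswrite $out, $buf or die "syswrite $part: $!";
    close $out or die $!;
}
close $in;
```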

    Try putting everything together, and if you have problems then show us your code.

Re: Text File Size
by leocharre (Priest) on Jan 31, 2006 at 22:28 UTC

    this may help...

    to find the actual files.. and get some data on them..

    for ( split /\n/, `find /your/path/to/txtfiles -maxdepth 1 -size +1024k -printf "%p,%k\n"` ) {
        my ( $file, $size_in_k ) = split /,/;
        # ... do something with $file and $size_in_k ...
    }

    this is rough.. do a 'man find' for the details.. and a lot of people hate backticks.. they're a bad idea with any tainted data
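    For completeness, the same non-recursive scan (like -maxdepth 1) can be done in pure Perl with readdir and -s, avoiding backticks entirely. A minimal sketch, using the current directory and a throwaway demo.txt so it is self-contained:

```perl
use strict;
use warnings;

# Illustrative sample file: 2 KiB of data.
open my $fh, '>', 'demo.txt' or die $!;
print $fh 'x' x 2048;
close $fh;

# Scan the directory (non-recursive) and record each .txt file's size in KiB.
opendir my $dh, '.' or die "opendir: $!";
my %size_in_k;
for my $name ( grep { /\.txt\z/ } readdir $dh ) {
    next unless -f $name;
    $size_in_k{$name} = int( ( -s $name ) / 1024 );
}
closedir $dh;
print "$_,$size_in_k{$_}\n" for sort keys %size_in_k;
```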