chinamox has asked for the wisdom of the Perl Monks concerning the following question:

Hello all,

I have been reading up on IO::File and I wanted to see if anyone could post or point me towards an example of a program that uses IO::File to split a large file into several smaller files with a set number of lines each.

For example, let's say my original file $file has 3200 lines in it. I want to use IO::File to make four new files (titled $file1, $file2, $file3, $file4) and write them in my current directory.

What I have figured out from docs so far:

    #!/usr/local/bin/perl -w
    use strict;
    use IO::File;

    #Open and read In File
    open(IN, '/myfiles/test/file')
        || die "Can not open '/myfiles/test/file' :$!\n";
    while (<IN>) { };
    close(IN);

    #Open file for writing
    open(OUT, '>', $file) || die "Can not open $file :$!\n";
    print OUT "...800Lines of output...";
    close(OUT);

As always, thank you for your time and help.

-mox

Replies are listed 'Best First'.
Re: IO::File Question
by ikegami (Patriarch) on Oct 23, 2006 at 04:37 UTC

    Since this is part of an assignment, why don't I just give the pseudocode?

    • Open the input file (using open or IO::File).
    • While we haven't reached the end of the input file,
      • Read a line in.
      • If we haven't opened an output file yet, or if we've read MAX lines, open a new output file.
      • Write the line out.
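
    The steps above can be sketched in Perl roughly as follows. This is only one possible reading of the pseudocode, using the built-in open with lexical file handles. The sub name split_by_lines and the ".1", ".2", ... output naming are assumptions made for illustration, not something given in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: split $in_path into pieces of at most $max lines,
# named "$in_path.1", "$in_path.2", ... Returns the list of files written.
sub split_by_lines {
    my ($in_path, $max) = @_;

    open(my $in, '<', $in_path)
        or die "Cannot open '$in_path': $!\n";

    my ($out, @written);
    my $lines_in_piece = 0;

    while (my $line = <$in>) {
        # No output file yet, or the current one is full: start a new one.
        if (!defined $out || $lines_in_piece == $max) {
            close($out) or die "Cannot close output: $!\n" if defined $out;
            my $out_path = $in_path . '.' . (@written + 1);
            open($out, '>', $out_path)
                or die "Cannot create '$out_path': $!\n";
            push @written, $out_path;
            $lines_in_piece = 0;
        }
        print {$out} $line;
        $lines_in_piece++;
    }

    close($out) or die "Cannot close output: $!\n" if defined $out;
    close($in);
    return @written;
}
```

    Only one line is held in memory at a time, so the size of the input file does not matter.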
Re: IO::File Question
by graff (Chancellor) on Oct 23, 2006 at 03:30 UTC
    If you have any sort of unix system (or any version of unix tools for windows), look up the shell command called "split" (not the perl function of the same name). The unix "split" command does what you want. (And there is a perl version of that already available here: Perl Power Tools.)

    If you want to reinvent the split command again for yourself, you don't really need IO::File for that. In its most basic form, it's just a matter of opening "file.name" for input, opening "file.name.1" for output, then reading and writing a line at a time, incrementing a line counter as you go. When you've done N lines, close "file.name.1", open "file.name.2" for output, reset your line counter, and continue. When there's no more input, you're done.

    update: Looking at your question again, I wonder if maybe I missed your point, but for the task you describe, it strikes me that the approach you're trying to take is ill-advised at best, because it involves loading all of a potentially large file into memory at once, and that is completely unnecessary for this task.

    Perhaps you have some other reason for wanting to learn how to use IO::File, and if so, tell us about that. But don't implement file splitting this way (in the manner suggested in the OP).

      Thank you for your response.

      This little problem is part of a larger class assignment. Having already looked at the UNIX man page for ‘split’, I can definitely see your point. However, I think the idea is more about me learning IO::File than actually finding the most efficient way of doing this particular problem, so I don’t want to risk being too clever for my own good here.

      I realize that this may be inefficient, but I think your idea of using a line counter is very good. Would you need one or two counters? I know you would need one to count the number of lines written to each file (800 in this example), but would you not also want a counter to keep track of which output file you are writing to, so that its value can be appended to the new file's name (4 files would be made in this example)?

      -mox
        You don't actually need a counter, since Perl already provides one: $.
        my $max_lines = 800;
        if (($. - 1) % $max_lines == 0) {
            my $file_num  = int(($. - 1) / $max_lines) + 1;
            my $file_name = "file${file_num}.txt";
            open($fh_out, '>', $file_name)
                or die("Unable to create file \"$file_name\": $!\n");
        }
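
        To see how that arithmetic maps line numbers to output files, here is a small sketch. The helper names starts_new_file, file_number and new_file_lines are made up for this illustration. With 800 lines per file, line 800 still belongs to file 1, line 801 opens file 2, and line 3200 is the last line of file 4.

```perl
use strict;
use warnings;

# Hypothetical helpers wrapping the arithmetic from the snippet above.
# $line_no is a 1-based line number (such as $.); $max is lines per file.
sub starts_new_file {
    my ($line_no, $max) = @_;
    return ($line_no - 1) % $max == 0;
}

sub file_number {
    my ($line_no, $max) = @_;
    return int(($line_no - 1) / $max) + 1;
}

# Reads $fh to the end and returns the line numbers (taken from $.)
# at which a new output file would be opened.
sub new_file_lines {
    my ($fh, $max) = @_;
    my @starts;
    while (my $line = <$fh>) {
        push @starts, $. if starts_new_file($., $max);
    }
    return @starts;
}
```

        The check and the file number both depend only on $. and the chosen maximum, which is why no separate counters are needed in this variant.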
        I know you would need one to count the number of lines written to each file (800 in this example), but would you not also want a counter to keep track of which output file you are writing to, so that its value can be appended to the new file's name (4 files would be made in this example)?

        Yes, you'd need to keep a counter that you increment each time you open a new output file, as well as a counter that you increment each time you write to the current output file (and that gets reset to zero each time you open a new file).

        And you can still use IO::File to manage the opening and closing of files, and use lexically scoped scalar variables as "file handle objects".
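
        A minimal sketch of that two-counter approach with IO::File might look like the following. The sub name split_with_io_file and the $prefix naming scheme (giving "${prefix}1", "${prefix}2", ...) are assumptions for illustration. The IO::File calls used are new, getline, print and close.

```perl
use strict;
use warnings;
use IO::File;

# Hypothetical helper: writes "${prefix}1", "${prefix}2", ... with at most
# $max lines each, and returns how many output files were created.
sub split_with_io_file {
    my ($in_path, $prefix, $max) = @_;

    my $in = IO::File->new($in_path, 'r')
        or die "Cannot open '$in_path': $!\n";

    my $file_count = 0;   # incremented each time a new output file is opened
    my $line_count = 0;   # lines written to the current file; reset per file
    my $out;              # lexical scalar holding the current IO::File object

    while (defined(my $line = $in->getline)) {
        if (!defined $out || $line_count == $max) {
            $out->close if defined $out;
            $file_count++;
            my $out_path = $prefix . $file_count;
            $out = IO::File->new($out_path, 'w')
                or die "Cannot create '$out_path': $!\n";
            $line_count = 0;
        }
        $out->print($line);
        $line_count++;
    }

    $out->close if defined $out;
    $in->close;
    return $file_count;
}
```

        For the 3200-line example, a call such as split_with_io_file($file, $file, 800) would be expected to produce four files named after $file with 1 through 4 appended.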