zarath has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

My question is actually about 2 separate scripts, but I'm trying to do 1 thing: creating files that each consist of 2 specific lines.

I'm pretty new at Perl and this assignment is very different from what I have done with it in the past. I can't get either script to do exactly what I want, and I have no idea which one to keep working on to reach a solution fastest. Hope you guys can help me out.

I have a file with around 3000 lines in it. From this I need to create about 1500 separate files, each containing 2 lines. The lines in output file 1 should be lines 1 and 2 from the input file; the lines in output file 2 should be lines 3 and 4 from the input file, and so on... So each line has to be written exactly once (no more, no less) and I have to be sure the order of the lines is respected. The code I have written to work this way splits the big file into small files, but it only writes 1 line to each file.

I have tried using File::CountLines to make the loop stop at 2 lines and open a new output file, but this only confused me... the line counter seemed to be stuck on either 0 or 1, writing the same line to an infinite number of files, or even worse: it kept filling the same file in an endless loop, very quickly making the file too big to open. So I quickly threw this option out the window.

If this is the code I need to work on, here it is:

#!perl
use strict;
use warnings;
use autodie;          # die if problem reading or writing a file
use feature qw(say);

my $input = "C:/Some/specific/path/to/file.txt";
open FILEin, $input;
my @input = <FILEin>;
binmode(FILEin);
undef $/;

foreach $input (@input) {
    my $dir     = 'C:/Some/specific/folder';
    my @files   = <$dir/*>;
    my $countf  = @files;
    my $outfile = $dir.'/bladibla_'.$countf.'.txt';
    open FILEout, '>'.$outfile;
    print FILEout $input;
    close FILEout;
    say 'Wrote '.$input.' to '.$outfile;
}
close FILEin;

The other code is an attempt at working the other way around.

I let the first script do what it does to end up with 3000 files, each containing 1 line. The files carry a counter in the filename which is reliable, meaning: output file 1 should contain the contents of input files 0 and 1; output file 2 should contain the contents of input files 2 and 3, and so on...

The problem with the code I have written for this is that it doesn't seem to loop: it writes 1 file and then stops. The written file is correct (it contains the contents of the first 2 input files), but it should do this for all files found in the input folder. I know there is a problem with my 'for' loop, but it seems I am not experienced enough to figure out what it should be.

If I'd better keep working on this code, here it is:

#!perl
use strict;
use warnings;
use autodie;          # die if problem reading or writing a file
use feature qw(say);

my $inbase  = 'C:/Some/specific/folder';
my $outbase = 'C:/Some/other/specific/folder';
my $counter = @ARGV;

for my $file ($inbase.'/bladibla_'.$counter.'.txt') {
    my @outfiles = <$outbase/*>;
    my $countf   = @outfiles;
    my $outfile  = $outbase.'/bladibla_'.$countf.'.txt';
    open FILEout, '>'.$outfile;

    my $file = ($inbase.'/bladibla_'.$counter.'.txt');
    open FILEin, $file;
    my $contents = <FILEin>;
    binmode(FILEin);
    print FILEout $contents;
    close FILEin;
    say 'Wrote '.$contents.' to '.$outfile;
    $counter += 1;

    my $file2 = ($inbase.'/bladibla_'.$counter.'.txt');
    open FILEin, $file2;
    my $contents2 = <FILEin>;
    binmode(FILEin);
    print FILEout $contents2;
    close FILEin;
    say 'Wrote '.$contents2.' to '.$outfile;
    close FILEout;

    $counter += 1;
}

I'm sure both scripts can be tweaked to do exactly what I need them to do; just pick the one you think needs the least/easiest work.

Thank you very much in advance!

Re: Splitting big file or merging specific files?
by prysmatik (Sexton) on Jun 29, 2017 at 18:25 UTC

    I think you were traveling the right path with your first attempt there. But I'd like to invite you to rethink how you're collecting the information that you plan to process.

    my @input = <FILEin>;

    That collects everything all at once. It's not necessary to consume the whole file up front, since no pair of lines relies on any other.

    I think if you were to read line by line instead,

    while (my $line = <FILEin>) {

    that would make it easier to see a solution.

    Maybe apply a switch to let you know if this is the first line, which requires opening a new destination and writing to it, or a second line, which requires writing to the destination and then closing it.

    my $first_line_indicator = 1;
    my $outfile_counter      = 0;

    while (my $line = <FILEin>) {
        if ($first_line_indicator == 1) {
            $outfile_counter++;
            # open a new write destination
            # write $line
            $first_line_indicator = 0;
        }
        else {
            # write $line
            # close the write destination
            $first_line_indicator = 1;
        }
    }
    I hope that points you in the right direction.
Re: Splitting big file or merging specific files?
by tybalt89 (Monsignor) on Jun 29, 2017 at 23:15 UTC

    This one does both parts (along with creating the initial file).

    It uses the module Path::Tiny - if you don't have it, install it from CPAN.
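    In case you haven't used Path::Tiny before, the two calls doing the heavy lifting below are lines (read a file into a list of lines) and spew (write a list out in one shot). A minimal sketch - the file names here are just for illustration:

    use Path::Tiny;

    my @lines = path('folder/bigfile.txt')->lines;    # read all lines
    path('folder/pair_0.txt')->spew( @lines[0, 1] );  # write the first two

    The full script: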

    #!/usr/bin/perl
    # http://perlmonks.org/?node_id=1193831
    use strict;
    use warnings;
    use Path::Tiny;

    my $dir    = 'folder';
    my $input  = 'bigfile.txt';
    my $output = 'rebuiltfile.txt';

    ################## create (fake) big file ##################

    path($dir)->mkpath;
    path("$dir/$input")->spew( map "line $_\n", 1 .. 11 );

    #################### split to 2 lines per file #############

    my @lines  = path("$dir/$input")->lines;
    my $number = () = glob "$dir/*";
    path("$dir/bladibla_" . $number++ . '.txt')->spew(splice @lines, 0, 2)
      while @lines;

    #################### combine ###############################

    path("$dir/$output")->spew(
        map  $_->[0]->lines,
        sort { $a->[1] <=> $b->[1] }
        map  [ $_, "$_" =~ /(\d+)/ ],
        path($dir)->children( qr/^bladibla_\d+\.txt$/ )
    );
Re: Splitting big file or merging specific files?
by GrandFather (Saint) on Jun 30, 2017 at 04:05 UTC

    As a general thing, only deal with the data you need to deal with immediately. In this case, for phase one, that means: open your input file, then, while there is more data, read a couple of lines and write them to the next output file. For phase two it means: while there is another file, read it and append its contents to your output.

    Note there isn't a "for" there anywhere. It's all "while something". Let's see how that could look:

    #!/usr/bin/perl
    use strict;
    use warnings;

    =pod

    Use this script as the input file to be cut up. We'll put the generated
    files into a throw away directory that is a sub-directory of the
    directory we are running the script from.

    This script creates the split files and the rejoined text. The rejoined
    text doesn't get saved, but is compared to the original script as a
    check that everything worked.

    =cut

    my $subDir = './delme';

    # Create a throw away sub-directory for the test. Wrapped in an eval because
    # we don't care if it fails (probably because the dir already exists).
    eval {mkdir $subDir};

    seek DATA, 0, 0;                         # Set DATA to the start of this file
    my $origText = do {local $/; <DATA>};    # Slurp the script text to check against
    seek DATA, 0, 0;                         # Back to the start again

    # Create the split files
    my $fileNum = 0;

    while (!eof DATA) {
        my $fileLines;
        $fileLines .= <DATA> for 1 .. 2;
        last if !defined $fileLines;

        ++$fileNum;
        open my $outFile, '>', "$subDir/outFile$fileNum.txt";
        print $outFile $fileLines;
        close $outFile;
    }

    # Join the files back up again
    my $joinedText;
    $fileNum = 1;

    while (open my $fileIn, '<', "$subDir/outFile$fileNum.txt") {
        $joinedText .= do {local $/; <$fileIn>};    # Slurp the file
        ++$fileNum;
    }

    print "Saved and Loaded OK\n" if $joinedText eq $origText;
    __DATA__

    The "slurp" bits set a Perl special variable to ignore line breaks so we can read an entire file in one hit. On modern systems with plenty of memory that works fine for files of hundreds of megabytes so it sould be fine for our toy example.

    The for 1 .. 2 fetches 2 lines from the input file. If there is an odd number of lines in the input it doesn't matter: we end up concatenating undef to $fileLines, which amounts to a no-op, so no harm done.
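    If you want to see that for yourself, here is a tiny stand-alone illustration (not part of the script above; the no warnings line just silences the expected "uninitialized value" warning):

    use strict;
    use warnings;

    my $s = "last line\n";
    {
        no warnings 'uninitialized';
        $s .= undef;       # concatenating undef leaves $s unchanged
    }
    print $s;              # prints "last line" only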

    Premature optimization is the root of all job security
Re: Splitting big file or merging specific files?
by thanos1983 (Parson) on Jun 29, 2017 at 16:57 UTC

    Hello zarath,

    Something like the following could work.

    I am sure that another Monk will come up with a better solution, but this is a good working starting point.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use feature qw(say);

    open(my $fh1, ">>", "output1.txt")
      or die "Can't open > output1.txt: $!";
    open(my $fh2, ">>", "output2.txt")
      or die "Can't open > output2.txt: $!";

    my $count = 1;
    while (<>) { # Read all files provided via @ARGV
        chomp;
        if ($count == 1 or $count == 2) {
            say $fh1 $_;
        }
        elsif ($count == 3 or $count == 4) {
            say $fh2 $_;
        }
        if ($count == 4) {
            $count = 1;
        }
        else {
            $count++;
        }
    }
    continue {
        close ARGV if eof; # reset $. for each file
    }

    close($fh1) or warn "Can't close output1.txt: $!";
    close($fh2) or warn "Can't close output2.txt: $!";

    __END__
    $ cat output1.txt
    Sample of text line 1
    Sample of text line 2
    Sample of text line 5
    Sample of text line 6
    $ cat output2.txt
    Sample of text line 3
    Sample of text line 4

    The code is self-explanatory: you invoke the script as $ perl test.pl in.txt and the output files are defined inside. I open them in append mode so every line we print goes below the previous one.

    Update: Minor code modification:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use feature qw(say);

    open(my $fh1, ">>", "output1.txt")
      or die "Can't open > output1.txt: $!";
    open(my $fh2, ">>", "output2.txt")
      or die "Can't open > output2.txt: $!";

    my $count = 1;
    while (<>) { # Read all files provided via @ARGV
        chomp;
        if ($count == 1 or $count == 2) {
            say $fh1 $_;
        }
        elsif ($count == 3 or $count == 4) {
            say $fh2 $_;
        }
        if ($count == 4) {
            $count = 1;
            next;
        }
        $count++;
    }
    continue {
        close ARGV if eof; # reset $. for each file
    }

    close($fh1) or warn "Can't close output1.txt: $!";
    close($fh2) or warn "Can't close output2.txt: $!";

    Update2: Minor code improvement:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use feature qw(say);

    open(my $fh1, ">>", "output1.txt")
      or die "Can't open > output1.txt: $!";
    open(my $fh2, ">>", "output2.txt")
      or die "Can't open > output2.txt: $!";

    while (<>) { # Read all files provided via @ARGV
        chomp;
        if ($. == 1 or $. == 2) {
            say $fh1 $_;
        }
        elsif ($. == 3 or $. == 4) {
            say $fh2 $_;
        }
        $. = 0 if ($. == 4);
    }
    continue {
        close ARGV if eof; # reset $. for each file
    }

    close($fh1) or warn "Can't close output1.txt: $!";
    close($fh2) or warn "Can't close output2.txt: $!";

    Final Update:

    Just for fun, removing the unnecessary elsif:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use feature qw(say);

    open(my $fh1, ">>", "output1.txt")
      or die "Can't open > output1.txt: $!";
    open(my $fh2, ">>", "output2.txt")
      or die "Can't open > output2.txt: $!";

    while (<>) { # Read all files provided via @ARGV
        chomp;
        if ($. == 1 or $. == 2) {
            say $fh1 $_;
            next;
        }
        say $fh2 $_;
        $. = 0 if ($. == 4);
    }
    continue {
        close ARGV if eof; # reset $. for each file
    }

    close($fh1) or warn "Can't close output1.txt: $!";
    close($fh2) or warn "Can't close output2.txt: $!";

    Hope this helps, BR.

    Seeking for Perl wisdom...on the process of learning...not there...yet!
Re: Splitting big file or merging specific files?
by zarath (Beadle) on Jun 30, 2017 at 09:00 UTC

    Thank you for the tips you guys!

    Got it to work now.

    In case anyone is interested, this is the code I ended up with:

    #!perl
    use strict;
    use warnings;
    use autodie; # die if problem reading or writing a file

    my $input = 'C:/Some/specific/path/to/inputfile.txt';
    my $dir   = 'C:/Some/specific/folder';

    open FILEin, $input;
    binmode(FILEin);

    my $first_line_indicator = 1;
    while (my $line = <FILEin>) {
        if ($first_line_indicator == 1) {
            my @files      = <$dir/*>;
            my $countfiles = @files;
            my $outfile    = $dir.'/bladibla_'.$countfiles.'.txt';
            open FILEout, '>'.$outfile;
            print FILEout $line;
            $first_line_indicator = 0;
        }
        else {
            print FILEout $line;
            close FILEout;
            $first_line_indicator = 1;
        }
    }
    close FILEin;
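
    For the reverse direction (stitching the bladibla_N.txt files back together), I imagine something along these lines would do the trick; the merged file name is just an example:

    #!perl
    use strict;
    use warnings;
    use autodie; # die if problem reading or writing a file

    my $dir     = 'C:/Some/specific/folder';
    my $outfile = 'C:/Some/other/specific/folder/merged.txt';

    open my $out, '>', $outfile;

    my $counter = 0;
    # The split numbered the files 0, 1, 2, ... so stop at the first gap.
    while (-e "$dir/bladibla_$counter.txt") {
        open my $in, '<', "$dir/bladibla_$counter.txt";
        print $out $_ while <$in>;
        close $in;
        $counter++;
    }
    close $out;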