Re: Splitting a Blocked file in Round Robin into smaller files
by Corion (Patriarch) on Dec 14, 2015 at 15:41 UTC
You use very confusing terminology, mixing "record", "line" and "block". Assuming that you consider the "unit of work" boundary to be between a "block" 5 and a "block" 1, why not simply use seek to jump to a position of roughly ($file_size / $number_of_files) * $this_file and read forward until you've encountered one "block" 5? After that, the next "unit of work" starts.
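The seek idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names next_unit_start and chunk_starts are made up for this sketch, a unit is assumed to end at a line whose first character is '5', and the result is a set of contiguous chunks of roughly equal size rather than a round-robin distribution.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): byte offset of the first unit
# of work that begins after position $approx in the file open on $fh.
sub next_unit_start {
    my ( $fh, $approx ) = @_;
    return 0 if $approx <= 0;
    seek( $fh, $approx, 0 ) or die "seek failed: $!";
    my $partial = <$fh>;    # we probably landed mid-line; skip the rest of it
    while ( defined( my $line = <$fh> ) ) {
        # A line starting with '5' ends a unit, so the next unit starts here.
        return tell($fh) if $line =~ /^5/;
    }
    return tell($fh);       # ran off the end: no further unit starts
}

# Hypothetical helper: start offsets for $parts contiguous chunks, using
# the ($file_size / $number_of_files) * $this_file estimate from the post.
sub chunk_starts {
    my ( $path, $parts ) = @_;
    open( my $fh, '<', $path ) or die "Cannot open $path: $!";
    my $size   = -s $fh;
    my @starts = map { next_unit_start( $fh, int( $size / $parts ) * $_ ) }
        0 .. $parts - 1;
    close($fh);
    return @starts;
}

# Only runs when a file name is given: scriptname filename [parts]
if (@ARGV) {
    my ( $path, $parts ) = @ARGV;
    print join( ' ', chunk_starts( $path, $parts || 4 ) ), "\n";
}
```

Each chunk then runs from its start offset up to the next chunk's start offset (or the end of the file). On a small file several offsets may coincide, which simply means some chunks are empty.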
I am using the code below, but it's not working exactly how I would like it to:
#!/usr/bin/env perl
use strict;
use warnings;
my $num_files_to_write = 4;
use Data::Dumper;
my @filehandles;
for my $id ( 1..$num_files_to_write ) {
    open ( my $fh, '>', "file_$id.txt" ) or die $!;
    push @filehandles, $fh;
}
local $/ = '5';
while ( <> ) {
    select $filehandles[$. % $num_files_to_write];
    print;
}
foreach my $fh ( @filehandles ) {
    close ( $fh );
}
Re: Splitting a Blocked file in Round Robin into smaller files
by BrowserUk (Patriarch) on Dec 14, 2015 at 15:35 UTC
Are you saying that you want all the records starting with '1' in one file, all those starting with '2' in a second file, all those starting with '3' in a third, and so on?
If not, you'll need to clarify your explanation because it is very confused.
E.g., what does "I need to split this large file into smaller 4 files by doing a round robin of each block(block 1 to 5)" mean?
Did you typo? Should that be "5 smaller files"?
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
In the absence of evidence, opinion is indistinguishable from prejudice.
I am sorry for the confusion. Let's say the actual file has 5 blocks as shown below (the end of each block is marked by a record whose first character is '5'). I want to divide this file into 4 smaller files:
Step 1: Go through the file, find the first record whose first character is '5', and copy everything up to and including that record to the first file.
Step 2: Copy the next block, again up to the record whose first character is '5', into the second file, and so on in round-robin fashion until the entire file has been divided among the 4 smaller files.
Actual File:
1this is block 1
2this is block 1
3this is block 1
4this is block 1
2this is block 1
3this is block 1
5this is block 1
1this is block 2
2this is block 2
3this is block 2
2this is block 2
3this is block 2
5this is block 2
1this is block 3
2this is block 3
5this is block 3
1this is block 4
2this is block 4
5this is block 4
1this is block 5
3this is block 5
5this is block 5
File 1:
1this is block 1
2this is block 1
3this is block 1
4this is block 1
2this is block 1
3this is block 1
5this is block 1
1this is block 5
3this is block 5
5this is block 5
File 2:
1this is block 2
2this is block 2
3this is block 2
2this is block 2
3this is block 2
5this is block 2
File 3:
1this is block 3
2this is block 3
5this is block 3
File 4:
1this is block 4
2this is block 4
5this is block 4
#! perl -sw
use strict;
my $file = $ARGV[0];
open I, '<', $file or die $!;
my @outs;
open $outs[ $_ ], '>', "$file.$_" or die $! for 1 .. 4;
my $out = 1;
while( <I> ) {
    print { $outs[ $out ] } $_;
    if( /^5/ ) {
        ++$out;
        $out = 1 if $out > 4;
    }
}
Call it as scriptname filename. The 4 output files will be named filename.1, filename.2, filename.3 and filename.4.
Re: Splitting a Blocked file in Round Robin into smaller files
by james28909 (Deacon) on Dec 14, 2015 at 16:12 UTC
I'm not quite sure what you want to end up with, but try this:
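use strict;
use warnings;
my @array = qw(
    1nkndnfd
    2nsnskdnsdn
    3ddjsjd
    4fksjsdj
    5kdsjdskjdskj
    1ksdjdjsk
    hg
    2dsjskj
    3djkdjsljs
    4fdkkjdskjsk
    5sadjjdjdodjs
    6sadjjdjdodjs
);
foreach (@array) {
    # Split each record into its leading digit and the rest of the data.
    my ( $num, $data ) = /(\d)(.*)/;
    # Skip records with no leading digit, no data, or a digit outside 1-5.
    next if ( !length $num || !length $data || $num !~ /[1-5]/ );
    # Append the record to the file that matches its leading digit.
    open( my $file, '+>>', "round_robin_$num" . ".txt" ) or die $!;
    print $file "$num - $data\n";
    close $file;
}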
EDIT: Just noticed the other replies; it looks like this is NOT what he/she was after.
Hey, no problem man, I do try to help when I can. I at least give ideas if nothing else, haha :)
Re: Splitting a Blocked file in Round Robin into smaller files
by KurtSchwind (Chaplain) on Dec 14, 2015 at 16:08 UTC
I think he considers each 1,2,3,4,5 as a unit and wants 4 files with round-robin units in each.
#!/usr/bin/perl
use POSIX;
my @outfile = ('file1.txt', 'file2.txt', 'file3.txt' , 'file4.txt');
my $infile = 'infile.txt';
my $lineno = 0;
open (my $ifh, '<', $infile) or die "Unable to open infile";
while (<$ifh>) {
    # Lines 0-4 go to file 1, lines 5-9 to file 2, and so on, wrapping
    # around after file 4. This assumes every unit is exactly 5 lines long.
    open (my $fh, '>>', $outfile[floor($lineno / 5)%4]) or die "Unable to open outfile";
    print $fh $_;
    close $fh;
    $lineno++;
}
close ($ifh);
--
“For the Present is the point at which time touches eternity.” - CS Lewis
I'm in the habit of forcing a flush.
Would the flush happen regardless? If so, I've picked up something new.
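On the flush question: Perl's close is documented to flush the handle's IO buffers before closing the file descriptor, so an explicit flush immediately before close should not be needed. A minimal sketch to observe this, assuming a hypothetical helper named sizes_around_close and a scratch file path supplied by the caller:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): write one short line through a
# buffered handle and report the file size before and after close.
sub sizes_around_close {
    my ($path) = @_;
    open( my $fh, '>', $path ) or die "Cannot open $path: $!";
    print {$fh} "5this is block 1\n";
    # The line is usually still sitting in Perl's buffer at this point,
    # so the size on disk is typically 0 here.
    my $before = ( -s $path ) || 0;
    close($fh) or die "Cannot close $path: $!";    # close flushes the buffer
    my $after = ( -s $path ) || 0;
    return ( $before, $after );
}

# Only runs when a scratch file name is given: scriptname scratchfile
if (@ARGV) {
    my ( $before, $after ) = sizes_around_close( $ARGV[0] );
    print "before close: $before bytes, after close: $after bytes\n";
}
```

The same reasoning suggests the four output handles could be opened once before the loop and closed once after it, instead of reopening a file for every line.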