scale of large data file and output them in certain format

fseng has asked for the wisdom of the Perl Monks concerning the following question:

I have two files. The first, master.txt, contains a series of CRLF-terminated records as follows:

0001Hadrian 
0002Claudius 
0003Julian 
0004Augustus 
0005Severus 
0006Constantine 
0007Valentinian

The second file, child.txt, contains:

0001Legio IX 
0001Legio VI 
0001Ursa Minor 
0002Veraculum 
0002Thrace 
0002Legio III 
0003Persia 
0003Londinium 
0004Legio II 
0005Caledonia 
0005York 
0006Christianity 
0006Constantinople 
0006Legio II 
0007Julian 
0007Legio VII

I want to write a program that prints the output shown below. The link between the two files is the four-byte field at the start of each record in both files. The program has to meet these requirements:

1. Your program should scale for files of any size, that is, if both files had 200,000 records your program should not run out of memory! Speed is not a requirement, but you should comment on the performance.
2. There are two parts to the program: the data merging and the presentation. Keep these parts clearly separated in your code.

Output format: for example, the first page should look like this:

                  <Master>(in the middle of the page)
<child>(lined up on the left)
<child>
<child>
<child>
<child>
<child>
<child>
<child>
<child>
<child> (up to 15
<child> children per page)
                    Page<page no>(middle of page)
An example could be:
                    Hadrian
Legio IX
Legio VI
Ursa Minor
Page 1
13
                   Page 1

My code:

#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;

my $data1 = "master.txt";
my ( $number, %accounts );
open (F1, $data1) or die "\"$data1\" not existed or can't be opened!\n";

my $data2 = "child.txt";
my ( $child_number, %children );
open (F2, $data2) or die "\"$data2\" not existed or can't be opened!\n";
my ($master,%result);
my $result;
while ( <F1> ) {
        chomp;
        ($number, my @fields ) = split /(\d\d\d\d)/, $_, -1;
        @{ $accounts{ $number } }{ qw/ number name / } = @fields;
 }    
while ( <F2> ) {
    chomp;
        ($child_number, my @fields ) = split /(\d\d\d\d)/, $_, -1;
      @{ $children{ $child_number } }{ qw/ child_number info / } = @fields;
  #print Dumper  $children{ $child_number } ;
if ($accounts{ $number }{'number'} == $children{ $child_number }{'child_number'}){
         print $accounts{ $number }{'name'}, "\n";
         foreach $child_number (%children){
             print $children{ $child_number }{'info'}, "\n";
    }
    }
  
}

How should I write the loop? I am trying to scan the master file once: for example, when I read 0001, I scan all of child.txt for records matching 0001, take their info, and push it together with the master name for output. Somehow this only gives me the last result, "Valentinian", and some error messages. I'm really confused now.
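
One possible way to structure the loop, offered as a rough sketch rather than a definitive answer: if both files are already sorted by the four-character key (the sample data is, but that is an assumption about the real files), the two can be read in step, holding only one master and its children in memory at a time. The names parse_record and make_merger are made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a record into its 4-character key and the rest of the line.
# When unpacking, 'A*' strips trailing whitespace, so a CRLF ending is removed.
sub parse_record {
    my ($line) = @_;
    return unpack 'A4 A*', $line;
}

# Merging part: returns an iterator that yields [ $master_name, \@children ]
# for one master per call, and undef at end of the master file.
# ASSUMPTION: both inputs are sorted by key.
sub make_merger {
    my ( $master_fh, $child_fh ) = @_;
    my $pending = <$child_fh>;    # one-record lookahead into the child file

    return sub {
        my $line = <$master_fh>;
        return unless defined $line;
        my ( $key, $name ) = parse_record($line);

        my @children;
        while ( defined $pending ) {
            my ( $ckey, $info ) = parse_record($pending);
            last if $ckey gt $key;                     # belongs to a later master
            push @children, $info if $ckey eq $key;    # children without a master are skipped
            $pending = <$child_fh>;
        }
        return [ $name, \@children ];
    };
}

# Presentation part (deliberately trivial here), fed from in-memory sample data.
my $master_data = "0001Hadrian\r\n0002Claudius\r\n";
my $child_data  = "0001Legio IX\r\n0001Legio VI\r\n0002Thrace\r\n";
open my $m, '<', \$master_data or die "Cannot open master data: $!";
open my $c, '<', \$child_data  or die "Cannot open child data: $!";

my $next = make_merger( $m, $c );
while ( my $group = $next->() ) {
    my ( $name, $children ) = @$group;
    print "$name: ", join( ', ', @$children ), "\n";
}
```

On performance: each file is read exactly once, so the run time grows linearly with the total number of records, and memory use is bounded by the largest single group of children. If the files were not sorted, they would have to be sorted first (for example with an external sort) for this approach to work.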


Replies are listed 'Best First'.
Re: scale of large data file and output them in certain format by jwkrahn (Abbot) on Jul 02, 2009 at 03:29 UTC
Perhaps this will help get you started:

#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;

my $data1 = 'master.txt';
my $data2 = 'child.txt';

open my $F1, '<', $data1 or die qq["$data1" can\'t be opened: $!];
open my $F2, '<', $data2 or die qq["$data2" can\'t be opened: $!];

my %accounts;

# 'A4 A*' takes the 4-character key, then the rest of the record
# (a bare 'A' would capture only a single character).
while ( <$F1> ) {
    my ( $number, $name ) = unpack 'A4 A*', $_;
    $accounts{ $number }{ master } = $name;
}

while ( <$F2> ) {
    my ( $number, $child ) = unpack 'A4 A*', $_;
    $accounts{ $number }{ child } .= "$child\n";
}

my $page = 1;
for my $key ( sort keys %accounts ) {
    print ' ' x 20, $accounts{ $key }{ master }, "\n";
    print $accounts{ $key }{ child };
    print ' ' x 20, "Page ", $page++, "\n\n";
}
Re: scale of large data file and output them in certain format by jhourcle (Prior) on Jul 02, 2009 at 05:12 UTC
"... the program has to meet: 1. Your program should scale for files of any size, that is, if both files had 200,000 records your program should not run out of memory! Speed is not a requirement, but you should comment on the performance. 2. There are two parts to the program: the data merging and the presentation. Keep these parts clearly separated in your code."

Is it homework season? You might want to read the FAQ.
Re: scale of large data file and output them in certain format by bichonfrise74 (Vicar) on Jul 02, 2009 at 02:11 UTC
Not sure if this can help you...

#!/usr/bin/perl
use strict;

my $file_1 = <<EOF;
0001Hadrian
0002Claudius
0003Julian
0004Augustus
0005Severus
0006Constantine
0007Valentinian
EOF

my (%record, $count_line);
my $page_number = 1;

open( my $fh, "<", \$file_1 ) or die "Could not open file.\n";
while (<$fh>) {
    chomp;
    s/(\d{4})//;
    $record{$1}->{$_}->{'Master'}++;
}
close( $fh );

while (<DATA>) {
    chomp;
    s/(\d{4})//;
    $record{$1}->{$_}->{'Child'}++;
}

for my $i ( sort keys %record ) {
    for my $j ( sort keys %{ $record{$i} } ) {
        if ( defined( $record{$i}->{$j}->{'Master'} ) ) {
            print "\t\t\t$j\n";
        }
        else {
            print "$j\n";
        }
        $count_line++;
        if ( $count_line == 5 ) {
            print "\n\t\tPage is $page_number\n\n";
            $page_number++;
            $count_line = 0;
        }
    }
}

__DATA__
0001Legio IX
0001Legio VI
0001Ursa Minor
0002Veraculum
0002Thrace
0002Legio III
0003Persia
0003Londinium
0004Legio II
0005Caledonia
0005York
0006Christianity
0006Constantinople
0006Legio II
0007Julian
0007Legio VII
Re: scale of large data file and output them in certain format by fseng (Novice) on Jul 02, 2009 at 06:03 UTC
Thanks guys. But how do I apply the "15 children per page" rule?
Re^2: scale of large data file and output them in certain format by Anonymous Monk on Jul 02, 2009 at 06:41 UTC
What is the problem you're having with it? Maybe use Data::Page
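
For the "15 children per page" rule, here is a small sketch in plain Perl (it does not use Data::Page). It assumes that every master starts on a new page and that a master with more than 15 children continues on further pages with its heading repeated; the original post does not spell out either point. The helper names centre and format_master and the 40-column page width are invented for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $PER_PAGE = 15;    # maximum children per page, from the requirement
my $WIDTH    = 40;    # ASSUMED page width used for centring

# Pad a string on the left so that it sits roughly in the middle of the page.
sub centre {
    my ($text) = @_;
    my $pad = int( ( $WIDTH - length($text) ) / 2 );
    $pad = 0 if $pad < 0;
    return ( ' ' x $pad ) . $text;
}

# Presentation part: turn one master and its children into one or more pages.
# $page_ref is a reference to a running page counter shared across masters.
sub format_master {
    my ( $name, $children, $page_ref ) = @_;
    my @kids = @$children;    # copy, so the caller's list is left untouched
    my @pages;
    do {
        my @chunk = splice @kids, 0, $PER_PAGE;    # next 15 children (or fewer)
        push @pages,
            join "\n", centre($name), @chunk, centre( 'Page ' . $$page_ref++ ), '';
    } while (@kids);          # a master with no children still gets one page
    return @pages;
}

# Usage: 17 children produce two pages, with 15 and 2 children respectively.
my $page = 1;
my @kids = map { "Legio $_" } 1 .. 17;
print "$_\n" for format_master( 'Hadrian', \@kids, \$page );
```

Because the formatter only ever sees one master and its children, it can be fed directly from whatever does the merging, which keeps the merging and the presentation separate.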