fseng has asked for the wisdom of the Perl Monks concerning the following question:

I have two files. The first, master.txt, contains a series of CRLF-terminated records as follows:
0001Hadrian
0002Claudius
0003Julian
0004Augustus
0005Severus
0006Constantine
0007Valentinian
The second file, child.txt, contains:
0001Legio IX
0001Legio VI
0001Ursa Minor
0002Veraculum
0002Thrace
0002Legio III
0003Persia
0003Londinium
0004Legio II
0005Caledonia
0005York
0006Christianity
0006Constantinople
0006Legio II
0007Julian
0007Legio VII
I want to write a program that prints the output shown below. The link between the two files is the four-byte field at the start of the records in both sets. The program has to meet these requirements:

1. Your program should scale for files of any size, that is, if both files had 200,000 records your program should not run out of memory! Speed is not a requirement, but you should comment on the performance.
2. There are two parts to the program: the data merging and the presentation. Keep these parts clearly separated in your code.

Output format

For example, the first page should look like this:
                    <Master>(in the middle of the page)
<child>(line up to the left)
<child>
<child>
<child>
<child>
<child>
<child>
<child>
<child>
<child>
up to 15 <child> children per page
                    Page<page no>(middle of page)

An example could be:

                    Hadrian
Legio IX
Legio VI
Ursa Minor
                    Page 1 13 Page 1
My code:
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;

my $data1 = "master.txt";
my ( $number, %accounts );
open (F1, $data1) or die "\"$data1\" not existed or can't be opened!\n";

my $data2 = "child.txt";
my ( $child_number, %children );
open (F2, $data2) or die "\"$data2\" not existed or can't be opened!\n";

my ($master,%result);
my $result;

while ( <F1> ) {
    chomp;
    ($number, my @fields ) = split /(\d\d\d\d)/, $_, -1;
    @{ $accounts{ $number } }{ qw/ number name / } = @fields;
}

while ( <F2> ) {
    chomp;
    ($child_number, my @fields ) = split /(\d\d\d\d)/, $_, -1;
    @{ $children{ $child_number } }{ qw/ child_number info / } = @fields;
    #print Dumper $children{ $child_number } ;
    if ($accounts{ $number }{'number'} == $children{ $child_number }{'child_number'}){
        print $accounts{ $number }{'name'}, "\n";
        foreach $child_number (%children){
            print $children{ $child_number }{'info'}, "\n";
        }
    }
}
How should I write the loop? I'm trying to scan the master file first, e.g. I get 0001, then scan all of child.txt, and if anything matches 0001 I want to take the info out and push it together with the master name for the output. Somehow this only gives me the last result, "Valentinian", and some error messages. I'm really confused now.

Replies are listed 'Best First'.
Re: scale of large data file and output them in certain format
by jwkrahn (Abbot) on Jul 02, 2009 at 03:29 UTC

    Perhaps this will help get you started:

    #!/usr/bin/perl
    use warnings;
    use strict;
    use Data::Dumper;

    my $data1 = 'master.txt';
    my $data2 = 'child.txt';

    open my $F1, '<', $data1 or die qq["$data1" can\'t be opened: $!];
    open my $F2, '<', $data2 or die qq["$data2" can\'t be opened: $!];

    my %accounts;

    while ( <$F1> ) {
        my ( $number, $name ) = unpack 'A4 A*', $_;
        $accounts{ $number }{ master } = $name;
    }

    while ( <$F2> ) {
        my ( $number, $child ) = unpack 'A4 A*', $_;
        $accounts{ $number }{ child } .= "$child\n";
    }

    my $page = 1;
    for my $key ( sort keys %accounts ) {
        print ' ' x 20, $accounts{ $key }{ master }, "\n";
        print $accounts{ $key }{ child };
        print ' ' x 20, "Page ", $page++, "\n\n";
    }
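
The unpack 'A4 A*' call above behaves differently from the split in the original question, and the difference may explain the symptom reported there. A minimal sketch, using only the sample record 0001Hadrian from the thread, compares what the two calls return:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = '0001Hadrian';

# A capturing group in the split pattern keeps the separator in the result,
# and a positive-width match at the very start yields a leading empty field.
my @from_split = split /(\d\d\d\d)/, $line, -1;

# unpack 'A4 A*' takes the first four bytes, then everything after them.
my @from_unpack = unpack 'A4 A*', $line;

print join( '|', map { "[$_]" } @from_split ),  "\n";   # [] | [0001] | [Hadrian]
print join( '|', map { "[$_]" } @from_unpack ), "\n";   # [0001] | [Hadrian]
```

With the split form the first returned element is the empty string, so a statement like ($number, my @fields) = split ... assigns '' to $number on every line. If that is what happens in the original code, every master record lands under the same hash key and each line overwrites the previous one, which would be consistent with only Valentinian surviving.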
Re: scale of large data file and output them in certain format
by jhourcle (Prior) on Jul 02, 2009 at 05:12 UTC
    ... the program has to meet: 1. Your program should scale for files of any size, that is, if both files had 200,000 records your program should not run out of memory! Speed is not a requirement, but you should comment on the performance. 2. There are two parts to the program: the data merging and the presentation. Keep these parts clearly separated in your code.

    Is it homework season? You might want to read the FAQ.

Re: scale of large data file and output them in certain format
by bichonfrise74 (Vicar) on Jul 02, 2009 at 02:11 UTC
    Not sure if this can help you...
    #!/usr/bin/perl
    use strict;

    my $file_1 = <<EOF;
    0001Hadrian
    0002Claudius
    0003Julian
    0004Augustus
    0005Severus
    0006Constantine
    0007Valentinian
    EOF

    my (%record, $count_line);
    my $page_number = 1;

    open( my $fh, "<", \$file_1 ) or die "Could not open file.\n";
    while (<$fh>) {
        chomp;
        s/(\d{4})//;
        $record{$1}->{$_}->{'Master'}++;
    }
    close( $fh );

    while (<DATA>) {
        chomp;
        s/(\d{4})//;
        $record{$1}->{$_}->{'Child'}++;
    }

    for my $i ( sort keys %record ) {
        for my $j ( sort keys %{ $record{$i} } ) {
            if ( defined( $record{$i}->{$j}->{'Master'} ) ) {
                print "\t\t\t$j\n";
            }
            else {
                print "$j\n";
            }
            $count_line++;
            if ( $count_line == 5 ) {
                print "\n\t\tPage is $page_number\n\n";
                $page_number++;
                $count_line = 0;
            }
        }
    }

    __DATA__
    0001Legio IX
    0001Legio VI
    0001Ursa Minor
    0002Veraculum
    0002Thrace
    0002Legio III
    0003Persia
    0003Londinium
    0004Legio II
    0005Caledonia
    0005York
    0006Christianity
    0006Constantinople
    0006Legio II
    0007Julian
    0007Legio VII
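
Both hash-based replies in this thread keep every record in memory, which the stated requirement rules out for very large files. One alternative is a streaming merge that holds only one record from each file at a time. The sketch below assumes (the question does not say so) that both files are already sorted by the four-byte key, as the sample data is, and that each key appears once in master.txt. The names parse_record and merge_sorted are hypothetical, and the demo reads from in-memory filehandles rather than the real files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split one record into its 4-byte key and the rest, dropping the CRLF.
sub parse_record {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;
    return unpack 'A4 A*', $line;
}

# Merging only: walk both sorted inputs in step, one record from each.
# Calls $on_master->($name) once per master record, then
# $on_child->($child) for each child record with the same key.
sub merge_sorted {
    my ( $master_fh, $child_fh, $on_master, $on_child ) = @_;
    my $pending = <$child_fh>;    # one-line look-ahead into the child file
    while ( defined( my $line = <$master_fh> ) ) {
        my ( $key, $name ) = parse_record($line);
        $on_master->($name);
        while ( defined $pending ) {
            my ( $ckey, $child ) = parse_record($pending);
            last if $ckey gt $key;    # belongs to a later master
            $on_child->($child) if $ckey eq $key;
            $pending = <$child_fh>;   # children with no master are skipped
        }
    }
    return;
}

# Demo: in-memory filehandles stand in for master.txt and child.txt.
my $masters  = "0001Hadrian\r\n0002Claudius\r\n";
my $children = "0001Legio IX\r\n0001Legio VI\r\n0002Thrace\r\n";
open my $mfh, '<', \$masters  or die "Cannot open in-memory master: $!";
open my $cfh, '<', \$children or die "Cannot open in-memory child: $!";

# Events are collected in an array only so the demo is easy to inspect;
# a real presentation layer would print them as they arrive.
my @events;
merge_sorted(
    $mfh, $cfh,
    sub { push @events, "M:$_[0]" },
    sub { push @events, "C:$_[0]" },
);
print "$_\n" for @events;
```

On performance: under the sorted-input assumption this reads each file once, so the running time grows linearly with the total number of records and memory use stays constant. If the files are not sorted, they would have to be sorted first, for example with an external sort utility, before this approach applies.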
Re: scale of large data file and output them in certain format
by fseng (Novice) on Jul 02, 2009 at 06:03 UTC
    Thanks guys. But how do I apply the "15 children per page" rule?
      What is the problem you're having with it? Maybe use Data::Page
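
For the "15 children per page" rule, one possibility is a small page counter in the presentation code. The sketch below is plain Perl and does not use Data::Page. It rests on two assumptions that the question does not confirm: each master starts a new page, and a master with more than 15 children continues onto further pages with its name repeated at the top. The page width of 40 columns and the names centre and make_paginator are likewise hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $PER_PAGE = 15;    # children per page, from the question
my $WIDTH    = 40;    # assumed page width used for centring

# Centre a string within $WIDTH columns.
sub centre {
    my ($text) = @_;
    my $pad = int( ( $WIDTH - length($text) ) / 2 );
    $pad = 0 if $pad < 0;
    return ( ' ' x $pad ) . $text;
}

# Returns ($on_master, $on_child, $finish) closures that hand finished
# lines to $emit one at a time, so no page is ever buffered.
sub make_paginator {
    my ($emit) = @_;
    my ( $page, $count, $current, $open ) = ( 0, 0, undef, 0 );

    my $close_page = sub {
        return unless $open;
        $emit->( centre( 'Page ' . ++$page ) );
        $emit->('');
        ( $count, $open ) = ( 0, 0 );
    };
    my $open_page = sub {
        $emit->( centre($current) );
        $open = 1;
    };
    my $on_master = sub {
        $close_page->();
        $current = $_[0];
        $open_page->();
    };
    my $on_child = sub {
        if ( $count == $PER_PAGE ) {    # page full: start a continuation page
            $close_page->();
            $open_page->();
        }
        $emit->( $_[0] );
        $count++;
    };
    return ( $on_master, $on_child, $close_page );
}

# Demo: one master with 17 made-up children spills onto a second page.
my @lines;
my ( $on_master, $on_child, $finish ) =
    make_paginator( sub { push @lines, $_[0] } );
$on_master->('Hadrian');
$on_child->("Child $_") for 1 .. 17;
$finish->();
print "$_\n" for @lines;
```

Because the paginator only receives a master name or a child line at a time, it can be driven by whatever code performs the merge, which keeps the presentation separate from the data handling as the requirements ask.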