
If I wanted to produce the following result:

Header stuff
123456|987|12
Apples|9
Oranges|19
Bananas|4
Footer junk
Header stuff
123456|987|34
Apples|7
Oranges|15
Bananas|11
Footer junk
Header stuff
123456|987|56
Apples|3
Oranges|9
Bananas|8
Footer junk

from the two input files, fake1.dat:

Header stuff
123456|987|12
Apples|4
Oranges|12
Bananas|3
Footer junk
Header stuff
123456|987|34
Apples|5
Oranges|7
Bananas|8
Footer junk
Header stuff
123456|987|56
Apples|2
Oranges|1
Bananas|3
Footer junk

and fake2.dat:

Header stuff
123456|987|12
Apples|5
Oranges|7
Bananas|1
Footer junk
Header stuff
123456|987|34
Apples|2
Oranges|8
Bananas|3
Footer junk
Header stuff
123456|987|56
Apples|1
Oranges|8
Bananas|5
Footer junk

I would probably write a script like this to do it:

#!/usr/bin/perl

use strict;
use warnings;

my %data;

{
    #  Go looking for files that match this pattern.

    foreach my $thisFile (glob("fake?.dat")) {

      #  Open the file, and die if that doesn't work.

      open ( INPUT, '<', $thisFile ) or
        die "Unable to open $thisFile: $!";

      my ( $header, $id, @data, $footer );
      while (<INPUT>) {

        #  Read in a line from the file. We're expecting
        #  a header, an ID line, followed by a bunch of
        #  lines of data, terminated by a footer. There
        #  can be several of these records in a file. For 
        #  the sake of simplicity, we assume that the
        #  lines of data are always present and always in
        #  the same order.

        chomp;
        if ( defined ( $header ) ) {

          if ( defined ( $id ) ) {

            if ( /Footer/ ) {

              #  If we just saw a footer, that's the end
              #  of a record and we can process what we 
              #  have now.

              $footer = $_;

              #  The unique ID number is the last number
              #  on the ID line.

              my ( $id3 ) = $id =~ m/\|(\d+)$/;

              #  Store this record's information into a
              #  hash, either re-using the existing hash
              #  element, or creating a new one.

              if ( exists($data{ $id3 }) ) {

                my @updatedData;
                foreach ( @{$data{ $id3 }->{data}} ) {

                  my @dataSoFar = split(/\|/, $_);
                  my @thisData  = split(/\|/,shift @data);

                  $dataSoFar[1] += $thisData[1];
                  push ( @updatedData, join('|', @dataSoFar) );
                }
                $data{ $id3 }->{data} = \@updatedData;
                
              } else {

                $data{ $id3 }->{header} = $header;
                $data{ $id3 }->{id}     = $id;
                push ( @{$data{ $id3 }->{data}}, @data );
                $data{ $id3 }->{footer} = $footer;
              }
      
              #  Clear variables for next loop around the 
              #  input file.

              undef $header;
              undef $id;
              @data = ();
              undef $footer;

            } else {
            
              push ( @data, $_ );
            }

          } else {

            $id = $_;
          }
        } else {

          $header = $_;
        }

      }
      close ( INPUT );
    }

    #  Having added up the various lines of data, we now 
    #  dump out a summary.

    foreach my $thisKey ( sort { $a <=> $b } keys %data ) {

      print "$data{ $thisKey }->{'header'}\n";
      print "$data{ $thisKey }->{'id'}\n";
      foreach ( @{$data{ $thisKey }->{'data'}} ) {
        print "$_\n";
      }
      print "$data{ $thisKey }->{'footer'}\n";
    }
}

See if that helps you.
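
The script above assumes the data lines are always present and always in the same order. If that might not hold for your files, here is a rough sketch of a variant that keys the totals by item name instead of by position. The helper name add_lines and the hash layout (count, order, idLine) are my own inventions for illustration, and the // operator needs Perl 5.10 or later. It takes already-chomped lines, so you would feed it the contents of each file in turn.

```perl
#!/usr/bin/perl

use strict;
use warnings;

#  Hypothetical helper (not part of the script above): fold the
#  lines of one file into %$totals, keyed by record ID and then
#  by item name, so the order of the data lines doesn't matter.

sub add_lines {
    my ( $totals, @lines ) = @_;

    my ( $header, $id );
    foreach my $line (@lines) {

        if ( !defined $header ) {

            $header = $line;

        } elsif ( !defined $id ) {

            #  The unique ID number is the last number on the ID line.

            ( $id ) = $line =~ m/\|(\d+)$/;
            die "No ID found in '$line'" unless defined $id;
            $totals->{$id}{header} //= $header;
            $totals->{$id}{idLine} //= $line;

        } elsif ( $line =~ /Footer/ ) {

            #  End of record; get ready for the next one.

            $totals->{$id}{footer} //= $line;
            undef $header;
            undef $id;

        } else {

            my ( $name, $count ) = split /\|/, $line;

            #  Remember first-seen order so the output stays stable.

            push @{ $totals->{$id}{order} }, $name
                unless exists $totals->{$id}{count}{$name};
            $totals->{$id}{count}{$name} += $count;
        }
    }
    return $totals;
}

#  Two one-record "files", with the data lines in different orders.

my %totals;
add_lines( \%totals, "Header stuff", "123456|987|12",
           "Apples|4", "Oranges|12", "Footer junk" );
add_lines( \%totals, "Header stuff", "123456|987|12",
           "Oranges|7", "Apples|5", "Footer junk" );

foreach my $id ( sort { $a <=> $b } keys %totals ) {

    my $rec = $totals{$id};
    print "$rec->{header}\n$rec->{idLine}\n";
    print "$_|$rec->{count}{$_}\n" foreach @{ $rec->{order} };
    print "$rec->{footer}\n";
}
```

With those two sample records this prints Apples|9 and Oranges|19 between the header, ID and footer lines, even though the second record lists Oranges first. The trade-off is a slightly deeper hash in exchange for not caring about line order.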

Alex / talexb / Toronto

"Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

In reply to Re: Adding object identifiers corresponding to matched headers and sub-headers. by talexb
in thread Adding object identifiers corresponding to matched headers and sub-headers. by Kiran Kumar K V N
