
Hi, bobdabuilda

You've given this much thought, and I think your pseudocode is on target.

The orders are only separated by a blank line, but they all start with the "Order ID:" text, so I'm looking at using that as the separator.

The "Order ID:" as record separator makes sense.
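One detail to keep in mind when doing this: with `$/` set to a string, each read returns text up to *and including* that string, so the label ends up at the end of the previous chunk rather than the start of its own order. Here's a minimal sketch on a made-up two-order string (the sample text is invented for illustration):

```perl
use strict;
use warnings;

# A made-up report: a page header followed by two short orders
my $report = "Page header\n  Order ID:PO-1 cycle:1\n\n  Order ID:PO-2 cycle:2\n";

my @chunks;
{
    # With $/ set to a string, each read returns text up to AND including it
    local $/ = 'Order ID:';
    open my $fh, '<', \$report or die "Unable to open string: $!";
    @chunks = <$fh>;
    close $fh;
}

# $chunks[0] is the header (ending in 'Order ID:'), so drop it, then put the
# label back at the front of each order so the field regexes can see it
shift @chunks;
$_ = "Order ID:$_" for @chunks;

print "[$_]\n" for @chunks;
```

Every order but the last still ends with the next order's "Order ID:" label. If you'd rather strip it, a `chomp @chunks` inside the block (while `$/` is still 'Order ID:') removes it, since chomp removes a trailing copy of whatever `$/` currently is.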

The page header should be automatically filtered out by the regex the way it stands anyway... I think.

You're correct.
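The header drops out simply because none of the field patterns match anything on a header line. Here's a small sketch showing that with a few header lines and a subset of the patterns (the `%pattern` hash is just for illustration). It captures with a list-context match rather than reading `$1`: a failed match doesn't reset `$1`, so `$1` after a failed match can still hold an earlier capture, whereas a list-context match just returns an empty list.

```perl
use strict;
use warnings;

# A subset of the field patterns, keyed by field name (illustrative only)
my %pattern = (
    orderID  => qr/Order ID:(\S+)/,
    vendorID => qr/Vendor ID:(\S+)/,
    copies   => qr/copies:(\d+)/,
);

my @lines = (
    "                 List of Distributions\n",
    "      Produced Tuesday, 9 October, 2012 at 1:38 PM\n",
    "       Order ID:PO-9999                  fiscal cycle:21112\n",
);

my %found;
for my $line (@lines) {
    for my $field ( sort keys %pattern ) {
        # List-context match: the capture on success, () on failure,
        # so $value is undef for header lines
        my ($value) = $line =~ $pattern{$field};
        $found{$field} //= $value;
    }
}

print "$_ => ", $found{$_} // 'undef', "\n" for sort keys %pattern;
```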

I've taken the liberty of implementing an interpretation of this. It does use two loops, but the outer loop is a for loop that iterates over an array of Order records:

use strict;
use warnings;
use Data::Dumper;

# Place a filename into $recordsFile to read Orders from that file
#  else the Orders below __DATA__ will be used for demo purposes
my $recordsFile = '';

my @records;
my $recSeparator = 'Order ID:';

# Orders will initially be elements 1 .. n of @records; element 0
# is the text before the first 'Order ID:' (the first page header)
{
    # Set the record separator
    local $/ = $recSeparator;

    # If there's a file name, try to read from that file
    if ($recordsFile) {
        open my $fh, '<', $recordsFile or die $!;
        @records = <$fh>;
        close $fh;
    }
    else {
        @records = <DATA>;
    }
}

# Remove the first page header
shift @records;

# Add Order ID: back into each record for later matching
$_ = "$recSeparator$_" for @records;

# Iterate through each record (Order)
for my $record (@records) {
    my %hash;

    # Treat the record string like a file, opening it for reading
    open my $sh, '<', \$record or die "Unable to open record string: $!";

    # Read the string like a file, one line at a time now
    while (<$sh>) {
        # A match in list context returns its captures (or an empty list),
        # so each field stays undef until a line that matches it comes along
        $hash{orderID}        //= (/Order ID:(\S+)/)[0];
        $hash{fiscalCycle}    //= (/cycle:(\d+)/)[0];
        $hash{vendorID}       //= (/Vendor ID:(\S+)/)[0];
        $hash{requisitionNum} //= (/\s+(\d+).+requisition/)[0];
        $hash{copies}         //= (/copies:(\d+)/)[0];
        $hash{title}          //= (/Title:(.+)/)[0];
        $hash{'ISBN/ISSN'}    //= (m{ISBN/ISSN:(\S+)})[0];

        # Distributions started?
        if (/Distribution--/) {

            # Read the rest of the record a distribution 'chunk' at a time.
            # 'local' restores the previous record separator automatically
            # when this if-block exits, so there's no need to save it by hand
            local $/ = 'Distribution--';

            while (<$sh>) {
                my %tempHash;

                ( $tempHash{holdingCode} )  = /code:(\S+)/;
                ( $tempHash{copies} )       = /copies:(\d+)/;
                ( $tempHash{dateReceived} ) = /received:(\S+)/;
                ( $tempHash{dateLoaded} )   = /loaded:(\S+)/;

                push @{ $hash{distribution} }, \%tempHash;
            }
        }
    }

    # Work with the filled-in %hash by sending a reference to it to a subroutine
    # This is a complete record
    writeToSpreadSheet( \%hash );
    
    print Dumper \%hash;

    # Done 'reading' the string
    close $sh;
}


# Printing from a subroutine like this isn't ideal; it's done here only to show how to access the hash
sub writeToSpreadSheet {
    my ($hashReference) = @_;

    # The $$ notation dereferences the hash reference
    print $$hashReference{vendorID}, "\n";

    # The @{} notation dereferences the array reference; the arrow operator dereferences to get each hash value
    for my $distribution ( @{ $$hashReference{distribution} } ) {
        print $distribution->{holdingCode}, "\n";
    }

    print "\n";
}

__DATA__
                             List of Distributions

                  Produced Tuesday, 9 October, 2012 at 1:38 PM



       Order ID:PO-9999                  fiscal cycle:21112
      Vendor ID:VEND99                     order type:SUBSCRIPT
    15)   requisition number:                      copies:9    
                call number:XX(9999999.999)                          
                  ISBN/ISSN:9999-999X           
         Title:Item title here.
         ISSN:9999-999X
         Publication info:More text here about stuff

        Distribution--
            packing list:STUFF-I-DONT-NEED-999      
            holding code:CODEINFO1                   copies:1    
           date received:27/6/2012                             date loaded:27/6/2012
              
        Distribution--
            packing list:STUFF-I-DONT-NEED-999
            holding code:CODEINFO3                    copies:2    
           date received:27/9/2012                             date loaded:27/6/2012
              
        Distribution--
            packing list:STUFF-I-DONT-NEED-999
            holding code:CODEINFO2                     copies:1    
           date received:25/8/2012                             date loaded:27/6/2012

                              List of Distributions

                  Produced Tuesday, 9 October, 2012 at 1:38 PM



       Order ID:PO-1111                  fiscal cycle:21112
      Vendor ID:VEND11                     order type:SUBSCRIPT
    15)   requisition number:                      copies:417    
                call number:XX(11111111.111)                          
                  ISBN/ISSN:1111-111X           
         Title:Item title here.
         ISSN:9999-999X
         Publication info:More text here about stuff

        Distribution--
            packing list:STUFF-I-DONT-NEED-111      
            holding code:CODEINFO9                   copies:5    
           date received:11/6/2012                             date loaded:12/6/2012
              
        Distribution--
            packing list:STUFF-I-DONT-NEED-111
            holding code:CODEINFO8                    copies:4    
           date received:11/9/2012                             date loaded:12/6/2012
              
        Distribution--
            packing list:STUFF-I-DONT-NEED-111
            holding code:CODEINFO7                     copies:3    
           date received:11/8/2012                             date loaded:12/6/2012
           
        Distribution--
            packing list:STUFF-I-DONT-NEED-111
            holding code:CODEINFO6                     copies:2    
           date received:11/8/2012                             date loaded:12/6/2012

Output

VEND99
CODEINFO1
CODEINFO3
CODEINFO2

$VAR1 = {
          'vendorID' => 'VEND99',
          'copies' => '9',
          'fiscalCycle' => '21112',
          'distribution' => [
                              {
                                'dateLoaded' => '27/6/2012',
                                'dateReceived' => '27/6/2012',
                                'copies' => '1',
                                'holdingCode' => 'CODEINFO1'
                              },
                              {
                                'dateLoaded' => '27/6/2012',
                                'dateReceived' => '27/9/2012',
                                'copies' => '2',
                                'holdingCode' => 'CODEINFO3'
                              },
                              {
                                'dateLoaded' => '27/6/2012',
                                'dateReceived' => '25/8/2012',
                                'copies' => '1',
                                'holdingCode' => 'CODEINFO2'
                              }
                            ],
          'ISBN/ISSN' => '9999-999X',
          'title' => 'Item title here.',
          'orderID' => 'PO-9999',
          'requisitionNum' => '15'
        };
VEND11
CODEINFO9
CODEINFO8
CODEINFO7
CODEINFO6

$VAR1 = {
          'vendorID' => 'VEND11',
          'copies' => '417',
          'fiscalCycle' => '21112',
          'distribution' => [
                              {
                                'dateLoaded' => '12/6/2012',
                                'dateReceived' => '11/6/2012',
                                'copies' => '5',
                                'holdingCode' => 'CODEINFO9'
                              },
                              {
                                'dateLoaded' => '12/6/2012',
                                'dateReceived' => '11/9/2012',
                                'copies' => '4',
                                'holdingCode' => 'CODEINFO8'
                              },
                              {
                                'dateLoaded' => '12/6/2012',
                                'dateReceived' => '11/8/2012',
                                'copies' => '3',
                                'holdingCode' => 'CODEINFO7'
                              },
                              {
                                'dateLoaded' => '12/6/2012',
                                'dateReceived' => '11/8/2012',
                                'copies' => '2',
                                'holdingCode' => 'CODEINFO6'
                              }
                            ],
          'ISBN/ISSN' => '1111-111X',
          'title' => 'Item title here.',
          'requisitionNum' => '15',
          'orderID' => 'PO-1111'
        };
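The part of the code that may be least familiar is re-reading one record through an in-memory filehandle and switching `$/` partway through it. Here's a stripped-down sketch of just that technique, run on a made-up record with two distributions (the record text and variable names are invented for the example):

```perl
use strict;
use warnings;

# One made-up order record: a header field, then two distribution chunks
my $record = <<'END';
Order ID:PO-1 copies:3
Distribution--
    holding code:CODE-A   copies:1
Distribution--
    holding code:CODE-B   copies:2
END

my ( $orderID, @codes );
open my $sh, '<', \$record or die "Unable to open record string: $!";

while ( my $line = <$sh> ) {    # line at a time, since $/ is "\n" here
    ($orderID) = $line =~ /Order ID:(\S+)/ unless defined $orderID;
    next unless $line =~ /Distribution--/;

    # Read the rest of the same handle a chunk at a time;
    # 'local' restores the previous $/ when this loop body exits
    local $/ = 'Distribution--';
    while ( my $chunk = <$sh> ) {
        push @codes, $chunk =~ /code:(\S+)/;
    }
}
close $sh;

print "$orderID: @codes\n";
```

Because both loops share the same handle, the inner loop simply carries on from where the outer one stopped, and once it hits the end of the string the outer loop's next read returns undef and it finishes too.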

I've included a subroutine, and a call to it, that shows how to access the hash one record at a time.

The code is commented to help with understanding it.
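If the end goal is a spreadsheet, one option is to have that subroutine return CSV rows instead of printing. This is only a sketch: `orderToCsvRows`, the one-row-per-distribution layout and the column order are all my own assumptions, and plain `join` is only safe if no field contains a comma, quote or newline (for real data, a CSV module such as Text::CSV from CPAN would be safer).

```perl
use strict;
use warnings;

# Hypothetical helper: one CSV row per distribution, with the order-level
# fields repeated on each row. Assumes no field contains , " or a newline.
sub orderToCsvRows {
    my ($order) = @_;
    my @orderFields = map { $order->{$_} // '' } qw(orderID vendorID title);
    my @rows;
    for my $dist ( @{ $order->{distribution} || [] } ) {
        my @distFields = map { $dist->{$_} // '' }
          qw(holdingCode copies dateReceived dateLoaded);
        push @rows, join ',', @orderFields, @distFields;
    }
    return @rows;
}

# Sample record shaped like the %hash built in the code above
my %order = (
    orderID      => 'PO-9999',
    vendorID     => 'VEND99',
    title        => 'Item title here.',
    distribution => [
        { holdingCode => 'CODEINFO1', copies => 1,
          dateReceived => '27/6/2012', dateLoaded => '27/6/2012' },
        { holdingCode => 'CODEINFO3', copies => 2,
          dateReceived => '27/9/2012', dateLoaded => '27/6/2012' },
    ],
);

print "$_\n" for orderToCsvRows( \%order );
```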

Let me know if you have any questions about this...

Enjoy!

In reply to Re^7: How best to strip text from a file? by Kenosis
in thread How best to strip text from a file? by bobdabuilda
