Hash Help

by reading the lines five at a time and pulling the user number out of the first one:

use warnings;
use strict;
use Data::Dump::Streamer;

my %records;

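while (! eof (DATA)) {
    my $line1 = <DATA>;

    # Skip blank lines and anything else that is not line 1 of a record
    next unless defined $line1 and $line1 =~ /^\s1/;

    # Read the remaining four lines of the record
    $_ = <DATA> for my ($line2, $line3, $line4, $line5);
    # The user id is the five characters starting at offset 9
    my $id = substr $line1, 9, 5;

    $records{$id} = [$line1, $line2, $line3, $line4, $line5];
}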

Dump (\%records);

__DATA__
 10101ABC000019101L0001374686047S30339  GA   &DOE    C080229CR7  7    00244   0000001 000000000
 2          CR7            000            060714Q                             Y   0000000000 000
 3                              00030339                           3JOHN DOE               36
 423 MAIN STREET  ATLANTA                GA 30339
 5                                                          +0000000000080226I   052461  05241961

 10101ABC000029102 N                 D                                3658 MAIN STREET
 2                                ATLANTA                       GA30339                           0001
 3JOHN DOE                                        05241961INDV374686047S
 4
 5

Prints:

$HASH1 = {
           "00001" => [
                        " 10101ABC000019101L0001374686047S30339  GA   &DOE    C080229CR7  7    00".
    "244   0000001 000000000\n",
                        " 2          CR7            000            060714Q                       ".
    "      Y   0000000000 000\n",
                        " 3                              00030339                           3JOHN".
    " DOE               36\n",
                        " 423 MAIN STREET  ATLANTA                GA 30339\n",
                        " 5                                                          +00000000000".
    "80226I   052461  05241961\n"
                      ],
           "00002" => [
                        " 10101ABC000029102 N                 D                                36".
    "58 MAIN STREET\n",
                        " 2                                ATLANTA                       GA30339 ".
    "                          0001\n",
                        " 3JOHN DOE                                        05241961INDV374686047S".
    "\n",
                        " 4\n",
                        " 5\n"
                      ]
         };

Note that I edited the second record to give it a unique id as described and that I restored what appeared to be a missing space in front of the first line of the first record.

Perl reduces RSI - it saves typing


my %records;

while (! eof (DATA)) {
    my $line1 = <DATA>;
    next unless defined $line1 and $line1 =~ /^01KV/;

    $_ = <DATA> for my ($line2, $line3, $line4, $line5, $line6, $line7);
    my $id = substr $line1, 2, 9;

    $records{$id} = [$line1, $line2, $line3, $line4, $line5, $line6, $line7];
}


In that case you need to be smarter about recognizing records. Each line of a record seems to carry a line number, so you can detect the start of a new record when the line number resets:

use warnings;
use strict;
use Data::Dump::Streamer;

my %records;
my @lines;

while (! eof (DATA) or @lines) {
    my $line = <DATA>;
    
    $line ||= ''; # Avoid a bunch of defined tests
    chomp $line;

    next unless $line =~ /^([\s\d]\d)/ or @lines;

    if (! defined $1 or $1 <= @lines) {
        # Start of new record (or last record) - save previous
        my $id = substr $lines[0], 9, 5;
        
        $records{$id} = [@lines];
        @lines = ();
    }

    push @lines, $line if length $line;
}

Dump (\%records);

__DATA__
 10101ABC000019101L0001374686047S30339  GA   &DOE    C080229CR7  7    00244   0000001 000000000
 2          CR7            000            060714Q                             Y   0000000000 000
 3                              00030339                           3JOHN DOE               36
 423 MAIN STREET  ATLANTA                GA 30339
 5                                                          +0000000000080226I   052461  05241961
 6Additional line 1
 7and another additional line

 10101ABC000029102 N                 D                                3658 MAIN STREET
 2                                ATLANTA                       GA30339                           0001
 3JOHN DOE                                        05241961INDV374686047S
 4
 5


If you are sure that every record is exactly 5 lines long, you can simply read 5 lines at a time and take the user id from the first line. As a safety check, you could test whether the uid line begins with a string of at least 15 non-space characters.
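
A minimal sketch of that fixed-size approach, in case it helps. The sample input, the in-memory filehandle and the names $input, @chunk and %records are made up for illustration; the offsets 8 and 6 are simply borrowed from the substr call used elsewhere in this post and may need adjusting for the real data.

```perl
use strict;
use warnings;

# Hypothetical input: two 5-line records whose first line starts with the id field.
my $input = join "\n",
    '10101ABC000019101L0001', ' 2', ' 3', ' 4', ' 5',
    '10101ABC000029102 N',    ' 2', ' 3', ' 4', ' 5', '';

# Read from the string as if it were a file.
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";

my %records;
until (eof $fh) {
    # Read one record: exactly five lines.
    my @chunk;
    push @chunk, scalar <$fh> for 1 .. 5;

    # Safety check: the uid line must begin with 15 non-space characters.
    die "Unexpected first line: $chunk[0]" unless $chunk[0] =~ /^\S{15}/;

    # Assumed id position: six characters starting at offset 8.
    my $id = substr $chunk[0], 8, 6;
    $records{$id} = \@chunk;
}
close $fh;

print "$_: ", scalar @{ $records{$_} }, " lines\n" for sort keys %records;
```

The die is deliberately strict: if a record ever has more or fewer than 5 lines, every following chunk is misaligned, so stopping at the first bad uid line is safer than storing shifted records.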

It would be better to test for the uid line directly, but is there anything in the uid line that cannot occur on any other line? Perhaps it always starts with 5 digits followed by 3 letters. That could be tested with if ($line=~/^\d{5}[a-zA-Z]{3}\w{6}/) { ....
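
To see how that pattern behaves, here is a small sketch that runs it over a few lines. The sample lines are hypothetical, modelled on the data in this thread, and the first one has no leading space because the pattern is anchored directly on the digits.

```perl
use strict;
use warnings;

# Hypothetical sample lines: one uid line and two continuation lines.
my @sample = (
    "10101ABC000019101L0001374686047S30339  GA",
    " 423 MAIN STREET  ATLANTA                GA 30339",
    " 3JOHN DOE",
);

# 5 digits, 3 letters, then 6 word characters mark a uid line.
my @is_id_line = map { /^\d{5}[a-zA-Z]{3}\w{6}/ ? 1 : 0 } @sample;

print "$is_id_line[$_]: $sample[$_]\n" for 0 .. $#sample;
```

If the real uid lines do start with a space, the pattern would need a leading \s before the digits.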

If you have something that identifies the line, the following (untested) code might work:

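my %allids;
my $quo;    # array ref holding the lines of the current record
while (my $line=<INP>) {
  if ($line=~ ... ) {
    # Start of a new record: store the previous one first
    $allids{substr($quo->[0],8,6)}= $quo if $quo;
    $quo=[$line];
  }
  elsif ($quo) {
    # Lines before the first id line are skipped
    push @$quo, $line;
  }
}
# Store the last record, which no following id line triggers
$allids{substr($quo->[0],8,6)}= $quo if $quo;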

After that you should be able to access the array of lines for id 000123 with @{$allids{'000123'}}.
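
A short sketch of what that access looks like in practice. The contents of %allids here are invented for illustration, and the id 999999 is just an example of a key that is not present.

```perl
use strict;
use warnings;

# Hypothetical contents of %allids: each id maps to an array ref of lines.
my %allids = (
    '000123' => [ "10101ABC000123 first line\n", " 2 second line\n" ],
);

# Copy all lines of one record into a plain array.
my @lines = @{ $allids{'000123'} };

# Or reach individual lines through the reference.
my $first = $allids{'000123'}[0];
my $count = scalar @{ $allids{'000123'} };

# Fall back to an empty list when the id is unknown, so that
# dereferencing does not die under strict.
my @none = @{ $allids{'999999'} || [] };

print "$count lines, first: $first";
```

The fallback form is worth using whenever the id comes from user input, since it also avoids creating an empty entry for the missing id.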

UPDATE: PS: You say that substr 9-14 is the id, but in your example both records have the same numbers in columns 9 to 14 (no matter whether you count from 0 or 1).
