PerlMonks  

Hash Help

by mmittiga17 (Scribe)
on Nov 05, 2008 at 15:44 UTC ( [id://721686]=perlquestion )

mmittiga17 has asked for the wisdom of the Perl Monks concerning the following question:

Each record is 5 lines, numbered 1-5. Each record is uniquely identified by the usernumber on line 1, substr 9-14. I need to parse bits of each fixed-width line for each usernumber and output the info to one CSV line. Since the usernumber is not present on every record line, how can I create a hash that ties each of lines 1-5 to the usernumber on line 1? This would be easy if the usernumber appeared on each line, but it does not.

    10101ABC000019101L0001374686047S30339 GA &DOE C080229CR7 7 00244 0000001 000000000
     2 CR7 000 060714Q Y 0000000000 000
     3 00030339 3JOHN DOE 36
     423 MAIN STREET ATLANTA GA 30339
     5 00000000000080226I 052461 05241961
     10101ABC000019102 N D 3658 MAIN STREET
     2 ATLANTA GA30339 0001
     3JOHN DOE 05241961INDV374686047S
     4
     5

Thank you in advance for any help or suggestions.

Replies are listed 'Best First'.
Re: Hash Help
by GrandFather (Saint) on Nov 05, 2008 at 20:16 UTC

    You can do that by reading the lines five at a time and pulling the user number out of the first one:

    use warnings;
    use strict;
    use Data::Dump::Streamer;

    my %records;

    while (! eof (DATA)) {
        my $line1 = <DATA>;

        next unless defined $line1 and $line1 =~ /^\s1/;
        $_ = <DATA> for my ($line2, $line3, $line4, $line5);

        my $id = substr $line1, 9, 5;
        $records{$id} = [$line1, $line2, $line3, $line4, $line5];
    }

    Dump (\%records);

    __DATA__
     10101ABC000019101L0001374686047S30339 GA &DOE C080229CR7 7 00244 0000001 000000000
     2 CR7 000 060714Q Y 0000000000 000
     3 00030339 3JOHN DOE 36
     423 MAIN STREET ATLANTA GA 30339
     5 00000000000080226I 052461 05241961
     10101ABC000029102 N D 3658 MAIN STREET
     2 ATLANTA GA30339 0001
     3JOHN DOE 05241961INDV374686047S
     4
     5

    Prints:

    $HASH1 = {
               "00001" => [
                            " 10101ABC000019101L0001374686047S30339 GA &DOE C080229CR7 7 00".
                            "244 0000001 000000000\n",
                            " 2 CR7 000 060714Q ".
                            " Y 0000000000 000\n",
                            " 3 00030339 3JOHN".
                            " DOE 36\n",
                            " 423 MAIN STREET ATLANTA GA 30339\n",
                            " 5 00000000000".
                            "80226I 052461 05241961\n"
                          ],
               "00002" => [
                            " 10101ABC000029102 N D 36".
                            "58 MAIN STREET\n",
                            " 2 ATLANTA GA30339 ".
                            " 0001\n",
                            " 3JOHN DOE 05241961INDV374686047S".
                            "\n",
                            " 4\n",
                            " 5\n"
                          ]
             };

    Note that I edited the second record to give it a unique id as described and that I restored what appeared to be a missing space in front of the first line of the first record.
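
Once the records are grouped like this, flattening each one into a single CSV line is mostly a matter of `substr` calls. The following is a minimal sketch, not the poster's code: the name `record_to_csv` and the `@layout` table are made up for illustration, and the offsets and widths in it are placeholders, since the thread never states the real field positions.

```perl
use strict;
use warnings;

# Hypothetical layout: [index into the record's line array, offset, width].
# These positions are placeholders, not the real file layout.
my @layout = (
    [0, 9, 5],     # user number from line 1
    [3, 2, 15],    # e.g. a street field from line 4
);

# Turn one record (array ref of lines) into a single CSV line.
sub record_to_csv {
    my ($lines) = @_;
    my @fields;
    for my $spec (@layout) {
        my ($idx, $offset, $width) = @$spec;
        my $line = $lines->[$idx];
        $line = '' unless defined $line;
        chomp $line;
        # Pad short lines so substr never runs past the end of the string
        $line .= ' ' x ($offset + $width);
        my $field = substr $line, $offset, $width;
        $field =~ s/^\s+|\s+$//g;    # trim the fixed-width padding
        $field =~ s/"/""/g;          # escape embedded double quotes
        push @fields, qq{"$field"};
    }
    return join ',', @fields;
}
```

With the `%records` hash built above, something like `print record_to_csv($records{$_}), "\n" for sort keys %records;` would emit one line per user number. For anything beyond simple fields, the Text::CSV module from CPAN is the more robust way to write the output.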


    Perl reduces RSI - it saves typing
      Thanks again for your help. I have a question, if you have time: I notice that in some cases in the file I am trying to parse there may be 7 lines of info per ID, and in other cases only 5.
      my %records;
      while (! eof (DATA)) {
          my $line1 = <DATA>;
          next unless defined $line1 and $line1 =~ /^01KV/;
          $_ = <DATA> for my ($line2, $line3, $line4, $line5, $line6, $line7);
          my $id = substr $line1, 2, 9;
          $records{$id} = [$line1, $line2, $line3, $line4, $line5, $line6, $line7];
      }
      If I add line6 and line7, I notice that IDs with only five lines will also grab the first two lines of the next record. How can I dynamically account for IDs with more than 5 record lines? Thanks!

        In that case you need to be smarter about recognizing records. There seems to be a line number associated with each line of a record so you can notice when the line number resets:

        use warnings;
        use strict;
        use Data::Dump::Streamer;

        my %records;
        my @lines;

        while (! eof (DATA) or @lines) {
            my $line = <DATA>;

            $line ||= ''; # Avoid a bunch of defined tests
            chomp $line;
            next unless $line =~ /^([\s\d]\d)/ or @lines;

            if (! defined $1 or $1 <= @lines) {
                # Start of new record (or last record) - save previous
                my $id = substr $lines[0], 9, 5;
                $records{$id} = [@lines];
                @lines = ();
            }

            push @lines, $line if length $line;
        }

        Dump (\%records);

        __DATA__
         10101ABC000019101L0001374686047S30339 GA &DOE C080229CR7 7 00244 0000001 000000000
         2 CR7 000 060714Q Y 0000000000 000
         3 00030339 3JOHN DOE 36
         423 MAIN STREET ATLANTA GA 30339
         5 00000000000080226I 052461 05241961
         6Additional line 1
         7and another additional line
         10101ABC000029102 N D 3658 MAIN STREET
         2 ATLANTA GA30339 0001
         3JOHN DOE 05241961INDV374686047S
         4
         5

Re: Hash Help
by jethro (Monsignor) on Nov 05, 2008 at 18:03 UTC

    If you are sure that every 5 lines form a record, you might just read 5 lines at a time and get the user id from the first line. As a safety mechanism you could test whether the uid line has a string of at least 15 non-space characters at the beginning.
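
The fixed-size approach with that safety check might look like the sketch below. It is only an illustration: `read_fixed_records` is a made-up name, the offset 8 and width 6 are borrowed from the untested code further down in this reply, and the 15-character test assumes the id line has no leading whitespace.

```perl
use strict;
use warnings;

# Read records of exactly $size lines from a filehandle. Returns a hash ref
# mapping the id taken from the first line to an array ref of the lines.
sub read_fixed_records {
    my ($fh, $size) = @_;
    my %records;
    until (eof $fh) {
        my @lines;
        for (1 .. $size) {
            my $line = <$fh>;
            last unless defined $line;
            chomp $line;
            push @lines, $line;
        }
        # Safety check: an id line starts with at least 15 non-space characters
        die "Line does not look like an id line: $lines[0]\n"
            unless $lines[0] =~ /^\S{15}/;
        # Offset 8 and width 6 are assumptions, not a confirmed layout
        my $id = substr $lines[0], 8, 6;
        $records{$id} = \@lines;
    }
    return \%records;
}
```

Called as `my $records = read_fixed_records($fh, 5);`, it dies as soon as the five-line rhythm is broken, which is preferable to silently attaching lines to the wrong user.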

    It would be better to check for the uid line itself, but is there anything in the uid line that can't occur on any other line? Maybe that the user id line starts with 5 digits, then 3 letters. This could be tested with  if ($line=~/^\d{5}[a-zA-Z]{3}\w{6}/) { ....

    If you have something to identify the line, the following (untested) code might work

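    my %allids=();
    my $quo='';
    while (my $line=<INP>) {
        if ($line=~ ... ) {
            # New id line: store the previous record before starting the next
            $allids{substr($quo->[0],8,6)}= $quo if $quo;
            $quo=[$line];
        }
        else {
            # Ignore any lines that come before the first id line
            push @$quo, $line if $quo;
        }
    }
    # Store the final record, which no later id line would otherwise flush
    $allids{substr($quo->[0],8,6)}= $quo if $quo;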

    After that you should be able to access the array of lines for id 000123 with @{$allids{'000123'}}

    UPDATE: PS: You say that substr 9-14 is the id, but in your example both records have the same numbers in the columns 9 to 14 (no matter if you counted from 0 or 1).
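
To make the point about the columns concrete, the first lines of the two posted records can be compared at both candidate offsets. This is a small check, assuming the lines are taken exactly as posted (no leading space) and that the id is six characters wide; `id_at` is just an illustrative helper.

```perl
use strict;
use warnings;

# First line of each record as posted in the question, cut off after the
# part that matters here.
my @first_lines = (
    '10101ABC000019101L0001374686047S30339',
    '10101ABC000019102',
);

# Six characters starting at the given 0-based offset.
sub id_at {
    my ($line, $offset) = @_;
    return substr $line, $offset, 6;
}

for my $offset (8, 9) {    # columns 9-14 counted from 1, then from 0
    my @ids = map { id_at($_, $offset) } @first_lines;
    printf "offset %d: %s vs %s\n", $offset, @ids;
}
```

Offset 8 gives `000019` for both records and offset 9 gives `000191` for both; the two lines first differ at offset 16 (`1` versus `2`), so neither reading of "9-14" yields a unique key for this sample.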
