Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,
I have a file with lines like the following:
    AsmA_2 PF13502 220 A0KIJ1
    AsmA_2 PF13502 220 A0KIJ1
    DUF490 PF04357 379 A1AJD0
    Pfam-B_5656 PB005656 259 A1AJD0
    STN PF07660 52 T2MWK7
    Secretin_N_2 PF07655 98 T2MZK3
    Pfam-B_2175 PB002175 114 T2MVV9
    Secretin_N PF03958 82 U2ZRR1
    Pfam-B_1479 PB001479 704 U2ZRR1
    Pfam-B_2175 PB002175 114 U2ZRR1

The last code (e.g. A0KIJ1) should be my key. I want to store all the info in one structure (as you can see, there can be more than one line referring to the same code). Which structure should it be? A hash of arrays? If so, how do I check, when reading the file line by line, whether the code is the same or has changed?

Re: What data structure must I use for this problem?
by BrowserUk (Patriarch) on Jun 24, 2014 at 15:26 UTC

    A hash of arrays of arrays seems to fit the bill:

    #! perl -slw
    use strict;
    use Data::Dump qw[ pp ];

    my %data;
    while( <DATA> ) {
        my @bits = split;                       # split $_ on whitespace
        push @{ $data{ pop @bits } }, \@bits;   # last field is the key; store the rest
    }

    pp \%data;

    __DATA__
    AsmA_2 PF13502 220 A0KIJ1
    AsmA_2 PF13502 220 A0KIJ1
    DUF490 PF04357 379 A1AJD0
    Pfam-B_5656 PB005656 259 A1AJD0
    STN PF07660 52 T2MWK7
    Secretin_N_2 PF07655 98 T2MZK3
    Pfam-B_2175 PB002175 114 T2MVV9
    Secretin_N PF03958 82 U2ZRR1
    Pfam-B_1479 PB001479 704 U2ZRR1
    Pfam-B_2175 PB002175 114 U2ZRR1

    Gives:

    [16:26:56.39] C:\test>junk56
    {
      A0KIJ1 => [ ["AsmA_2", "PF13502", 220], ["AsmA_2", "PF13502", 220] ],
      A1AJD0 => [ ["DUF490", "PF04357", 379], ["Pfam-B_5656", "PB005656", 259] ],
      T2MVV9 => [ ["Pfam-B_2175", "PB002175", 114] ],
      T2MWK7 => [ ["STN", "PF07660", 52] ],
      T2MZK3 => [ ["Secretin_N_2", "PF07655", 98] ],
      U2ZRR1 => [
                  ["Secretin_N", "PF03958", 82],
                  ["Pfam-B_1479", "PB001479", 704],
                  ["Pfam-B_2175", "PB002175", 114],
                ],
    }
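    Once the hash of arrays of arrays is built, using it is a matter of walking keys and then records. The sketch below is a minimal, self-contained illustration under two assumptions: the sample lines are held in a heredoc and read through an in-memory filehandle (a stand-in for opening the real file), and the column names `$name`, `$acc` and `$len` are hypothetical labels, since the question does not say what the three fields mean.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample lines from the question, kept in a string so the sketch runs on
# its own; with a real file you would open it by name instead.
my $text = <<'END';
AsmA_2 PF13502 220 A0KIJ1
AsmA_2 PF13502 220 A0KIJ1
DUF490 PF04357 379 A1AJD0
Pfam-B_5656 PB005656 259 A1AJD0
STN PF07660 52 T2MWK7
Secretin_N_2 PF07655 98 T2MZK3
Pfam-B_2175 PB002175 114 T2MVV9
Secretin_N PF03958 82 U2ZRR1
Pfam-B_1479 PB001479 704 U2ZRR1
Pfam-B_2175 PB002175 114 U2ZRR1
END

# Build the hash of arrays of arrays: code => [ [f1, f2, f3], ... ].
my %data;
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
while (my $line = <$fh>) {
    my @bits = split ' ', $line;    # split on whitespace, dropping the newline
    next unless @bits;              # skip blank lines
    push @{ $data{ pop @bits } }, \@bits;   # last field is the key
}
close $fh;

# Walk the structure: each code, then every record stored under it.
for my $code (sort keys %data) {
    my $records = $data{$code};
    printf "%s: %d record(s)\n", $code, scalar @$records;
    for my $rec (@$records) {
        # Hypothetical column names; the question does not define them.
        my ($name, $acc, $len) = @$rec;
        print "    $name $acc $len\n";
    }
}

# Looking up one code directly needs no scanning at all.
print "U2ZRR1 has ", scalar @{ $data{U2ZRR1} }, " records\n"
    if exists $data{U2ZRR1};
```

    Because the code is the hash key, there is no need to check whether it "has changed" between lines: every record simply lands under its own key, whatever order the lines come in.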

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: What data structure must I use for this problem?
by jeffa (Bishop) on Jun 24, 2014 at 15:20 UTC

    How about a Hash of Arrays (HoA) like so? (Assume your file is called data.txt)

    perl -MData::Dumper -lane'push @{ $hash{$F[3]} }, [@F]}{print Dumper \%hash' < data.txt

    Output:

    $VAR1 = {
      'U2ZRR1' => [
        [ 'Secretin_N', 'PF03958', '82', 'U2ZRR1' ],
        [ 'Pfam-B_1479', 'PB001479', '704', 'U2ZRR1' ],
        [ 'Pfam-B_2175', 'PB002175', '114', 'U2ZRR1' ]
      ],
      'A1AJD0' => [
        [ 'DUF490', 'PF04357', '379', 'A1AJD0' ],
        [ 'Pfam-B_5656', 'PB005656', '259', 'A1AJD0' ]
      ],
      'T2MWK7' => [
        [ 'STN', 'PF07660', '52', 'T2MWK7' ]
      ],
      'T2MVV9' => [
        [ 'Pfam-B_2175', 'PB002175', '114', 'T2MVV9' ]
      ],
      'T2MZK3' => [
        [ 'Secretin_N_2', 'PF07655', '98', 'T2MZK3' ]
      ],
      'A0KIJ1' => [
        [ 'AsmA_2', 'PF13502', '220', 'A0KIJ1' ],
        [ 'AsmA_2', 'PF13502', '220', 'A0KIJ1' ]
      ]
    };
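    The question also asks how to detect, while reading line by line, that the code has changed. A minimal sketch of that "control break" approach follows, assuming (as in the sample) that all lines for one code are adjacent. It keeps the previous code in `$prev`, closes a group whenever the current code differs, and flushes the last group after the loop. The names `@groups`, `@current` and `$prev` are illustrative choices, and the in-memory filehandle again stands in for the real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: all lines sharing a code are adjacent, as in the sample.
my $text = <<'END';
AsmA_2 PF13502 220 A0KIJ1
AsmA_2 PF13502 220 A0KIJ1
DUF490 PF04357 379 A1AJD0
Pfam-B_5656 PB005656 259 A1AJD0
STN PF07660 52 T2MWK7
Secretin_N_2 PF07655 98 T2MZK3
Pfam-B_2175 PB002175 114 T2MVV9
Secretin_N PF03958 82 U2ZRR1
Pfam-B_1479 PB001479 704 U2ZRR1
Pfam-B_2175 PB002175 114 U2ZRR1
END

my @groups;     # [ code, [ records... ] ] in file order
my @current;    # records of the group currently being read
my $prev;       # code from the previous line; undef before the first line

open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
while (my $line = <$fh>) {
    my @bits = split ' ', $line;
    next unless @bits;              # skip blank lines
    my $code = pop @bits;           # last field is the code
    if (defined $prev && $code ne $prev) {
        # The code has changed, so the previous group is complete.
        push @groups, [ $prev, [ @current ] ];
        @current = ();
    }
    push @current, \@bits;
    $prev = $code;
}
close $fh;

# No later line closes the final group, so flush it explicitly.
push @groups, [ $prev, [ @current ] ] if defined $prev;

for my $g (@groups) {
    my ($code, $records) = @$g;
    printf "%s: %d record(s)\n", $code, scalar @$records;
}
```

    For the sample data this prints one line per group in file order, starting with `A0KIJ1: 2 record(s)` and ending with `U2ZRR1: 3 record(s)`. If the same code could reappear later in the file, non-adjacent, this approach would produce a second group for it; the hash-based answers in this thread merge such lines regardless of order.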

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    
Re: What data structure must I use for this problem?
by LanX (Saint) on Jun 24, 2014 at 15:05 UTC
    Unclear. Please give us examples of what you mean by "code", and show us lines where the "code has changed" or not, to help us understand what you want.

    Otherwise we would waste time speculating. :)

    Cheers Rolf

    (addicted to the Perl Programming Language)

Re: What data structure must I use for this problem?
by Laurent_R (Canon) on Jun 24, 2014 at 21:15 UTC
    Hmm, not so sure about a database. When duplicate entries are allowed, databases very often perform relatively poorly. More broadly, an in-memory hash is vastly faster than a database, so a database gains an advantage only when the data is too big to fit into a hash in memory.

    In brief, a hash of arrays really seems to be what you need.

      Ah, that's great!
      Thank you, Monks!
Re: What data structure must I use for this problem?
by Anonymous Monk on Jun 24, 2014 at 17:48 UTC
    Naturally, a database table comes to mind ...