Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,
I have a series of triplets that look like this:
ID
string1
string2

This structure is regarded as 'one'. Now, my problem is that on many occasions string1 and string2 are repeated under different IDs, for instance:
nick
AAAAAAAAAA
BBBBBBBBBB
//
george
EGRGERHTETEHTHR
VFRTTHTRRHTE
//
andreas
AAAAAAAAAA
BBBBBBBBBB
//
thomas
EWTRYTUYJTHT
CEWWEQRWT$G
//
peter
EGRGERHTETEHTHR
VFRTTHTRRHTE
//

I would like to make this structure unique (does not matter which ID I keep). Can you please advise what kind of structure I need to use? Is it HoA or something else?

Re: Make unique hash of arrays?
by LanX (Saint) on Oct 23, 2022 at 21:58 UTC
    use v5.12;
    use warnings;

    my %seen;

    local $/="//\n";

    while ( my $chunk = <DATA> ) {
        my @lines = split /\n/, $chunk;
        print $chunk unless $seen{ $lines[1] }{ $lines[2] }++
    }

    __DATA__
    nick
    AAAAAAAAAA
    BBBBBBBBBB
    //
    george
    EGRGERHTETEHTHR
    VFRTTHTRRHTE
    //
    andreas
    AAAAAAAAAA
    BBBBBBBBBB
    //
    thomas
    EWTRYTUYJTHT
    CEWWEQRWT$G
    //
    peter
    EGRGERHTETEHTHR
    VFRTTHTRRHTE
    //

    -->

    nick
    AAAAAAAAAA
    BBBBBBBBBB
    //
    george
    EGRGERHTETEHTHR
    VFRTTHTRRHTE
    //
    thomas
    EWTRYTUYJTHT
    CEWWEQRWT$G
    //

    Cheers Rolf
    (addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
    Wikisyntax for the Monastery

      Thank you!! What kind of structure is this if I may ask?
        > What kind of structure is this if I may ask?

        %seen is a Hash of Hashes, see perldsc#HASHES-OF-HASHES

        Cheers Rolf

        update

        dump of \%seen (which could also be named %count :)

        {
          AAAAAAAAAA      => { BBBBBBBBBB => 2 },
          EGRGERHTETEHTHR => { VFRTTHTRRHTE => 2 },
          EWTRYTUYJTHT    => { "CEWWEQRWT\$G" => 1 },
        }
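The same counting idea can be wrapped in a subroutine. This is only a minimal sketch, assuming each record is a three-line chunk (ID, string1, string2) terminated by "//\n"; the name first_seen is made up for illustration and does not appear in the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first record for each (string1, string2) pair.
sub first_seen {
    my @records = @_;
    my %seen;    # hash of hashes: $seen{string1}{string2} = count so far
    my @kept;
    for my $record (@records) {
        my ( $id, $s1, $s2 ) = split /\n/, $record;

        # Post-increment returns the old count, which is 0 (false) the
        # first time, so only the first record with this pair is kept.
        push @kept, $record unless $seen{$s1}{$s2}++;
    }
    return @kept;
}

my @records = (
    "nick\nAAAAAAAAAA\nBBBBBBBBBB\n//\n",
    "george\nEGRGERHTETEHTHR\nVFRTTHTRRHTE\n//\n",
    "andreas\nAAAAAAAAAA\nBBBBBBBBBB\n//\n",
);
print first_seen(@records);
```

With the three sample records, the andreas chunk is dropped because its pair was already counted for nick. The inner hashes spring into existence on first use (autovivification), so no explicit initialisation is needed.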
Re: Make unique hash of arrays?
by kcott (Archbishop) on Oct 24, 2022 at 00:14 UTC
    "Can you please advise what kind of structure I need to use?"

    You can use a simple hash (%uniq in the code below): take advantage of the fact that hash keys are unique, and form those keys by joining string1 and string2 with a character that won't appear in either string (I used a tab; change that to something else if your strings can contain tabs). Because a later assignment to the same key overwrites the earlier one, the ID kept for each pair is the last one read.

    #!/usr/bin/env perl

    use strict;
    use warnings;

    use constant {
        REC_SEP => "\n//\n",
        KEY_SEP => "\t",
        OUT_FMT => "%s\n%s\n%s\n//\n",
    };

    my %uniq;

    {
        local $/ = REC_SEP;

        while (<DATA>) {
            chomp;
            my ($id, $s1, $s2) = split;
            $uniq{join KEY_SEP, $s1, $s2} = $id;
        }
    }

    printf OUT_FMT, $uniq{$_}, split(KEY_SEP) for keys %uniq;

    __DATA__
    nick
    AAAAAAAAAA
    BBBBBBBBBB
    //
    george
    EGRGERHTETEHTHR
    VFRTTHTRRHTE
    //
    andreas
    AAAAAAAAAA
    BBBBBBBBBB
    //
    thomas
    EWTRYTUYJTHT
    CEWWEQRWT$G
    //
    peter
    EGRGERHTETEHTHR
    VFRTTHTRRHTE
    //

    Output from sample run:

    andreas
    AAAAAAAAAA
    BBBBBBBBBB
    //
    peter
    EGRGERHTETEHTHR
    VFRTTHTRRHTE
    //
    thomas
    EWTRYTUYJTHT
    CEWWEQRWT$G
    //

    Note that hashes are unordered. Although the output shown may appear to be ordered, on a couple of other runs that I tested I got thomas/andreas/peter & thomas/peter/andreas (and with sufficient test runs, I would have got all possible orders). If the output order is important to you, just add some sorting; e.g.

    ... for sort { $uniq{$a} cmp $uniq{$b} } keys %uniq;
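If the original input order is what matters, one alternative to sorting is to remember the keys in the order they first appear. The following is a minimal sketch under that assumption. The subroutine name uniq_in_order and the @order array are illustrative and not part of the script above, and unlike that script it keeps the first ID seen for each pair rather than the last.

```perl
use strict;
use warnings;

use constant KEY_SEP => "\t";    # assumed not to occur in string1 or string2

# Hypothetical helper: takes [id, string1, string2] triplets and returns
# one triplet per distinct (string1, string2) pair, in first-seen order.
sub uniq_in_order {
    my @triplets = @_;
    my ( %uniq, @order );
    for my $t (@triplets) {
        my ( $id, $s1, $s2 ) = @$t;
        my $key = join KEY_SEP, $s1, $s2;
        next if exists $uniq{$key};    # pair already recorded: skip it
        $uniq{$key} = $id;
        push @order, $key;             # remember insertion order
    }

    # The -1 limit keeps a trailing empty field if string2 is empty.
    return map { [ $uniq{$_}, split /\t/, $_, -1 ] } @order;
}

printf "%s\n%s\n%s\n//\n", @$_
    for uniq_in_order(
        [ 'nick',    'AAAAAAAAAA',   'BBBBBBBBBB' ],
        [ 'andreas', 'AAAAAAAAAA',   'BBBBBBBBBB' ],
        [ 'thomas',  'EWTRYTUYJTHT', 'CEWWEQRWT$G' ],
    );
```

The extra array costs one element per distinct pair, but the output order no longer depends on hash ordering.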

    — Ken

Re: Make unique hash of arrays?
by tybalt89 (Monsignor) on Oct 23, 2022 at 22:32 UTC

    Since I've been looking through List::AllUtils lately...

    #!/usr/bin/perl

    use strict; # https://perlmonks.org/?node_id=11147618
    use warnings;

    use List::AllUtils qw( uniq_by );

    my @want = uniq_by { s/.*\n//r } do { local $/ = "//\n"; <DATA> };

    print @want;

    __DATA__
    nick
    AAAAAAAAAA
    BBBBBBBBBB
    //
    george
    EGRGERHTETEHTHR
    VFRTTHTRRHTE
    //
    andreas
    AAAAAAAAAA
    BBBBBBBBBB
    //
    thomas
    EWTRYTUYJTHT
    CEWWEQRWT$G
    //
    peter
    EGRGERHTETEHTHR
    VFRTTHTRRHTE
    //

    Outputs:

    nick
    AAAAAAAAAA
    BBBBBBBBBB
    //
    george
    EGRGERHTETEHTHR
    VFRTTHTRRHTE
    //
    thomas
    EWTRYTUYJTHT
    CEWWEQRWT$G
    //
Re: Make unique hash of arrays?
by tybalt89 (Monsignor) on Oct 23, 2022 at 22:35 UTC

    The only data structure needed is a multi-line string :)

    #!/usr/bin/perl

    use strict; # https://perlmonks.org/?node_id=11147618
    use warnings;

    local $_ = join '', <DATA>;
    1 while s[^(?:.*\n)((?:.*\n){2}//\n)((?:.*\n)*)\K.*\n\1][]m;
    print;

    __DATA__
    nick
    AAAAAAAAAA
    BBBBBBBBBB
    //
    george
    EGRGERHTETEHTHR
    VFRTTHTRRHTE
    //
    andreas
    AAAAAAAAAA
    BBBBBBBBBB
    //
    thomas
    EWTRYTUYJTHT
    CEWWEQRWT$G
    //
    peter
    EGRGERHTETEHTHR
    VFRTTHTRRHTE
    //

    Outputs:

    nick
    AAAAAAAAAA
    BBBBBBBBBB
    //
    george
    EGRGERHTETEHTHR
    VFRTTHTRRHTE
    //
    thomas
    EWTRYTUYJTHT
    CEWWEQRWT$G
    //
Re: Make unique hash of arrays?
by tybalt89 (Monsignor) on Oct 24, 2022 at 01:36 UTC
    #!/usr/bin/perl

    use strict; # https://perlmonks.org/?node_id=11147618
    use warnings;

    my %uniq = map { s/.*\n//r, $_ } do { local $/ = "//\n"; <DATA> };

    print values %uniq;

    __DATA__
    nick
    AAAAAAAAAA
    BBBBBBBBBB
    //
    george
    EGRGERHTETEHTHR
    VFRTTHTRRHTE
    //
    andreas
    AAAAAAAAAA
    BBBBBBBBBB
    //
    thomas
    EWTRYTUYJTHT
    CEWWEQRWT$G
    //
    peter
    EGRGERHTETEHTHR
    VFRTTHTRRHTE
    //

      Notes:

      • This needs use v5.14.0 (or equivalent) for s///r.
      • The local $/ = "//\n" changes the diamond operator's notion of what a line is, causing it to read the file in chunks delimited by "//\n".
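Both notes can be illustrated in a few lines. This is a sketch rather than part of the reply above: it reads from an in-memory filehandle instead of DATA, and the helper name key_of is made up for illustration.

```perl
use strict;
use warnings;
use v5.14;    # for s///r

# Hypothetical helper: the dedup key is the record minus its first line (the ID).
# s///r returns the modified copy and leaves its argument unchanged.
sub key_of {
    my ($record) = @_;
    return $record =~ s/.*\n//r;
}

my $input = "nick\nAAAAAAAAAA\nBBBBBBBBBB\n//\nandreas\nAAAAAAAAAA\nBBBBBBBBBB\n//\n";
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";

my @records = do {
    local $/ = "//\n";    # one "line" now ends at "//\n", not at "\n"
    <$fh>;
};
close $fh;

printf "%d records read\n", scalar @records;
print "same key\n" if key_of( $records[0] ) eq key_of( $records[1] );

# On perls older than 5.14, copy first and substitute on the copy:
( my $key = $records[0] ) =~ s/.*\n//;
```

Because local restores $/ at the end of the do block, any later reads in the same program go back to ordinary line-by-line behaviour.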