Make unique hash of arrays?

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Re: Make unique hash of arrays? by LanX (Saint) on Oct 23, 2022 at 21:58 UTC
```perl
use v5.12;
use warnings;

my %seen;

local $/ = "//\n";

while ( my $chunk = <DATA> ) {
    my @lines = split /\n/, $chunk;
    print $chunk unless $seen{ $lines[1] }{ $lines[2] }++
}

__DATA__
nick
AAAAAAAAAA
BBBBBBBBBB
//
george
EGRGERHTETEHTHR
VFRTTHTRRHTE
//
andreas
AAAAAAAAAA
BBBBBBBBBB
//
thomas
EWTRYTUYJTHT
CEWWEQRWT$G
//
peter
EGRGERHTETEHTHR
VFRTTHTRRHTE
//
```

-->

```
nick
AAAAAAAAAA
BBBBBBBBBB
//
george
EGRGERHTETEHTHR
VFRTTHTRRHTE
//
thomas
EWTRYTUYJTHT
CEWWEQRWT$G
//
```

Cheers Rolf
(addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
Re^2: Make unique hash of arrays? by Anonymous Monk on Oct 23, 2022 at 22:02 UTC
Thank you!! What kind of structure is this if I may ask?
Re^3: Make unique hash of arrays? by LanX (Saint) on Oct 23, 2022 at 22:20 UTC
> What kind of structure is this if I may ask?

`%seen` is a Hash of Hashes, see `perldsc#HASHES-OF-HASHES`.

Cheers Rolf
(addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)

Update: dump of `\%seen` (which could also be named `%count` :)

```
{
  AAAAAAAAAA => { BBBBBBBBBB => 2 },
  EGRGERHTETEHTHR => { VFRTTHTRRHTE => 2 },
  EWTRYTUYJTHT => { "CEWWEQRWT\$G" => 1 },
}
```
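To make the Hash of Hashes idiom a bit more concrete, here is a minimal sketch. The sub name `is_first_sighting` and the sample pairs are made up for illustration and are not from the thread. It leans on two standard Perl behaviours: the inner hash is autovivified on first access, and post-increment returns the old value, which is false the first time a pair turns up.

```perl
use strict;
use warnings;

# Hypothetical helper: true only the first time a ($s1, $s2) pair is seen.
# $seen is a reference to a hash of hashes: $seen->{$s1}{$s2} holds a count.
sub is_first_sighting {
    my ($seen, $s1, $s2) = @_;
    # Post-increment returns the old value (undef, i.e. false, the first
    # time) and autovivifies the inner hash $seen->{$s1} as needed.
    return !$seen->{$s1}{$s2}++;
}

my %seen;
my @pairs = ( [qw(AAA BBB)], [qw(CCC DDD)], [qw(AAA BBB)], [qw(AAA DDD)] );

for my $pair (@pairs) {
    my ($s1, $s2) = @$pair;
    my $verdict = is_first_sighting(\%seen, $s1, $s2) ? 'new' : 'duplicate';
    print "$s1/$s2: $verdict\n";
}
```

Only the third pair should be reported as a duplicate: `AAA/DDD` shares its first key with `AAA/BBB` but gets its own slot in the inner hash.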
Re^4: Make unique hash of arrays? by Anonymous Monk on Oct 23, 2022 at 22:22 UTC
Re: Make unique hash of arrays? by kcott (Archbishop) on Oct 24, 2022 at 00:14 UTC
"Can you please advise what kind of structure I need to use?" You can use simple hash (`%uniq` in code below); take advantage of the fact that hash keys are unique; and form those keys by joining `string1` & `string2` with a character that won't appear in either string (I used a tab; change that to something else if your strings can contain tabs). `#!/usr/bin/env perl use strict; use warnings; use constant { REC_SEP => "\n//\n", KEY_SEP => "\t", OUT_FMT => "%s\n%s\n%s\n//\n", }; my %uniq; { local $/ = REC_SEP; while (<DATA>) { chomp; my ($id, $s1, $s2) = split; $uniq{join KEY_SEP, $s1, $s2} = $id; } } printf OUT_FMT, $uniq{$_}, split(KEY_SEP) for keys %uniq; __DATA__ nick AAAAAAAAAA BBBBBBBBBB // george EGRGERHTETEHTHR VFRTTHTRRHTE // andreas AAAAAAAAAA BBBBBBBBBB // thomas EWTRYTUYJTHT CEWWEQRWT$G // peter EGRGERHTETEHTHR VFRTTHTRRHTE //` [download] Output from sample run: `andreas AAAAAAAAAA BBBBBBBBBB // peter EGRGERHTETEHTHR VFRTTHTRRHTE // thomas EWTRYTUYJTHT CEWWEQRWT$G //` [download] Note that hashes are unordered. Although the output shown may appear to be ordered, on a couple of other runs that I tested I got thomas/andreas/peter & thomas/peter/andreas (and with sufficient test runs, I would have got all possible orders). If the output order is important to you, just add some sorting; e.g. `... for sort { $uniq{$a} cmp $uniq{$b} } keys %uniq;` [download] — Ken	[reply] [d/l] [select]
Re: Make unique hash of arrays? by tybalt89 (Monsignor) on Oct 23, 2022 at 22:32 UTC
Since I've been looking through List::AllUtils lately...

```perl
#!/usr/bin/perl

use strict; # https://perlmonks.org/?node_id=11147618
use warnings;
use List::AllUtils qw( uniq_by );

my @want = uniq_by { s/.*\n//r } do { local $/ = "//\n"; <DATA> };
print @want;

__DATA__
nick
AAAAAAAAAA
BBBBBBBBBB
//
george
EGRGERHTETEHTHR
VFRTTHTRRHTE
//
andreas
AAAAAAAAAA
BBBBBBBBBB
//
thomas
EWTRYTUYJTHT
CEWWEQRWT$G
//
peter
EGRGERHTETEHTHR
VFRTTHTRRHTE
//
```

Outputs:

```
nick
AAAAAAAAAA
BBBBBBBBBB
//
george
EGRGERHTETEHTHR
VFRTTHTRRHTE
//
thomas
EWTRYTUYJTHT
CEWWEQRWT$G
//
```
Re: Make unique hash of arrays? by tybalt89 (Monsignor) on Oct 23, 2022 at 22:35 UTC
The only data structure needed is a multi-line string :)

```perl
#!/usr/bin/perl

use strict; # https://perlmonks.org/?node_id=11147618
use warnings;

local $_ = join '', <DATA>;
1 while s[^(?:.*\n)((?:.*\n){2}//\n)((?:.*\n)*)\K.*\n\1][]m;
print;

__DATA__
nick
AAAAAAAAAA
BBBBBBBBBB
//
george
EGRGERHTETEHTHR
VFRTTHTRRHTE
//
andreas
AAAAAAAAAA
BBBBBBBBBB
//
thomas
EWTRYTUYJTHT
CEWWEQRWT$G
//
peter
EGRGERHTETEHTHR
VFRTTHTRRHTE
//
```

Outputs:

```
nick
AAAAAAAAAA
BBBBBBBBBB
//
george
EGRGERHTETEHTHR
VFRTTHTRRHTE
//
thomas
EWTRYTUYJTHT
CEWWEQRWT$G
//
```
Re: Make unique hash of arrays? by tybalt89 (Monsignor) on Oct 24, 2022 at 01:36 UTC
```perl
#!/usr/bin/perl

use strict; # https://perlmonks.org/?node_id=11147618
use warnings;

my %uniq = map { s/.*\n//r, $_ } do { local $/ = "//\n"; <DATA> };
print values %uniq;

__DATA__
nick
AAAAAAAAAA
BBBBBBBBBB
//
george
EGRGERHTETEHTHR
VFRTTHTRRHTE
//
andreas
AAAAAAAAAA
BBBBBBBBBB
//
thomas
EWTRYTUYJTHT
CEWWEQRWT$G
//
peter
EGRGERHTETEHTHR
VFRTTHTRRHTE
//
```
Re^2: Make unique hash of arrays? by Anonymous Monk on Oct 24, 2022 at 16:55 UTC
Notes: This needs Perl 5.14 or later for `s///r` (a `use v5.14;` line, or equivalent, enforces that). The `local $/ = "//\n"` changes the diamond operator's notion of what a line is, causing it to read the file in chunks that each end with `"//\n"`.
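Both points can be tried in isolation. The following is only a sketch: the sub name `read_chunks` and the shortened sample text are invented here, and it reads from an in-memory filehandle (opening a reference to a scalar, a core Perl feature) instead of `DATA`.

```perl
use v5.14;    # states the minimum version needed for s///r
use strict;
use warnings;

# Hypothetical helper: split $text into chunks that each end with $sep,
# letting readline do the work via the input record separator $/.
sub read_chunks {
    my ($text, $sep) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    local $/ = $sep;          # restored automatically when the sub returns
    my @chunks = <$fh>;       # list context: one element per record
    close $fh;
    return @chunks;
}

my @chunks = read_chunks("nick\nAAA\nBBB\n//\ngeorge\nCCC\nDDD\n//\n", "//\n");

for my $chunk (@chunks) {
    # /r returns the changed copy; $chunk itself is left untouched.
    my $key = $chunk =~ s/.*\n//r;
    print "key:\n$key";
    print "chunk:\n$chunk";
}
```

Because `.` does not match a newline by default, `s/.*\n//` removes only the first line (the name), so the returned copy is the pair of strings plus the `//` line, which is what the `map` and `uniq_by` versions use as the uniqueness key.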
