> I never used the list_join as that intuitively feels stupid/slow,

Well, it's the canonical way to join two hashes.

> I think the grep_map is a neat contender and reads easier than the slice_join. YMMV.

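I'm very skeptical about solutions that build long intermediate lists, as the map variants do: the inner map flattens every key of every record into one list before any duplicates are dropped. That might cause memory problems.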
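
To make the concern concrete, here is a minimal sketch with made-up data (the three records in $AoH are hypothetical, not taken from the benchmark). It only counts elements: the flattened list that a map over keys produces versus the distinct keys that a hash slice accumulates.

```perl
use strict;
use warnings;

# Hypothetical sample data: three records that share most of their keys.
my $AoH = [
    { a => 1, b => 2, c => 3 },
    { b => 2, c => 3, d => 4 },
    { a => 1, d => 4 },
];

# What the map-based variants build first: every key of every record,
# duplicates included, in one flat list.
my @intermediate = map { keys %$_ } @$AoH;

# What the slice variant holds on to: one hash with the distinct keys,
# filled record by record.
my %H;
@H{ keys %$_ } = () for @$AoH;

printf "intermediate list: %d elements\n", scalar @intermediate;
printf "distinct keys:     %d\n",          scalar keys %H;
```

The flat list grows with the total number of keys over all records, while the hash only grows with the number of distinct keys, so the gap widens as the duplication rate goes up.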

See what happens with a density of 75%, and now imagine handling much more data, where the machine starts swapping.

use v5.12;
use warnings;
use Test::More;
use Benchmark qw/cmpthese/;

my $AoH;
my $density = 75;

for my $n_rec (1, 10, 100, 1000) {
    say "";
    say "=== num of records is: ", $n_rec, " duplication: $density%";
    $AoH = create_data ($density, $n_rec);

    is_deeply (
        [ sort &list_join ],
        [ sort &slice_join ],
    );

    is_deeply (
        [ sort &grep_map ],
        [ sort &slice_join ],
    );

    is_deeply (
        [ sort &map_map ],
        [ sort &slice_join ],
    );

    cmpthese (
        -1,
        {
            map_map    => \&map_map,
            grep_map   => \&grep_map,
            list_join  => \&list_join,
            slice_join => \&slice_join,
        });
}

done_testing;

sub list_join {
    my %H;
    %H = (%H, %$_) for @$AoH;
    return keys %H;
}   # list_join

sub slice_join {
    my %H;
    @H{keys %$_} = () for @$AoH;
    return keys %H;
}   # slice_join

sub grep_map {
    my %seen;
    return grep { !$seen{$_}++ } map { keys %$_ } @$AoH;
}   # grep_map

sub map_map {
    my %H = map { $_ => 1 } map { keys %$_ } @$AoH;
    return keys %H;
}   # map_map

sub create_data {
    my ($density, $records) = @_;
    my @AoH;
    push @AoH, {map { rand 100 <= $density ? ( $_ => $_ ) : () } "A" .. "ZZ"}
        for 1 .. $records;
    return \@AoH;
}

OUTPUT:

=== num of records is: 1 duplication: 75%
ok 1
ok 2
ok 3
               Rate  list_join    map_map   grep_map slice_join
list_join    4575/s         --       -26%       -40%       -59%
map_map      6224/s        36%         --       -18%       -44%
grep_map     7625/s        67%        23%         --       -31%
slice_join  11079/s       142%        78%        45%         --

=== num of records is: 10 duplication: 75%
ok 4
ok 5
ok 6
               Rate  list_join    map_map   grep_map slice_join
list_join     202/s         --       -71%       -77%       -87%
map_map       696/s       244%         --       -21%       -57%
grep_map      884/s       337%        27%         --       -45%
slice_join   1612/s       697%       131%        82%         --

=== num of records is: 100 duplication: 75%
ok 7
ok 8
ok 9
               Rate  list_join    map_map   grep_map slice_join
list_join    17.5/s         --       -72%       -84%       -89%
map_map      63.1/s       260%         --       -43%       -61%
grep_map      110/s       526%        74%         --       -32%
slice_join    161/s       821%       156%        47%         --

=== num of records is: 1000 duplication: 75%
ok 10
ok 11
ok 12
            (warning: too few iterations for a reliable count)
               Rate  list_join    map_map   grep_map slice_join
list_join    1.73/s         --       -73%       -85%       -89%
map_map      6.40/s       270%         --       -46%       -60%
grep_map     11.8/s       583%        85%         --       -26%
slice_join   16.0/s       825%       150%        35%         --
1..12

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery

update

fixed bugs in tests with sort, see also sort AoH buggy?


In reply to Re^4: Writing hashes as records to a CSV file (joining keys with slices) by LanX
in thread Writing hashes as records to a CSV file by Serene Hacker
