Re^3: Writing hashes as records to a CSV file (joining keys with slices)

by Tux (Canon)
on Dec 09, 2021 at 14:06 UTC


in reply to Re^2: Writing hashes as records to a CSV file (joining keys with slices)
in thread Writing hashes as records to a CSV file

I compared it to the two methods I used in my examples, just out of curiosity and to learn. I never used the list_join as that intuitively feels stupid/slow, but the slice_join was also not in my default toolbox, as map feels so much more logical to me. Anyway, here we go ...

My expanded test case
use strict;
use warnings;
use feature "say";
use Test::More;                   # is_deeply, done_testing
use Benchmark qw( cmpthese );

my $AoH;
for my $n_rec (1, 10, 100, 1000) {
    say "";
    say "=== num of records is: ", $n_rec;
    $AoH = create_data (1, $n_rec);
    is_deeply ([ sort $_->() ], [ sort &slice_join ])
        for \&list_join, \&grep_map, \&map_map;
    cmpthese (-1, {
        map_map    => \&map_map,
        grep_map   => \&grep_map,
        list_join  => \&list_join,
        slice_join => \&slice_join,
        });
    }
done_testing;

sub list_join {
    my %H;
    %H = (%H, %$_) for @$AoH;
    return keys %H;
    } # list_join

sub slice_join {
    my %H;
    @H{keys %$_} = () for @$AoH;
    return keys %H;
    } # slice_join

sub grep_map {
    my %seen;
    return grep { !$seen{$_}++ } map { keys %$_ } @$AoH;
    } # grep_map

sub map_map {
    my %H = map { $_ => 1 } map { keys %$_ } @$AoH;
    return keys %H;
    } # map_map

sub create_data {
    my ($density, $records) = @_;
    my @AoH;
    push @AoH, { map { rand 100 <= $density ? ("$_" => $_) : () } "A" .. "ZZ" }
        for 1 .. $records;
    return \@AoH;
    } # create_data
My results
=== num of records is: 1
ok 1
ok 2
ok 3
                Rate    map_map  list_join   grep_map slice_join
map_map     508970/s         --       -18%       -19%       -45%
list_join   619376/s        22%         --        -1%       -33%
grep_map    625570/s        23%         1%         --       -33%
slice_join  928647/s        82%        50%        48%         --

=== num of records is: 10
ok 4
ok 5
ok 6
                Rate  list_join    map_map   grep_map slice_join
list_join    12444/s         --       -80%       -83%       -88%
map_map      61440/s       394%         --       -17%       -42%
grep_map     73770/s       493%        20%         --       -30%
slice_join  105217/s       746%        71%        43%         --

=== num of records is: 100
ok 7
ok 8
ok 9
               Rate  list_join    map_map   grep_map slice_join
list_join     174/s         --       -97%       -98%       -98%
map_map      5966/s      3332%         --       -25%       -43%
grep_map     7952/s      4474%        33%         --       -24%
slice_join  10479/s      5928%        76%        32%         --

=== num of records is: 1000
ok 10
ok 11
ok 12
              Rate  list_join    map_map   grep_map slice_join
list_join   6.67/s         --       -99%       -99%       -99%
map_map      545/s      8070%         --       -47%       -56%
grep_map    1027/s     15299%        88%         --       -17%
slice_join  1244/s     18553%       128%        21%         --

I think the grep_map is a neat contender and reads easier than the slice_join. YMMV.
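Since the thread is about writing those hashes to CSV: here is a minimal sketch of how the collected key set could drive the output, assuming Text::CSV_XS, the $AoH and slice_join from the test case above, and an illustrative file name; sorted keys are just one possible column order.

use Text::CSV_XS;

my @keys = slice_join ();        # union of all keys over @$AoH
my @cols = sort @keys;           # pick a stable column order

my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1, eol => "\n" });
open my $fh, ">", "records.csv" or die "records.csv: $!";
$csv->say ($fh, \@cols);                       # header row
$csv->say ($fh, [ @{$_}{@cols} ]) for @$AoH;   # missing keys come out as empty fields
close $fh or die "records.csv: $!";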


Enjoy, Have FUN! H.Merijn

Re^4: Writing hashes as records to a CSV file (joining keys with slices)
by choroba (Cardinal) on Dec 09, 2021 at 21:41 UTC
    And in the spirit of TIMTOWTDI:
    sub map_direct {
        my %H = (map %$_, @$AoH);
        return keys %H;
    }

    sub map_keys {
        my %H;
        @H{map keys %$_, @$AoH} = ();
        return keys %H;
    }
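    A sketch of plugging these into the benchmark from the parent node (assuming that harness is in scope). Note that map_direct flattens full key/value pairs into its intermediate list, so it also copies the values, while map_keys, like slice_join, only ever touches the keys:

    cmpthese (-1, {
        map_map    => \&map_map,
        grep_map   => \&grep_map,
        list_join  => \&list_join,
        slice_join => \&slice_join,
        map_direct => \&map_direct,
        map_keys   => \&map_keys,
        });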

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
Re^4: Writing hashes as records to a CSV file (joining keys with slices)
by LanX (Saint) on Dec 09, 2021 at 14:45 UTC
    > I never used the list_join as that intuitively feels stupid/slow,

    Well, it's the canonical way to join two hashes.
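
    A minimal illustration of that idiom (example values are mine, not from the thread): a list assignment joins the hashes, and on duplicate keys the right-hand hash wins.

    my %a = (x => 1, y => 2);
    my %b = (y => 20, z => 3);
    my %joined = (%a, %b);   # x => 1, y => 20, z => 3 -- %b's value wins for the shared key y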

    > I think the grep_map is a neat contender and reads easier than the slice_join. YMMV.

    I'm very skeptical about solutions that build long intermediate lists, like the map-based ones. They might cause memory problems.

    See what happens with a density of 75%, and then imagine handling much more data, where the machine starts swapping.
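
    A minimal way to try that with the harness from the parent node (the record count here is only illustrative):

    # Assumption: create_data and the *_join / *_map subs from the parent node are in scope.
    $AoH = create_data (75, 10_000);   # ~75% of the 702 keys "A" .. "ZZ" per record

    # grep_map and map_map first build one flat list of roughly 10_000 * 526 keys
    # before deduplicating; slice_join only ever keeps a single %H of unique keys.
    my @all = grep_map ();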

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

    update

    fixed bugs in the tests with sort; see also sort AoH buggy?
