PerlMonks  

Re: Writing hashes as records to a CSV file

by LanX (Saint)
on Dec 08, 2021 at 23:27 UTC


in reply to Writing hashes as records to a CSV file

> but can one do it directly (using Text::CSV) ie. without figuring out the superset of keys myself

I searched through the Text::CSV documentation but couldn't find a way to do that. But I'm confident Tux, as one of the maintainers, will know.

> I could do this "manually" as it were,

Merging hashes is not complicated in Perl:

after

%H = (%H,%$_) for @AoH

you'll have the superset in

keys %H

More importantly, you keep control over the order of the columns created (for example, ordering by count).

Do you really want to leave that to Text::CSV, where the order will most probably be random?

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery

update x2
use strict;
use warnings;
use Data::Dump qw/pp dd/;

my @AoH = (
    {AB => 100, NN => 200, XYZ => 400},
    {AB => 100, XYZ => 400, MM => 300},
    { map { ("A$_" => $_) } "A".."C" }
);

my %H;
%H = (%H,%$_) for @AoH;
dd keys %H;

my %count;
map { map { $count{$_}++ } keys %$_ } @AoH;
dd sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
OUTPUT:
("XYZ", "NN", "MM", "AC", "AA", "AB")
("AB", "XYZ", "AA", "AC", "MM", "NN")

NB: "AB" is last in the first result, even though it appears in every record.

The second result however will order the columns first by count and then by alphabetic order.
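
Once the column order is fixed, each record can be flattened into a row with a hash slice. What follows is only a minimal sketch in core Perl: the names @columns and @rows are illustrative, filling missing keys with an empty string is an assumption, and the naive join merely stands in for a real CSV writer (each row array ref is the kind of thing one would hand to Text::CSV's print).

```perl
use strict;
use warnings;

my @AoH = (
    { AB => 100, NN  => 200, XYZ => 400 },
    { AB => 100, XYZ => 400, MM  => 300 },
    { map { ("A$_" => $_) } "A" .. "C" },
);

# Count how many records contain each key.
my %count;
for my $rec (@AoH) {
    $count{$_}++ for keys %$rec;
}

# Columns: most frequent first, ties broken alphabetically.
my @columns = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;

# One row per record via a hash slice.
# Missing keys become empty strings (an assumption, not a Text::CSV rule).
my @rows = map {
    my $rec = $_;
    [ map { $_ // '' } @{$rec}{@columns} ];
} @AoH;

# Naive join for display only; real data needs proper quoting from a CSV module.
print join(',', @columns), "\n";
print join(',', @$_), "\n" for @rows;
```

The slice reads the record in rvalue context, so keys a record lacks are not created in it; they simply come back as undef and are replaced by the filler.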

YMMV...

Replies are listed 'Best First'.
Re^2: Writing hashes as records to a CSV file (joining keys with slices)
by LanX (Saint) on Dec 09, 2021 at 13:45 UTC
    > after %H = (%H,%$_) for @AoH you'll have the superset in keys %H

    Actually, assigning an empty list to a hash slice of the keys is way faster:

    @H{keys %$_} = () for @$AoH;

    DEMO:

    use v5.12;
    use warnings;
    use Test::More;
    use Benchmark qw/cmpthese/;

    my $AoH;

    for my $n_rec (1, 10, 100, 1000) {
        say;
        say "=== num of records is: ", $n_rec;
        $AoH = create_data(1, $n_rec);

        is_deeply(
            [sort &list_join],
            [sort &slice_join],
        );

        cmpthese(-1, {
            'list_join'  => \&list_join,
            'slice_join' => \&slice_join,
        });
    }

    done_testing;

    sub list_join {
        my %H;
        %H = (%H,%$_) for @$AoH;
        return keys %H;
    }

    sub slice_join {
        my %H;
        @H{keys %$_} = () for @$AoH;
        return keys %H;
    }

    sub create_data {
        my ( $density, $records ) = @_;
        my @AoH;
        push @AoH, { map { rand 100 <= $density ? ("$_" => $_) : () } "A".."ZZ" }
            for 1..$records;
        return \@AoH;
    }
    OUTPUT:
    __DATA__
    === num of records is: 1
    ok 1
                   Rate  list_join slice_join
    list_join  238532/s         --       -65%
    slice_join 682713/s       186%         --
    __DATA__
    === num of records is: 10
    ok 2
                   Rate  list_join slice_join
    list_join    7819/s         --       -93%
    slice_join 112993/s      1345%         --
    __DATA__
    === num of records is: 100
    ok 3
                   Rate  list_join slice_join
    list_join    82.9/s         --       -99%
    slice_join   8533/s     10195%         --
    __DATA__
    === num of records is: 1000
    ok 4
                   Rate  list_join slice_join
    list_join    3.66/s         --      -100%
    slice_join   1067/s     29072%         --
    1..4

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

    Update

    fixed bug in sorted tests

      I compared it to the two methods I used in my examples, just out of curiosity and to learn. I never used the list_join, as that intuitively feels stupid/slow, but the slice_join was also not in my default toolbox, as map feels so much more logical to me. Anyway, here we go ...

      I think the grep_map is a neat contender and reads easier than the slice_join. YMMV.


      Enjoy, Have FUN! H.Merijn
        And for the spirit of TIMTOWTDI:
        sub map_direct {
            my %H = (map %$_, @$AoH);
            return keys %H
        }

        sub map_keys {
            my %H;
            @H{map keys %$_, @$AoH} = ();
            return keys %H
        }

        map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
        > I never used the list_join as that intuitively feels stupid/slow,

        Well, it's the canonical way to join two hashes.

        > I think the grep_map is a neat contender and reads easier than the slice_join. YMMV.

        I'm very skeptical about solutions that build long intermediate lists, as the map does. They might cause memory problems.

        See what happens with a density of 75%, and now imagine handling much more data, where the machine starts swapping.
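
To make the memory concern concrete, here is a rough sketch that compares how many elements a one-shot map builds against how many keys are in flight when the records are handled one at a time. It assumes fully dense records (every key from "A" to "ZZ" present) so the numbers are deterministic; the record count of 100 and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# 100 records, each holding all 702 keys "A" .. "ZZ" (density 100%, an assumption).
my @keys = ("A" .. "ZZ");
my $AoH  = [ map { +{ map { ($_ => 1) } @keys } } 1 .. 100 ];

# One-shot variant: the whole flattened key list exists at once.
my @intermediate = map { keys %$_ } @$AoH;

# Per-record variant: only one record's keys are in flight at a time.
my %H;
my $largest_batch = 0;
for my $rec (@$AoH) {
    my @batch = keys %$rec;
    $largest_batch = @batch if @batch > $largest_batch;
    @H{@batch} = ();
}

printf "intermediate list:        %d elements\n", scalar @intermediate;
printf "largest per-record batch: %d elements\n", $largest_batch;
printf "unique keys:              %d\n", scalar keys %H;
```

The intermediate list grows with records times keys per record, while the per-record slice never holds more than one record's keys, which is why the map variants are the ones to watch as the data grows.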

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery

        update

        fixed bugs in tests with sort, see also sort AoH buggy?
