Serene Hacker has asked for the wisdom of the Perl Monks concerning the following question:
I have a bunch of records containing key,value pairs.
I'd like to create a CSV file containing one row for each record.
However, each record doesn't necessarily contain exactly the same keys as the other records. So I'd like the column-headers to be a superset of all the keys. E.g.
{AB1 => 100, NN => 200, XYZ => 400}
{AB1 => 100, XYZ => 400, MM => 300}
-->
"AB1","NN","XYZ","MM"
"100","200","400",""
"100","","400","300"
i.e.
AB1 NN XYZ MM
100 200 400 undef
100 undef 400 300
I could do this "manually" as it were, but can one do it directly (using Text::CSV), i.e. without figuring out the superset of keys myself? In Text::CSV I see one can *read* tables with headers as hashes. But can one write them just as easily?
Re: Writing hashes as records to a CSV file
by choroba (Cardinal) on Dec 08, 2021 at 23:24 UTC
You can collect the column names by iterating over the records and accumulating their keys.
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
use List::Util qw{ uniq };
use Text::CSV_XS;
my @records = ({AB1 => 100, NN => 200, XYZ => 400},
{AB1 => 100, XYZ => 400, MM => 300});
my @columns = sort +uniq(map keys %$_, @records);
my $csv = 'Text::CSV_XS'->new({auto_diag => 1});
for my $record ({map { $_ => $_ } @columns}, @records) {
$csv->say(\*STDOUT, [@$record{@columns}]);
}
Re: Writing hashes as records to a CSV file
by Tux (Canon) on Dec 09, 2021 at 08:55 UTC
use Text::CSV_XS qw( csv );
my @records = (
{ AB1 => 100, NN => 200, XYZ => 400},
{ AB1 => 100, XYZ => 400, MM => 300},
);
my %fld = map { $_ => 1 } map { keys %$_ } @records;
csv (in => \@records, headers => [ sort keys %fld ]);
A suggestion might be to add quote_empty => 1 to be able to distinguish missing keys from keys with a defined but empty value. But as that does not show the difference between missing keys and keys that point to an undefined value, I would suggest something like this (unless you don't care about those differences):
use Text::CSV_XS qw( csv );
my @records = (
{ AB1 => 100, NN => 200, XYZ => 400 },
{ AB1 => 200, XYZ => 400, MM => 300 },
{ AB1 => 300, XYZ => undef, MM => "" },
);
my %seen;
my @fld = sort grep { !$seen{$_}++ } map { keys %$_ } @records;
csv (in => \@records, headers => \@fld, quote_empty => 1, on_in => sub {
foreach my $f (@fld) {
exists $_{$f} or $_{$f} = "--missing--";
}
});
->
AB1,MM,NN,XYZ
100,--missing--,200,400
200,300,--missing--,400
300,"",--missing--,
To answer LanX' question: as CSV is streaming, you don't know what is to come, so there is no builtin functionality to find unique values in CSV, but there are enough builtin features to do it yourself.
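To illustrate the "do it yourself" part, here is a minimal core-Perl sketch (no Text::CSV_XS, and with deliberately naive quoting that is only good enough for this demo): collect the header superset in one pass over the records, then emit the rows against that fixed column list.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @records = (
    { AB1 => 100, NN  => 200, XYZ => 400 },
    { AB1 => 100, XYZ => 400, MM  => 300 },
);

# one pass over all records: dedupe the keys, then sort them
my %seen;
my @headers = sort grep { !$seen{$_}++ } map { keys %$_ } @records;

# naive CSV quoting, for illustration only -- use Text::CSV_XS in real code
my $q = sub { (my $v = $_[0] // "") =~ s/"/""/g; qq{"$v"} };

print join(",", map $q->($_), @headers), "\n";
for my $rec (@records) {
    print join(",", map $q->($rec->{$_}), @headers), "\n";
}
```

Missing keys simply come out as "" because the hash slice against @headers returns undef for them, which the quoting sub maps to an empty quoted field.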
Enjoy, Have FUN! H.Merijn
Re: Writing hashes as records to a CSV file
by LanX (Saint) on Dec 08, 2021 at 23:27 UTC
> but can one do it directly (using Text::CSV) ie. without figuring out the superset of keys myself
I searched through the documentation of Text::CSV but couldn't find that. But I'm confident Tux, as one of the maintainers, will know.
> I could do this "manually" as it were,
Merging hashes is not complicated in Perl:
after
%H = (%H,%$_) for @AoH
you'll have the superset in
keys %H
More importantly you can keep control over the order of columns created. (like ordering by count)
Do you really want to leave that to Text::CSV, which most probably will be random?
update x2
use strict;
use warnings;
use Data::Dump qw/pp dd/;
my @AoH =
(
{AB => 100, NN => 200, XYZ => 400},
{AB => 100, XYZ => 400, MM => 300},
{ map { ("A$_" => $_) } "A".."C" }
);
my %H;
%H = (%H,%$_) for @AoH;
dd keys %H;
my %count;
map { map { $count{$_}++ } keys %$_ } @AoH;
dd sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
OUTPUT: ("XYZ", "NN", "MM", "AC", "AA", "AB")
("AB", "XYZ", "AA", "AC", "MM", "NN")
NB: "AB" is last in the first result, even though it appears in every record.
The second result however will order the columns first by count and then by alphabetic order.
YMMV...
use v5.12;
use warnings;
use Test::More;
use Benchmark qw/cmpthese/;
my $AoH;
for my $n_rec (1, 10,100,1000) {
say "";
say "=== num of records is: ",$n_rec;
$AoH = create_data(1,$n_rec);
is_deeply(
[sort &list_join],
[sort &slice_join],
);
cmpthese(-1, {
'list_join' => \&list_join,
'slice_join' => \&slice_join,
}
);
}
done_testing;
sub list_join {
my %H;
%H = (%H,%$_) for @$AoH;
return keys %H;
}
sub slice_join {
my %H;
@H{keys %$_}=() for @$AoH;
return keys %H;
}
sub create_data {
my ( $density,$records ) = @_ ;
my @AoH;
push @AoH, { map { rand 100 <= $density ? ("$_" => $_) : () } "A".."ZZ" } for 1..$records;
return \@AoH;
}
OUTPUT: __DATA__
=== num of records is: 1
ok 1
Rate list_join slice_join
list_join 238532/s -- -65%
slice_join 682713/s 186% --
__DATA__
=== num of records is: 10
ok 2
Rate list_join slice_join
list_join 7819/s -- -93%
slice_join 112993/s 1345% --
__DATA__
=== num of records is: 100
ok 3
Rate list_join slice_join
list_join 82.9/s -- -99%
slice_join 8533/s 10195% --
__DATA__
=== num of records is: 1000
ok 4
Rate list_join slice_join
list_join 3.66/s -- -100%
slice_join 1067/s 29072% --
1..4
|
I compared it to the two methods I used in my examples, just out of curiosity and to learn. I never used the list_join as that intuitively feels stupid/slow, but the slice_join was also not in my default tool-box, as map feels so much more logical to me. Anyway, here we go ...
I think the grep_map is a neat contender and reads easier than the slice_join. YMMV.
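For readers skimming past, here is a small side-by-side sketch of the two contenders being compared. The sub names grep_map and slice_join follow the posts above; the sample data is the OP's.

```perl
use strict;
use warnings;

my @AoH = (
    { AB1 => 100, NN  => 200, XYZ => 400 },
    { AB1 => 100, XYZ => 400, MM  => 300 },
);

# grep_map: dedupe the mapped keys with a %seen hash (as in Tux's reply above)
sub grep_map {
    my %seen;
    return grep { !$seen{$_}++ } map { keys %$_ } @AoH;
}

# slice_join: let a hash slice do the dedup (the benchmark winner above)
sub slice_join {
    my %H;
    @H{ keys %$_ } = () for @AoH;
    return keys %H;
}
```

Both return the same key superset (in unspecified hash order); slice_join avoids building the intermediate key list and per-key block, which is where its speed comes from.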
Re: Writing hashes as records to a CSV file
by kcott (Archbishop) on Dec 09, 2021 at 06:48 UTC
G'day Serene Hacker,
It would have helped if you'd shown a sample of your actual input file.
The two hashrefs you show give no indication of the data source or how the data shown was extracted;
in fact, I'm not at all convinced that those hashrefs are necessary, even as temporary data structures.
Purely for demonstration purposes, let's assume your input file records are URLs;
and the query string (which could be empty or even absent) represents the "containing key,value pairs".
$ cat pm_11139481_csv_kv_pairs_data.txt
http://example.com/?AB1=101&NN=201&XYZ=401
http://example.net/?AB1=102&XYZ=402&MM=302
http://example.org/?MM=303&XYZ=403&NN=203&AB1=103
http://example.mil/?
https://example.mil/
http://example.edu/?PQR=504
Note that I used unique values throughout.
The last digit of each matches the record number.
This facilitates quickly checking the output by visual inspection.
(Note: in your example, two instances of 100, and two of 400, do not allow this.)
Now, process that data with:
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
use Text::CSV;
my $source_data = 'pm_11139481_csv_kv_pairs_data.txt';
my $created_csv = 'pm_11139481_csv_kv_pairs_data.csv';
my %data;
my $last_index = -1;
extract_data($source_data, \%data, \$last_index);
{
my @headers = sort keys %data;
my $csv = Text::CSV::->new();
open my $csv_fh, '>', $created_csv;
$csv->say($csv_fh, \@headers);
for my $i (0 .. $last_index) {
$csv->say($csv_fh, [map $data{$_}[$i], @headers]);
}
}
sub extract_data {
my ($in_file, $data, $last_index) = @_;
open my $fh, '<', $in_file;
while (<$fh>) {
++$$last_index;
chomp;
my $qs = (split /[?]/)[1];
next unless defined $qs;
for my $kv_pair (split /[&]/, $qs) {
my ($key, $val) = split /[=]/, $kv_pair;
$data->{$key}[$$last_index] = $val;
}
}
return;
}
Obviously, you'd need to write your own version of extract_data();
however, the rest of the code should probably work pretty much as is.
Here's the result:
$ cat pm_11139481_csv_kv_pairs_data.csv
AB1,MM,NN,PQR,XYZ
101,,201,,401
102,302,,,402
103,303,203,,403
,,,,
,,,,
,,,504,