Well, I spent some more time thinking about the problem, and I came up with a slightly better algorithm. Instead of a recursive, depth-first approach, I used an iterative, breadth-first algorithm that I arrived at while trying to optimize my previous attempt into a tail-recursive scheme. Code and benchmarks follow.
#!/usr/bin/perl use strict; use warnings; use YAML::XS; use Benchmark qw(cmpthese); my $data; {local $/;$data = <DATA>}; cmpthese( -5, { recursive => sub {my $d = Load $data; mergekeys_recursive( $d +); }, tail_call => sub {my $d = Load $data; mergekeys_tail( $d ); }, loop => sub {my $d = Load $data; mergekeys_loop( $d ); }, } ); sub mergekeys_recursive { my ($ref) = @_; my $type = ref $ref; if ($type eq 'HASH') { my $tmphref = $ref->{'<<'}; if ($tmphref) { die "Merge key does not support merging non-hashmaps" unless (ref $tmphref eq 'HASH'); my %tmphash = %$tmphref; delete $ref->{'<<'}; %$ref = (%tmphash, %$ref); } mergekeys_recursive($_) for (values %$ref); } elsif ($type eq 'ARRAY') { mergekeys_recursive($_) for (@$ref); } return $ref; } sub mergekeys_tail { my ($ref) = (@_); _mergekeys($ref); return $ref; } sub _mergekeys { my $ref = shift or return; my $type = ref $ref; if ($type eq 'HASH') { my $tmphref = $ref->{'<<'}; if ($tmphref) { die "Merge key does not support merging non-hashmaps" unless (ref $tmphref eq 'HASH'); my %tmphash = %$tmphref; delete $ref->{'<<'}; %$ref = (%tmphash, %$ref); } push @_, grep {ref eq 'HASH' or ref eq 'ARRAY'} values %$ref; } elsif ($type eq 'ARRAY') { push @_, grep {ref eq 'HASH' or ref eq 'ARRAY'} @$ref; } goto &_mergekeys; } sub mergekeys_loop { my ($orig) = @_; while (my $ref = shift) { my $type = ref $ref; if ($type eq 'HASH') { my $tmphref = $ref->{'<<'}; if ($tmphref) { die "Merge key does not support merging non-hashmaps" unless (ref $tmphref eq 'HASH'); my %tmphash = %$tmphref; delete $ref->{'<<'}; %$ref = (%tmphash, %$ref); } push @_, grep {ref eq 'HASH' or ref eq 'ARRAY'} values %$r +ef; } elsif ($type eq 'ARRAY') { push @_, grep {ref eq 'HASH' or ref eq 'ARRAY'} @$ref; } } return $orig; } __DATA__
Data I use for benchmark:
--- key1: &id05 - &id02 name: Curly lastname: Howard hair: no occupation: stooge - <<: *id02 name: Larry lastname: Fine hair: curly - <<: *id02 name: Moe hair: bowl - <<: *id02 name: Shemp hair: yes on_again: off_again key2: subkey_a: - foo - phoo - ghoo - - [[[[[[[[[[[[[[asdf]]]]]]],1],3]],{whatever: works}]]]] - sounds: - &id03 voice: hollow says: plugh colors: - red - green - blue - <<: *id03 voice: wind says: do you hear what i hear - <<: *id03 voice: stooge says: - nyuk - nyuk - nyuk characters: *id05 - zxcv subkey_b: - &id01 name: bar type: variable weather: sunny - <<: *id01 name: baz hometown: Perth ...
And the results of the benchmark:
Rate recursive tail_call loop recursive 951/s -- -6% -14% tail_call 1016/s 7% -- -9% loop 1111/s 17% 9% --
I was surprised that the loop was that much faster than the tail-recursion, since it amounts to about the same thing (go to top of loop, check condition, execute or return). It is important to note, though, that I was only able to replace the goto with a while loop because my function was intended to be run for its side-effects on the data structure. If I needed to accumulate a return value, the goto would have been the better solution.
In reply to Re: A fix for merge keys in YAML::XS, YAML::Syck
by bv
in thread A fix for merge keys in YAML::XS, YAML::Syck
by bv
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |