RecursionBane has asked for the wisdom of the Perl Monks concerning the following question:
Greetings, Monks!
I'm attempting to write a subroutine to perform regexp-based string substitution on the deepest elements of a complex, non-deterministic hash structure (i.e., it could be a HoAoHoA or a HoHoA or a HoA, or any other combination, to any depth). Note that we have no control over this structure. This operation will be performed infrequently enough that I've optimized for readability rather than runtime.
This is the (untested!) code I'm attempting; please let me know if there's an easier way to do this:
sub search_and_replace_in_hash {
# Given a hash, perform a perl regexp operation on every value, re
+gardless of how deep the hash is
# Usage: search_and_replace_in_hash <\%hash> <expression>
# Returns: New hash reference with replacement performed on deepes
+t elements
my %hash = %{shift()};
my $op = shift();
foreach my $key (sort keys %hash) {
if ( reftype( $hash{$key} ) eq "HASH" ) {
if (scalar( keys( %{$hash{$key}} ) ) > 0) {
%hash = %{search_and_replace_in_hash( $hash{$key}, $op
+ )};
}
next;
}
if ( reftype( $hash{$key} ) eq "ARRAY" ) {
if (scalar( @{$hash{$key}} ) > 0) {
foreach my $list_element (@{$hash{$key}}) {
if (reftype($list_element) eq "HASH") {
%hash = %{search_and_replace_in_hash( $hash{$k
+ey}{$list_element}, $op )};
next;
}
}
} else {
next;
}
}
# If you get here, you're a scalar
%hash{$key} =~ $op;
}
return \%hash;
}
I've looked at Hash::Util, but it didn't seem to handle N-deep structures.
Update: One thing I've just thought of is stringifying the entire structure, performing the search and replace, then converting the structure back. Would this be easier/more efficient? Is Data::Dumper the way to go for this?
Any ideas? Thanks in advance!
~RecursionBane
Re: Manipulate deepest values in complex hash structures
by LanX (Saint) on Apr 29, 2013 at 20:28 UTC
|
But I'm pretty sure you are reinventing the wheel, better search the archives or CPAN. (Maybe Data::Diver or Data::Walker already do what you want.)
Cheers Rolf
( addicted to the Perl Programming Language)
updates
various ...
| [reply] [d/l] [select] |
|
enough theory, now practical code:
use strict;
use warnings;
use Data::Dump qw/pp/;
use Scalar::Util qw/reftype/;
sub walk {
my ($entry,$code) =@_;
my $type = reftype($entry);
$type //= "SCALAR";
if ($type eq "HASH") {
walk($_,$code) for values %$entry;
}
elsif ($type eq "ARRAY") {
walk($_,$code) for @$entry;
}
elsif ($type eq "SCALAR" ) {
$code->($_[0]); # alias of entry
}
else {
warn "unknown type $type";
}
}
my $test = { a => [2, 3, [4, { b => 42 }]], c => 5 };
pp $test;
walk $test, sub { $_[0]+=100 };
pp $test;
Output
/usr/bin/perl -w /tmp/walker.pl
{ a => [2, 3, [4, { b => 42 }]], c => 5 }
{ a => [102, 103, [104, { b => 142 }]], c => 105 }
Cheers Rolf
( addicted to the Perl Programming Language)
UPDATE
of course you could extend walk to call optional $coderefs on every node and specify them as named parameters:
walk $ref , SCALAR => sub { print $_[0] } , ARRAY => sub { Dump $_[0] }, ...
But I think the original code is so small thats its easier to copy and manipulate it in place for different needs.
| [reply] [d/l] [select] |
|
Great post, LanX. I have two questions:
For the line:
$type //= "SCALAR";
am I correct in thinking that this is using the or operator as you would like += or .=? So that $type takes the value "SCALAR" if it doesn't already have a value? I looked in perlop, but I wasn't aware you could use logical operators like that...
My second question is how
$code->($_[0]);
performs the operation on $_[0]. Currently I'm trying to become more comfortable with references (and they're very confusing), so I'd appreciate it if you could point me in the correct direction. I've already read http://perldoc.perl.org/perlreftut.html, but I'm trying to make it stick now.
Thanks in advance!
| [reply] [d/l] [select] |
|
|
Thanks for the feedback, Rolf!
I'll clean up my code and look into Data::Diver. It looks promising.
~RecursionBane
| [reply] |
Re: Manipulate deepest values in complex hash structures
by hdb (Monsignor) on Apr 29, 2013 at 20:48 UTC
|
use strict;
use warnings;
my %dispatch =
(
# scalar
'' => sub { my ( $element, $op ) = @_;
$op->( $element );
},
'HASH' => sub { my ( $element, $op ) = @_;
for my $key (keys %{$element}) {
search_and_replace_in_hash( $element->{$key}, $op );
}
},
'ARRAY' => sub { my ( $element, $op ) = @_;
search_and_replace_in_hash( $_, $op) for @{$element};
},
'default' => sub { my ( $element, $op ) = @_;
print STDERR ref($element).": unknown type of reference\
+n"
},
);
# operates on references
sub search_and_replace_in_hash {
&{ $dispatch{ ref($_[0]) } // $dispatch{'default'} }(@_);
}
my $var = { "0x55555555" => { "0x55555555" => [ ["0xAAAAAAAA", "0x9"],
+ ],
"0xAAAAAAAA" => [ ["0xAAAAAAAA", "0x8"], ], },
"0xAAAAAAAA" => { "0x55555555" => [ ["0xFFFFFFFF", "0x8"], ],
"0xAAAAAAAA" => [ ["0x55555554", "0x3"], ], }, };
search_and_replace_in_hash( $var, sub { $_ =~ s/0x//; } );
search_and_replace_in_hash( $var, sub { print "$_\n"; } ); # no need f
+or Data::Dumper
For other types of references, you need to expand the dispatch table accordingly.
Thanks to Rolf for his helpful comments above.
| [reply] [d/l] |
|
Thank you, hdb! This is the first time I'm seeing dispatch tables; that's really neat!
~RecursionBane
| [reply] |
Re: Manipulate deepest values in complex hash structures
by Laurent_R (Canon) on Apr 29, 2013 at 22:26 UTC
|
Hi,
even though the code below compiles cleanly and does the expected work for the example chosen, please consider it as a pseudocode example (rather than a real solution) of a recursive approach to the traversing of your data structure. I haven't tested any more complicated examples, I have considered that the structure could only include hash and array ref's and made a number of other simplifications.
I agree with Rolf that it is better to use a coderef, so that's what I did (simply printing the values in my code below) rather than a hardcoded regexp, so as to make the walk_data_struct function usable for many other purposes.
my %hash = (jan => [january, '01'], feb => [february, '02'], mar => [m
+arch, '03']);
walk_data_struct (\%hash, sub { my $leaf = shift; print "$leaf \n";});
sub walk_data_struct {
my ($data_ref, $code_ref) = @_;
if (ref $data_ref) {
if (ref ($data_ref) eq "HASH") {
walk_data_struct ($data_ref->{$_}, $code_ref) foreach keys
+ %$data_ref;
} else {
walk_data_struct ($_, $code_ref) foreach @$data_ref ;
}
} else {
&{$code_ref} ($data_ref);
}
}
This prints the leaves of my simple HoA:
$ perl walk_HoA.pl
february
02
january
01
march
03
EDIT: Ooops, I had not seen that some actual solutions had been proposed meanwhile. The last post I had seen when I wrote the above was Rolf's post giving guidelines on how to solve the problem, but without any code. I had not seen any of the later posts with actual code. For some reason, I had missed the later posts. I still leave my solution, hoping that it might still help a little bit.
| [reply] [d/l] [select] |
Re: Manipulate deepest values in complex hash structures
by jakeease (Friar) on Apr 30, 2013 at 09:32 UTC
|
| [reply] |
Re: Manipulate deepest values in complex hash structures
by Anonymous Monk on Apr 29, 2013 at 22:47 UTC
|
| [reply] |
|
IMHO Data::Leaf::Walker can't be used in this particular task of changing leave values.
Or can you see a possibility to pass an absolute key-path which selects only the deepest leaves?
And the each-method will only return copies which can't be manipulated.
OTOH generating an iterator is for sure an interesting alternative.
Cheers Rolf
( addicted to the Perl Programming Language)
| [reply] |
|
No arguments that it may or may not be appropriate in this case. I mention it mainly to point out (as other Monks have, since) that “canned” walkers also exist, which might (or might not) be suitable.
| |
Re: Manipulate deepest values in complex hash structures
by gnosti (Chaplain) on Apr 30, 2013 at 16:39 UTC
|
Another CPAN candidate that I've had good experience with is Data::Rmap. The API is modeled after perl's map function. | [reply] [d/l] |
|
> Another CPAN candidate that I've had good experience with is Data::Rmap.
Yep! Looks good, at least from the documentation!
IMHO the CPAN module for this special task till now! =)
Cheers Rolf
( addicted to the Perl Programming Language)
| [reply] |
|
|