RFC - Data::DeepFilter

I have been working on a module that walks through the nodes of a deep structure and conditionally applies a filter callback to each node. Both inline modification and Copy-On-Write are supported.

I am working on the final revisions of the code and fleshing out the POD some more before putting it on CPAN. Any feedback on naming, usefulness, etc., would be appreciated. It has similarities to Data::Walk, but I feel it is sufficiently different to justify a separate module.

NAME

Data::DeepFilter - Deep modification of structures using callbacks. Optional Copy-On-Write.

SYNOPSIS

  use Data::DeepFilter qw(:all);
  deepfilter(
      data   => \%deep_hash,
      filter => filter_regex(qr/(abc)/, '\U\1'),
      test   => name_in(qw(foo bar)),
  );

DESCRIPTION

Data::DeepFilter provides a mechanism for modifying nodes within a deep structure, with optional Copy-On-Write support that protects the original structure.

deepfilter

  my $filtered_copy = deepfilter(
      data => \%deep_hash,
      filter => \&filter,
      test => \&test,
      safe => 1,
  );

Applies &filter to every node of $data for which &test returns true.

$data can be an arrayref or hashref.

The default behaviour is destructive: the structure passed in $data is modified in place. Specifying safe => 1 overrides this, leaving the original structure intact and returning a filtered copy.
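To make the walk concrete, here is a minimal in-place sketch of how a deepfilter-style traversal could work. It is not the module's actual code: deepfilter_sketch and _walk are hypothetical names, array elements are assumed to reach the test with their index as the name, only non-reference leaves are tested, and the safe option is not implemented.

```perl
use strict;
use warnings;

# Hypothetical sketch of an in-place deep walk (not Data::DeepFilter itself).
sub deepfilter_sketch {
    my (%args) = @_;
    my ($data, $filter, $test) = @args{qw(data filter test)};
    _walk($data, $filter, $test);
    return $data;
}

sub _walk {
    my ($node, $filter, $test) = @_;
    my @slots;    # pairs of [name, reference to the slot holding the value]
    if (ref $node eq 'HASH') {
        @slots = map { [ $_, \$node->{$_} ] } keys %$node;
    }
    elsif (ref $node eq 'ARRAY') {
        # Assumption: the array index is used as the node name.
        @slots = map { [ $_, \$node->[$_] ] } 0 .. $#$node;
    }
    else {
        return;
    }
    for my $slot (@slots) {
        my ($name, $value_ref) = @$slot;
        if (ref $$value_ref) {
            # Descend into nested hashes and arrays.
            _walk($$value_ref, $filter, $test);
        }
        elsif ($test->($name, $value_ref)) {
            # The filter gets a reference, so its change lands in the slot.
            $filter->($value_ref);
        }
    }
}

my %deep = (foo => 1, bar => { foo => 10, baz => 3 }, list => [ 7, 2 ]);
deepfilter_sketch(
    data   => \%deep,
    filter => sub { my ($value_ref) = @_; $$value_ref++ },
    test   => sub { my ($name) = @_; $name eq 'foo' },
);
# %deep is now (foo => 2, bar => { foo => 11, baz => 3 }, list => [7, 2])
```

Passing references to the hash and array slots is what makes the modification destructive; a safe mode would have to copy before filtering.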

Builtin filters

`filter_regex()`

  filter => filter_regex(qr/foo/,'bar')
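One plausible shape for filter_regex() is a closure factory that returns a filter callback. This sketch assumes a literal replacement string and global substitution (/g); it deliberately does not handle backreferences such as '\U\1' from the SYNOPSIS, which would need extra evaluation logic.

```perl
use strict;
use warnings;

# Hypothetical sketch: returns a filter callback that replaces every match
# of $regex with a literal replacement string (backreferences not supported).
sub filter_regex {
    my ($regex, $replacement) = @_;
    return sub {
        my ($value_ref) = @_;
        # Leave undefined values and references untouched.
        return if !defined $$value_ref || ref $$value_ref;
        # Assumption: all matches are replaced (/g).
        $$value_ref =~ s/$regex/$replacement/g;
    };
}

my $filter = filter_regex(qr/foo/, 'bar');
my $value  = 'food and more foo';
$filter->(\$value);
# $value is now 'bard and more bar'
```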

Builtin tests

`name_in()`

  test => name_in(qw(foo bar))

`name_not_in()`

  test => name_not_in(qw(foo bar))

`name_like()`

  test => name_like(qr/foo/)

`name_not_like()`

  test => name_not_like(qr/foo/)
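The builtin tests read naturally as closure factories too. The sketches below are assumptions, not the module's code: each returns a callback with the ($name, $value_ref) signature described under "Test callback specification", examines only the name, and returns 1 or 0.

```perl
use strict;
use warnings;

# Hypothetical test factories; $value_ref is accepted but ignored.
sub name_in {
    my %wanted = map { $_ => 1 } @_;
    return sub { my ($name) = @_; return exists $wanted{$name} ? 1 : 0 };
}

sub name_not_in {
    my $in = name_in(@_);
    return sub { return $in->(@_) ? 0 : 1 };
}

sub name_like {
    my ($regex) = @_;
    return sub { my ($name) = @_; return $name =~ $regex ? 1 : 0 };
}

sub name_not_like {
    my $like = name_like(@_);
    return sub { return $like->(@_) ? 0 : 1 };
}

my $keep_foo_bar = name_in(qw(foo bar));
my $skip_tmp     = name_not_like(qr/^tmp_/);
```

Building the negated forms from the positive ones keeps the matching rules in one place.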

Filter callback specification

 sub filter_example {
     my ($value_ref) = @_;    # reference to the node's value
     $$value_ref++;           # modify the node through the reference
 }

Test callback specification

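 sub test_example {
     my ($name,$value_ref) = @_;    # node name and a reference to its value
     return 1 if $name eq 'foo';    # true: apply the filter to this node
     return 1 if $$value_ref > 5;
     return;                        # false: leave the node alone
 }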
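The two callback contracts fit together as follows: the test decides, and the filter modifies through the same reference. This sketch drives the example callbacks over a single flat hash; the driver loop is a hypothetical stand-in for the module's traversal.

```perl
use strict;
use warnings;

sub filter_example {
    my ($value_ref) = @_;
    $$value_ref++;
}

sub test_example {
    my ($name, $value_ref) = @_;
    return 1 if $name eq 'foo';
    return 1 if $$value_ref > 5;
    return;
}

# Hypothetical driver: pass each key as the name and a reference to the
# hash slot, so the filter's change is written back into %h.
my %h = (foo => 1, bar => 9, baz => 2);
for my $name (keys %h) {
    my $value_ref = \$h{$name};
    filter_example($value_ref) if test_example($name, $value_ref);
}
# %h is now (foo => 2, bar => 10, baz => 2)
```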

Re: RFC - Data::DeepFilter by BrowserUk (Patriarch) on Aug 23, 2006 at 01:54 UTC
I don't see much here that cannot be done with Data::Rmap?
Re^2: RFC - Data::DeepFilter by imp (Priest) on Aug 23, 2006 at 02:13 UTC
Ahh, I had searched CPAN and asked in the chatterbox for related modules, but the closest match in functionality I could find was Data::Walk. Data::Rmap looks like a nice module with a very similar intent, but it doesn't seem to use copy-on-write to protect the original structure. Please correct me if I overlooked something. My intention with the cheap copy-on-write is to clone only the branches that have changes and to use a cheap shallow copy for the unchanged branches. This way, if you have a reference to a large dataset (e.g. cached lookup tables), you will not waste memory cloning unmodified nodes.
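The cheap copy-on-write described above is essentially path copying: only containers on the path to a changed leaf are copied, while untouched branches are shared by reference with the original. The following is a minimal sketch of that idea, not Data::DeepFilter's implementation; cow_walk is a hypothetical name, and a leaf counts as "changed" whenever the test passes, even if the filter leaves its value the same.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Hypothetical path-copying walk. Returns ($new_node, $changed); when
# nothing below $node changed, the original reference is returned as-is.
sub cow_walk {
    my ($node, $filter, $test) = @_;
    my $type = ref $node;
    return ($node, 0) unless $type eq 'HASH' || $type eq 'ARRAY';

    my @names = $type eq 'HASH' ? keys %$node : (0 .. $#$node);
    my %replaced;    # name => new value, for changed children only
    for my $name (@names) {
        my $child = $type eq 'HASH' ? $node->{$name} : $node->[$name];
        if (ref $child) {
            my ($new_child, $changed) = cow_walk($child, $filter, $test);
            $replaced{$name} = $new_child if $changed;
        }
        else {
            my $copy = $child;    # filter a copy, never the original slot
            if ($test->($name, \$copy)) {
                $filter->(\$copy);
                $replaced{$name} = $copy;
            }
        }
    }
    return ($node, 0) unless %replaced;

    # Shallow-copy this container only, then swap in the changed children.
    if ($type eq 'HASH') {
        my %new = (%$node, %replaced);
        return (\%new, 1);
    }
    my @new = @$node;
    $new[$_] = $replaced{$_} for keys %replaced;
    return (\@new, 1);
}

my $lookup = { big => [ 1 .. 5 ] };    # stands in for a large cached table
my %orig   = (foo => 1, cache => $lookup);
my ($new)  = cow_walk(\%orig, sub { ${ $_[0] }++ }, sub { $_[0] eq 'foo' });
# $new->{foo} is 2, $orig{foo} is still 1, and $new->{cache} is the very
# same reference as $orig{cache}: the unmodified branch was not cloned.
```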
Re^3: RFC - Data::DeepFilter by xdg (Monsignor) on Aug 23, 2006 at 11:32 UTC
I'd suggest contacting the author of Data::Rmap and discussing whether he would accept a patch from you that adds COW (or the option of it) before you consider releasing your own module. I know that pride of ownership makes it tough to give up on releasing something that you've written, but remember that monks are supposed to be humble, too. :-)
Re^3: RFC - Data::DeepFilter by bsb (Priest) on Aug 24, 2006 at 01:55 UTC
Hello imp, I wrote Data::Rmap and do like patches (even suggestions). Using COW in combination with Rmap hadn't occurred to me, but I can see it could be useful. Enabling it via an option to Rmap would probably be awkward, especially if you could just call Data::COW::make_cow_ref on the data structure and then rmap the COW copy. I'd need to check that it works as I expect, but I don't have the time today. If it does work, I'll add an example to the docs and leave Data::Rmap doing one simple task.
Re: RFC - Data::DeepFilter by Hofmator (Curate) on Aug 23, 2006 at 09:37 UTC
I'm not too happy with the usage of the word 'filter'. For me, a filter is a device which takes input and allows only a certain part of the input to pass through, unmodified, so a part of the input is filtered out - think sieve. What you are doing is changing the elements. But, alas, I'm not a native speaker so I might be off here. As a better alternative I would suggest 'apply'.
Re^2: RFC - Data::DeepFilter by revdiablo (Prior) on Aug 23, 2006 at 17:05 UTC
"But, alas, I'm not a native speaker so I might be off here."

No, as a native speaker, I think you're dead on. When I think of a filter, I think of Perl's builtin grep, not its builtin map. Incidentally, `Data::DeepApply` sounds a bit odd. Maybe `Data::Apply::Deeply`. But then that makes me think it would be even better if it was `Data::Massage::Deeply`. =)
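The grep/map distinction behind the naming objection is easy to see side by side: grep removes elements, while map transforms every element and keeps them all, which is what the module does to each node.

```perl
use strict;
use warnings;

my @values = (1, 2, 3, 4);

# grep filters: it keeps only the elements for which the block is true.
my @kept = grep { $_ % 2 == 0 } @values;    # (2, 4)

# map applies: it transforms every element and keeps them all.
my @doubled = map { $_ * 2 } @values;       # (2, 4, 6, 8)
```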
Re^3: RFC - Data::DeepFilter by imp (Priest) on Aug 23, 2006 at 17:24 UTC
Part of me wanted to write it as an extension of Data::Walk, and name it Data::Walk::COW - because it amused me. But the interface wasn't quite what I was looking for.