clinton has asked for the wisdom of the Perl Monks concerning the following question:

I am implementing some filters to use while loading data, and I would like these filters to be customisable, so that you can do the following:

my $c = Module->new( { filterX => XXXX } );

where XXXX could be:
- a hash ref --> checks for exists $hashref->{$value}
- a regex --> checks if the regex matches
- an array ref of regexes --> checks if any regex matches
- a coderef --> the code is called
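Concretely, the four kinds of value might look like the sketch below. The names and values are made-up examples. The point is that the ref builtin returns a different string for each kind, and that string is what a type-based dispatch can key on.

```perl
use strict;
use warnings;

# Made-up example values for each kind of filter parameter
my %example_filters = (
    hash_ref   => { apple => 1, pear => 1 },    # keep if exists $hashref->{$value}
    regex      => qr/^ap/,                      # keep if the regex matches
    regex_list => [ qr/^ap/, qr/ar$/ ],         # keep if any regex matches
    code_ref   => sub {                         # keep if the code returns true
        my ( $self, $value ) = @_;
        return length($value) > 4;
    },
);

# ref() gives a distinct answer for each kind: HASH, Regexp, ARRAY, CODE
for my $kind ( sort keys %example_filters ) {
    printf "%-10s => %s\n", $kind, ref $example_filters{$kind};
}
```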
These filters will be called repeatedly, so I'd like to optimise (prematurely?) for speed. Module also needs to be subclassable. So what I'm doing at the moment is this:

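At object initialisation, for each filter it stores a coderef in $self->{$filtername}:
- if no filter parameter was passed, the coderef returned by $self->can($filtername), i.e. the default filter method or a subclass's override;
- otherwise, a closure built from the parameter according to its type (hash ref, regex, array ref of regexes, or coderef).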

Then I can use the filter with $self->{$filter}->($self,$value);

Advantage: speed. All I have to do is call the coderef with the value and we're done; all the up-front work is done at initialisation.

Disadvantages: The "subclassability" is fixed as soon as the object is instantiated, which is not a standard OO approach. I would guess that this would rarely be a problem.
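Whether the up-front work is worth it (the "prematurely?" question above) can be measured with the core Benchmark module. The sketch below is a rough comparison that assumes a hash-ref filter. prebuilt is the kind of closure _init_hash_callback returns. check_per_call is a hypothetical helper that inspects the filter's type on every call instead. Results depend on the machine and perl version, so no numbers are claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $spec = { apple => 1, pear => 1 };    # hypothetical hash-ref filter

# Resolved once: the kind of closure _init_hash_callback returns
my $prebuilt = sub { my ( $self, $value ) = @_; return exists $spec->{$value} };

# Hypothetical alternative: inspect the filter's type on every call
sub check_per_call {
    my ( $filter, $value ) = @_;
    my $type = ref $filter;
    return exists $filter->{$value}   if $type eq 'HASH';
    return $filter->( undef, $value ) if $type eq 'CODE';
    return $value =~ $filter ? 1 : 0  if $type eq 'Regexp';
    # Anything else is treated as an array ref of regexes
    for my $re (@$filter) { return 1 if $value =~ $re }
    return 0;
}

# Run each for at least 1 CPU second and print a comparison table
cmpthese( -1, {
    prebuilt => sub { $prebuilt->( undef, $_ ) for qw(apple plum) },
    per_call => sub { check_per_call( $spec, $_ ) for qw(apple plum) },
} );
```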

The actual code is below (I have removed a few filter-specific checks from it):

my %callbacks = (
    CODE  => \&_init_code_callback,
    HASH  => \&_init_hash_callback,
    ARRAY => \&_init_array_callback,
);

#===================================
sub _init_callback {
#===================================
    my ( $self, $callback, $filter ) = @_;

    # If nothing set, use default or subclassed version
    unless ($filter) {
        $self->{$callback} = $self->can($callback);
        return;
    }

    # If not in callbacks, then assume it is a single regex
    $filter = [$filter] unless exists $callbacks{ ref $filter };

    # Store the generated closure, as in the branch above, so that it
    # can be called as $self->{$callback}->($self, $value)
    $self->{$callback} = $callbacks{ ref $filter }->($filter);
    return;
}

#===================================
sub _init_code_callback {
#===================================
    # A coderef is already a filter: use it as-is
    return $_[0];
}

#===================================
sub _init_hash_callback {
#===================================
    my $filter = shift;
    return sub {
        my $self  = shift;
        my $param = shift;
        return exists $filter->{$param};
    };
}

#===================================
sub _init_array_callback {
#===================================
    my $filter = shift;
    foreach my $value (@$filter) {
        $value ||= '';
        die "'$value' is not a regular expression"
            unless ref $value eq 'Regexp';
    }
    return sub {
        my $self  = shift;
        my $value = shift;
        foreach my $regex (@$filter) {
            return 1 if $value =~ m/$regex/;
        }
        return 0;
    };
}
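To see the pieces working together, here is a minimal runnable harness. The class names My::Filtered and My::Filtered::Strict, the filter name filter_keep, the new constructor and the keep wrapper are all hypothetical. The builders are condensed versions of the subs above, and they store the result in $self->{$callback} in both branches of _init_callback.

```perl
use strict;
use warnings;

package My::Filtered;    # hypothetical class name

# Condensed versions of the builders above, keyed on ref()
my %callbacks = (
    CODE => sub { return $_[0] },
    HASH => sub {
        my $filter = shift;
        return sub { my ( $self, $value ) = @_; return exists $filter->{$value} };
    },
    ARRAY => sub {
        my $filter = shift;
        for my $re (@$filter) {
            die "'$re' is not a regular expression" unless ref $re eq 'Regexp';
        }
        return sub {
            my ( $self, $value ) = @_;
            for my $re (@$filter) { return 1 if $value =~ $re }
            return 0;
        };
    },
);

sub new {
    my ( $class, $args ) = @_;
    my $self = bless {}, $class;
    # filter_keep is a hypothetical filter name with a default method below
    $self->_init_callback( 'filter_keep', $args->{filter_keep} );
    return $self;
}

sub _init_callback {
    my ( $self, $callback, $filter ) = @_;
    unless ($filter) {    # no parameter: default or subclassed method
        $self->{$callback} = $self->can($callback);
        return;
    }
    $filter = [$filter] unless exists $callbacks{ ref $filter };
    $self->{$callback} = $callbacks{ ref $filter }->($filter);
    return;
}

sub filter_keep { return 1 }    # default: keep everything

# The call style from the post: $self->{$filter}->($self, $value)
sub keep {
    my ( $self, $value ) = @_;
    return $self->{filter_keep}->( $self, $value );
}

package My::Filtered::Strict;    # hypothetical subclass overriding the default
our @ISA = ('My::Filtered');
sub filter_keep { my ( $self, $value ) = @_; return $value =~ /^[a-z]+$/ ? 1 : 0 }

package main;

my %objects = (
    default  => My::Filtered->new( {} ),
    subclass => My::Filtered::Strict->new( {} ),
    hash     => My::Filtered->new( { filter_keep => { apple => 1 } } ),
    regex    => My::Filtered->new( { filter_keep => qr/^ap/ } ),
    regexes  => My::Filtered->new( { filter_keep => [ qr/^ap/, qr/ar$/ ] } ),
    code     => My::Filtered->new( { filter_keep => sub { length( $_[1] ) > 4 } } ),
);

my %kept;
for my $name ( sort keys %objects ) {
    $kept{$name} = join ',', grep { $objects{$name}->keep($_) } qw(apple pear Kiwi);
    print "$name: $kept{$name}\n";
}
```

The default object keeps everything, the subclass's override is picked up by can() at construction time, and each kind of parameter is turned into a closure once.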

Is my approach a reasonable one, or is it likely to cause confusion to other developers? Should I just bite the bullet and use (essentially) the same code that is in _init_callback in the default filters?
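For comparison, the "bite the bullet" option might look something like the sketch below. The default filter method keeps the raw parameter and works out its type on every call. My::PerCall, filter_keep and the spec field are hypothetical names. Because filter_keep is looked up as an ordinary method each time, subclass overrides behave as usual, including ones installed after the object was created. The cost is a ref() check and a few branches per value.

```perl
use strict;
use warnings;

package My::PerCall;    # hypothetical class name

sub new {
    my ( $class, %args ) = @_;
    # Keep the raw filter parameter; nothing is resolved up front
    return bless { spec => $args{filter_keep} }, $class;
}

# Default filter: works out the parameter's type on every call
sub filter_keep {
    my ( $self, $value ) = @_;
    my $spec = $self->{spec};
    return 1 unless $spec;    # no filter: keep everything
    my $type = ref $spec;
    return $spec->( $self, $value ) if $type eq 'CODE';
    return exists $spec->{$value}   if $type eq 'HASH';
    # An array ref is a list of regexes; anything else is a single regex
    my @regexes = $type eq 'ARRAY' ? @$spec : ($spec);
    for my $re (@regexes) {
        return 1 if $value =~ $re;
    }
    return 0;
}

package main;

my $filter = My::PerCall->new( filter_keep => [ qr/^ap/, qr/ar$/ ] );
my @kept   = grep { $filter->filter_keep($_) } qw(apple pear Kiwi);
print "@kept\n";
```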

thanks

Clint

Re: Implementing filters as callbacks / hashrefs / regexes
by grinder (Bishop) on Jun 16, 2007 at 18:07 UTC

    If you're concerned about performance because these things are being called repeatedly, then rather than matching against an arrayref of regexps, I would suggest assembling them into a single pattern with Regexp::Assemble.

    This also has the added benefit of not having to implement the functionality of dealing with a list of regexps.

    • another intruder with the mooring in the heart of the Perl

      Regexp::Assemble is a nice module, but it would mean loading several hundred lines of code to replace what I'm doing in 10. I'm providing the "list of regexes" option as a fallback for completeness, for those who don't want to write a callback coderef, but it comes with the proviso that it will probably be slower than the other methods.

      I reckon that the other methods will probably get more use from those who are concerned with performance. They can always use Regexp::Assemble themselves; I'll add the suggestion to the docs.
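A middle ground between the per-value loop and pulling in Regexp::Assemble is to join the patterns into one alternation when the filter is built, so each value is matched once. The sketch below uses plain core Perl, and combine_regexes is a hypothetical helper. It is not what Regexp::Assemble does internally; that module also factors out common prefixes to produce a more efficient pattern.

```perl
use strict;
use warnings;

# Build one compiled pattern from a list of qr// objects (hypothetical helper)
sub combine_regexes {
    my @regexes = @_;
    # Each qr// stringifies with its own flags (e.g. (?^i:kiwi) on recent
    # perls), so wrapping each in (?:...) and joining with | keeps them intact
    my $alternation = join '|', map { "(?:$_)" } @regexes;
    return qr/$alternation/;
}

my $combined = combine_regexes( qr/^ap/, qr/ar$/, qr/kiwi/i );
my @kept = grep { $_ =~ $combined } qw(apple pear Kiwi plum);
print "@kept\n";
```

In _init_array_callback, the returned closure could then be a single match against the combined pattern instead of a foreach loop.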

      thanks

      Clint

        You seem to be assuming that loading hundreds of lines of code is a bad thing; can you say why you think so?
Re: Implementing filters as callbacks / hashrefs / regexes
by Zaxo (Archbishop) on Jun 17, 2007 at 01:53 UTC

    Your hash of coderefs is generally called a dispatch table in perl. It's an excellent choice, and a popular enough idiom to cause no confusion.

    You may want to look into PerlIO::via, which addresses just your problem - preprocessing file I/O. It permits setting the processing scheme within the open mode argument, or with binmode.
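For reference, a minimal read-side PerlIO::via layer looks roughly like this. The package name PerlIO::via::UpperDemo and its upper-casing behaviour are hypothetical. PUSHED and FILL are the hooks PerlIO::via calls to create the layer and to fetch the next chunk of input.

```perl
use strict;
use warnings;
use PerlIO::via ();    # normally loaded on demand by :via(...); explicit here

package PerlIO::via::UpperDemo;    # hypothetical layer

# Called when the layer is pushed onto a handle; returns the layer object
sub PUSHED {
    my ( $class, $mode, $fh ) = @_;
    return bless {}, $class;
}

# Called to refill the read buffer from the layer below; undef means EOF
sub FILL {
    my ( $self, $fh ) = @_;
    my $line = readline($fh);
    return defined $line ? uc $line : undef;
}

package main;

my $data = "one\ntwo\n";
open my $in, '<', \$data or die "Cannot open in-memory handle: $!";
binmode $in, ':via(PerlIO::via::UpperDemo)' or die "Cannot push layer: $!";
my @lines = <$in>;
close $in;
print @lines;
```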

    After Compline,
    Zaxo

      Hi Zaxo

      Your hash of coderefs is generally called a dispatch table in perl. It's an excellent choice, and a popular enough idiom to cause no confusion.

      It wasn't the dispatch table that I thought might cause confusion; it was the fact that, at object instantiation, I am "preresolving" each filter either to a coderef based on the parameter type (this part is fine) or to the coderef returned by $self->can($filtername).

      This second step returns a coderef to either a subclassed version of the filter or the default filter. If, after instantiation, the developer were to programmatically add or change the subclassed version of the filter, only new objects would notice. I realise this is an edge case, but this is why I was asking.
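The edge case is easy to reproduce. In the sketch below, My::Cached, filter_keep and keep are hypothetical names. An object built before the method is replaced keeps the coderef that can() returned at construction time, while an object built afterwards gets the new one.

```perl
use strict;
use warnings;

package My::Cached;    # hypothetical class

sub new {
    my $class = shift;
    my $self  = bless {}, $class;
    # Resolve the filter method once, as _init_callback does
    $self->{filter_keep} = $self->can('filter_keep');
    return $self;
}

sub filter_keep { return 'original' }

sub keep {
    my ( $self, $value ) = @_;
    return $self->{filter_keep}->( $self, $value );
}

package main;

my $old = My::Cached->new;

# Replace the method after $old was built (e.g. code patching the class at runtime)
{
    no warnings 'redefine';
    *My::Cached::filter_keep = sub { return 'replaced' };
}

my $new = My::Cached->new;
print $old->keep('x'), "\n";    # still the coderef captured at construction
print $new->keep('x'), "\n";    # sees the replacement
```

If late changes ever did need to be honoured for the default case only, one option would be to store a thin wrapper such as sub { my $self = shift; $self->$callback(@_) } instead of the result of can(). That restores normal method lookup for defaults, while user-supplied filters keep their prebuilt closures.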

      You may want to look into PerlIO::via, which addresses just your problem
      Have a look at Re^4: Implementing filters as callbacks / hashrefs / regexes - maybe my use of the word filter was a bad choice here. The loading of data is handled by other modules. My filters control how it is treated after loading.

      thanks

      Clint