Implementing filters as callbacks / hashrefs / regexes

clinton has asked for the wisdom of the Perl Monks concerning the following question:

I am implementing some filters to use while loading data, and I would like these filters to be customisable, so that you can do the following:


    my $c = Module->new( { filterX => XXXX } );

    where XXXX could be:
      - a hash ref                --> checks for exists $hashref->{$value}
      - a regex                   --> checks if the regex matches
      - an array ref of regexes   --> checks if any regex matches
      - a coderef                 --> the code is called

These filters will be called repeatedly, so I'd like to optimise (prematurely?) for speed. Module also needs to be subclassable. So this is what I'm doing at the moment:

At object initialisation, it:

  - checks all the filter parameters;
  - if a value is passed, assigns an anonymous sub relevant to the value type (e.g. a HASH --> a sub that checks for exists);
  - otherwise, assigns the coderef returned by $self->can($filter_name), which will pick up any subclassed filter, or else the default implementation.

Then I can use the filter with $self->{$filter}->($self,$value);
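Put together, the approach might look roughly like the minimal sketch below. It is not the poster's actual module: the filter name filter_keys, the check() helper and the accept-everything default are hypothetical, and the per-type validation from the real code is left out.

```perl
use strict;
use warnings;

package Module;

# Hypothetical filter name; the real module's filters are not shown
my @FILTERS = ('filter_keys');

# Dispatch table: ref type of the supplied value => closure builder
my %builders = (
    CODE => sub { return $_[0] },    # use the caller's sub as-is
    HASH => sub {
        my $hash = shift;
        return sub { return exists $hash->{ $_[1] } };
    },
    ARRAY => sub {
        my $regexes = shift;
        return sub {
            my $value = $_[1];
            for my $re (@$regexes) {
                return 1 if $value =~ $re;
            }
            return 0;
        };
    },
);

sub new {
    my ( $class, $args ) = @_;
    my $self = bless {}, $class;
    for my $name (@FILTERS) {
        my $spec = $args->{$name};
        if ( !$spec ) {
            # No value: bind the default or subclassed method right now
            $self->{$name} = $self->can($name);
            next;
        }
        # A lone qr// (ref 'Regexp') becomes a one-element list
        $spec = [$spec] unless exists $builders{ ref $spec };
        $self->{$name} = $builders{ ref $spec }->($spec);
    }
    return $self;
}

# Hypothetical default filter: accept every value
sub filter_keys { return 1 }

# Invoke the pre-resolved coderef: $self->{$filter}->($self, $value)
sub check {
    my ( $self, $name, $value ) = @_;
    return $self->{$name}->( $self, $value );
}

package main;

my $by_hash  = Module->new( { filter_keys => { id => 1, name => 1 } } );
my $by_regex = Module->new( { filter_keys => qr/^x_/ } );
my $by_list  = Module->new( { filter_keys => [ qr/^a/, qr/^b/ ] } );
my $by_code  = Module->new( { filter_keys => sub { length( $_[1] ) > 3 } } );
my $default  = Module->new( {} );

print 'id via hash filter: ',
    ( $by_hash->check( filter_keys => 'id' ) ? 'keep' : 'skip' ), "\n";
```

All type inspection happens once in new(); each later check() is a single hash lookup plus a coderef call.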

Advantage: speed. All I have to do is call the coderef with the values and we're done; all the up-front work is done at initialisation.

Disadvantage: the "subclassability" is fixed as soon as the object is instantiated, which is not the standard OO approach. I would guess that this would rarely be a problem.
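The disadvantage can be seen in a small sketch, assuming a hypothetical filter_keys method that is replaced at runtime by assigning to its glob: the object built before the change keeps the old coderef, while a normal method call, or an object built afterwards, sees the new one.

```perl
use strict;
use warnings;

package Module;

sub new {
    my $class = shift;
    my $self  = bless {}, $class;
    # Early binding: resolve the filter once, at construction time
    $self->{filter_keys} = $self->can('filter_keys');
    return $self;
}

# Hypothetical default filter
sub filter_keys { return 'default' }

package main;

my $old = Module->new;

{
    # Replace the method at runtime (e.g. a plugin patching the class)
    no warnings 'redefine';
    *Module::filter_keys = sub { return 'patched' };
}

my $new = Module->new;

print 'old object, stored coderef: ', $old->{filter_keys}->( $old, 'x' ), "\n";  # default
print 'new object, stored coderef: ', $new->{filter_keys}->( $new, 'x' ), "\n";  # patched
print 'old object, method call:    ', $old->filter_keys('x'), "\n";               # patched
```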

The actual code is below (I have removed a few filter-specific checks from this code):


    my %callbacks = (
        CODE    => \&_init_code_callback,
        HASH    => \&_init_hash_callback,
        ARRAY   => \&_init_array_callback,
    );
    
    #===================================
    sub _init_callback {
    #===================================
        my ( $self, $callback, $filter ) = @_;

        # If nothing set, use default or subclassed version
        unless ($filter) {
            return $self->{$callback} = $self->can($callback);
        }

        # If not in callbacks, then assume it is a single regex
        $filter = [$filter]
            unless exists $callbacks{ref $filter};

        # Store the generated coderef, as the default branch does
        return $self->{$callback} = $callbacks{ref $filter}->($filter);
    }


    #===================================
    sub _init_code_callback { 
    #===================================
        return $_[0] 
    };

    #===================================
    sub _init_hash_callback { 
    #===================================
        my $filter = shift;
        return sub {
            my $self  = shift;
            my $param = shift;
            return exists $filter->{$param};
        };
    }
    
    #===================================
    sub _init_array_callback { 
    #===================================
        my $filter = shift;

        foreach my $value (@$filter) {
            # Use a copy for the message so the caller's array is untouched
            my $desc = defined $value ? $value : 'undef';
            die "'$desc' is not a regular expression"
                unless ref $value eq 'Regexp';
        }
        return sub {
            my $self  = shift;
            my $value = shift;
            foreach my $regex (@$filter) {
                return 1 if $value =~ m/$regex/;
            }
            return 0;
        };
    }

Is my approach a reasonable one, or is it likely to cause confusion to other developers? Should I just bite the bullet and use (essentially) the same code that is in _init_callback in the default filters?
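One possible alternative, sketched here under assumptions (the apply_filter() and filter_keys names and the Module::Strict subclass are hypothetical, and only CODE, HASH and single qr// filters are handled): build closures only for user-supplied values and fall back to an ordinary method call at filter time. This keeps normal late binding for default and subclassed filters, at the cost of one extra hash lookup and a method dispatch per call.

```perl
use strict;
use warnings;

package Module;

sub new {
    my ( $class, $args ) = @_;
    my $self = bless { filters => {} }, $class;
    for my $name ( keys %$args ) {
        my $spec = $args->{$name};
        next unless $spec;
        # Only user-supplied values are turned into closures up front
        $self->{filters}{$name} =
              ref $spec eq 'CODE' ? $spec
            : ref $spec eq 'HASH' ? sub { exists $spec->{ $_[1] } ? 1 : 0 }
            :                       sub { $_[1] =~ $spec ? 1 : 0 };    # qr//
    }
    return $self;
}

# Resolved on every call: a user-supplied filter if there is one,
# otherwise an ordinary method call, so subclass overrides and
# runtime redefinitions are always honoured
sub apply_filter {
    my ( $self, $name, $value ) = @_;
    my $cb = $self->{filters}{$name};
    return $cb ? $cb->( $self, $value ) : $self->$name($value);
}

# Hypothetical default filter: accept every value
sub filter_keys { return 1 }

# Hypothetical subclass that overrides the default filter
package Module::Strict;
our @ISA = ('Module');
sub filter_keys { return $_[1] =~ /^\w+$/ ? 1 : 0 }

package main;

my $strict = Module::Strict->new( {} );
print 'ok_name:  ', $strict->apply_filter( filter_keys => 'ok_name' ),  "\n";  # 1
print 'bad name: ', $strict->apply_filter( filter_keys => 'bad name' ), "\n";  # 0
```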

thanks

Clint

Re: Implementing filters as callbacks / hashrefs / regexes by grinder (Bishop) on Jun 16, 2007 at 18:07 UTC
If you're concerned about performance because these things are being called repeatedly, then I would suggest that, rather than matching against an arrayref of regexps, you assemble them into a single pattern with Regexp::Assemble. This also spares you from implementing the logic for handling a list of regexps yourself.

another intruder with the mooring in the heart of the Perl
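If pulling in Regexp::Assemble is not wanted, a core-Perl approximation is to join the qr// objects into one alternation. This is a swapped-in, simpler technique: Regexp::Assemble goes further and optimises the combined pattern, which plain alternation does not. The helper name combine_regexes is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: merge several qr// objects into one pattern
sub combine_regexes {
    my @regexes = @_;
    # An empty list should match nothing, like the original array filter
    return qr/(?!)/ unless @regexes;
    # Each qr// stringifies with its own flags, e.g. (?^i:bar$),
    # so modifiers such as /i survive the join
    my $alternation = join '|', @regexes;
    return qr/(?:$alternation)/;
}

my $any = combine_regexes( qr/^foo/, qr/bar$/i, qr/\d{3}/ );

for my $value (qw(foobar xBAR a12b abc123)) {
    print "$value: ", ( $value =~ $any ? 'match' : 'no match' ), "\n";
}
```

The combined pattern can then be handed to the filter as a single qr// value, so the per-call loop over regexes disappears.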
Re^2: Implementing filters as callbacks / hashrefs / regexes by clinton (Priest) on Jun 16, 2007 at 18:14 UTC
Regexp::Assemble is a nice module, but it would mean loading several hundred lines of code to replace what I'm doing in 10. I'm providing the "list of regexes" option as a fallback for completeness, for those who don't want to write a callback coderef, but it comes with the proviso that it will probably be slower than the other methods. I expect that users who are concerned with performance will mostly use the other methods. They can always use Regexp::Assemble themselves; I'll add the suggestion to the docs.

thanks

Clint
Re^3: Implementing filters as callbacks / hashrefs / regexes by ysth (Canon) on Jun 17, 2007 at 09:22 UTC
You seem to be assuming that loading hundreds of lines of code is a bad thing; can you say why you think so?
Re^4: Implementing filters as callbacks / hashrefs / regexes by clinton (Priest) on Jun 17, 2007 at 10:58 UTC
Re: Implementing filters as callbacks / hashrefs / regexes by Zaxo (Archbishop) on Jun 17, 2007 at 01:53 UTC
Your hash of coderefs is generally called a dispatch table in Perl. It's an excellent choice, and a popular enough idiom to cause no confusion.

You may want to look into PerlIO::via, which addresses just your problem: preprocessing file I/O. It permits setting the processing scheme within the open mode argument, or with binmode.

After Compline,
Zaxo
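For reference, a PerlIO::via layer is a package with callback methods such as PUSHED and FILL. The sketch below is illustrative only: the package name PerlIO::via::SkipComments and its rule (hide lines starting with #) are assumptions, and File::Temp is used just to have a file to open.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical layer: hides lines whose first non-blank character is '#'
package PerlIO::via::SkipComments;

# Called when the layer is pushed; must return an object (or the class)
sub PUSHED {
    my ( $class, $mode, $fh ) = @_;
    return bless {}, $class;
}

# Called when the reader needs more data; returns undef at end of input
sub FILL {
    my ( $self, $fh ) = @_;
    while ( defined( my $line = <$fh> ) ) {
        return $line unless $line =~ /^\s*#/;
    }
    return undef;
}

package main;

# Write some sample data to a temporary file
my ( $tmp, $path ) = tempfile( UNLINK => 1 );
print {$tmp} "# header\nalpha\n# note\nbeta\n";
close $tmp or die "close: $!";

# Select the layer in the open mode argument
open my $in, '<:via(PerlIO::via::SkipComments)', $path
    or die "open: $!";
my @lines = <$in>;
close $in;

print @lines;    # only the alpha and beta lines
```

The same layer could instead be pushed onto an already-open handle with binmode($fh, ':via(PerlIO::via::SkipComments)').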
Re^2: Implementing filters as callbacks / hashrefs / regexes by clinton (Priest) on Jun 17, 2007 at 11:11 UTC
Hi Zaxo

"Your hash of coderefs is generally called a dispatch table in Perl. It's an excellent choice, and a popular enough idiom to cause no confusion."

It wasn't the dispatch table that I thought might cause confusion; it was the fact that at object instantiation, I am "pre-resolving" each filter either to a coderef based on the parameter type (this part is fine), OR to the coderef returned by `$self->can($filtername)`. This second step returns a coderef to either a subclassed version of the filter or the default filter. If, after instantiation, the developer were to programmatically add or change the subclassed version of the filter, then only objects created afterwards would notice. I realise this is an edge case, but this is why I was asking.

"You may want to look into PerlIO::via, which addresses just your problem"

Have a look at Re^4: Implementing filters as callbacks / hashrefs / regexes - maybe my use of the word filter was a bad choice here. The loading of data is handled by other modules; my filters control how the data is treated after loading.

thanks

Clint