chrestomanci has asked for the wisdom of the Perl Monks concerning the following question:

Greetings wise brothers, I have a question.

I am working on a DBIx::Class table, where one row contains a regular expression stored as a plain string. My current code looks like this:

    package MyCompany::Result::WatchString;
    use base qw/DBIx::Class::Result/;
    use strict;

    __PACKAGE__->table('tblWatchString');
    __PACKAGE__->add_columns(
        'id',           { data_type => 'int', is_auto_increment => 1 },
        'comment',      { data_type => 'varchar', size => 30 },    # Appears in logging messages
        'log_level',    { data_type => 'int', },                   # Log4perl log level.
        'match_string', { data_type => 'varchar', size => 200 },   # String in the VLC output to match on.
    );

    sub match_re {
        my $self = shift;
        my $match_string = $self->match_string;
        my $matcher = qr/\Q$match_string\E/;
        return $matcher;
    }

In the main program, I then compare a large number of strings against all the rows in the database, e.g.:

    my $logger = Log::Log4perl->get_logger();
    open INPUT, '-|', '/some/program/with/verbose/output';
    while( my $line = <INPUT> ) {
        foreach my $watch_string ( $schema->resultset('WatchString')->all ) {
            if( $line =~ $watch_string->match_re() ) {
                # Do stuff
                my $log_message = join(' ', 'Interesting output '. $watch_string->comment, $line);
                $logger->log( $watch_string->log_level, $log_message);
            }
        }
    }

The problem is that $watch_string->match_re() is getting called a huge number of times, and the regular expression gets re-compiled every time. The algorithm is crying out for some sort of caching, but I can't work out how.

I have tried modifying match_re() to just cache the compiled regular expression in $self->{'_cached_re'} but the cache is not persistent between invocations.

I have read the DBIx::Class FAQ entry on this, but I could not get either the Moose or the Class::Accessor::Grouped method to work. When I tried Moose, my builder function got called every time, and the data did not appear to be cached. With Class::Accessor::Grouped the data was never stored.

I have tried using DBIx::Class::InflateColumn and creating an inflate method that returns the compiled regular expression, but it is getting called every time, and is not being cached.

I have tried DBIx::Class::VirtualColumns but when I tried to install via the CPAN client the tests failed.

There must be a way to do this. Can anyone point me in the right direction, or point out what I have been doing wrong?

Thanks.

Re: How to cache a regular expression in a DBIx::Class object
by Your Mother (Archbishop) on Apr 08, 2011 at 18:09 UTC

    I actually think the InflateColumn approach is best because it will DWIM most of the time. I *think* the only problem in this case is that you're purposefully fetching everything on every pass. There's no need to:

        foreach my $watch_string ( $schema->resultset('WatchString')->all )

        # Could be...
        my $ws = $schema->resultset('WatchString')
            ->search({}, { cache => 1 });
        while( my $line = <INPUT> ) {
            foreach my $watch_string ( $ws->all )

    Untested, but as far as I know, it should work. It probably won't be the greatest optimization possible; that would probably involve dispensing with objects, inflating to hashrefs, and doing the regex inflation once, in place, in the raw data. That might look like:

        my $ws = $schema->resultset('WatchString');
        $ws->result_class('DBIx::Class::ResultClass::HashRefInflator');
        my @ws = $ws->all;
        # Compile once and store under the key that the loop below reads.
        $_->{match_re} = qr/$_->{match_string}/ for @ws;
        while( my $line = <INPUT> ) {
            foreach my $ws ( @ws ) {
                if( $line =~ $ws->{match_re} ) # all hashrefs now, no objects.
                {

    I'm less certain of that code but it should be close if not correct.

      Thanks for your reply. I was not aware of the cache option on search.

      In my case, it would not be useful. The reason I am keeping the strings I am watching for in a database is that I want to be able to change them from time to time without restarting my perl process or the external process I am watching the output of. If I cached the search query results, or flattened them to a hash then they would no longer reflect changes in the database.
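One possible middle ground between re-querying on every line and caching forever is to reload the watch list only after it reaches a certain age. The following is a minimal sketch of that idea and not code from this thread: `make_watch_cache`, `$loader` and `$max_age` are hypothetical names, and the loader is assumed to be any code ref that returns hashrefs with a `match_string` key (for example, a wrapper around the resultset query).

```perl
use strict;
use warnings;

# Hypothetical helper: returns a closure that calls $loader at most once
# every $max_age seconds and compiles each match_string when it reloads.
sub make_watch_cache {
    my ($loader, $max_age) = @_;
    my (@watches, $loaded_at);
    return sub {
        my $now = time;
        if ( !defined $loaded_at || $now - $loaded_at >= $max_age ) {
            # Copy each row and add the compiled regex under 'match_re'.
            @watches = map { +{ %$_, match_re => qr/\Q$_->{match_string}\E/ } }
                       $loader->();
            $loaded_at = $now;
        }
        return @watches;
    };
}

# Demo with a stand-in loader instead of a real database query.
my $loads = 0;
my $get_watches = make_watch_cache(
    sub { $loads++; return ( { comment => 'demo', match_string => 'a.b' } ) },
    30,    # assumed refresh interval in seconds
);

my @matched;
for my $line ( 'xx a.b yy', 'aXb' ) {
    for my $w ( $get_watches->() ) {
        push @matched, $line if $line =~ $w->{match_re};
    }
}
print "matched: $_\n" for @matched;
print "loader called $loads time(s)\n";
```

With this approach a change in the database would become visible after at most `$max_age` seconds, which may or may not be acceptable depending on how quickly edits need to take effect.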

Re: How to cache a regular expression in a DBIx::Class object
by moritz (Cardinal) on Apr 08, 2011 at 15:57 UTC

    Try something like

        sub match_re {
            my $self = shift;
            # retrieve from cache if it's there:
            my $re = $self->{match_re};
            return $re if defined $re;
            # otherwise compute it and store it:
            $re = $self->match_string;
            $re = qr/\Q$re\E/;
            return $self->{match_re} = $re;
        }

    in the result class.

      Thank you for replying. Unfortunately I have tried that and it did not work.

      When processing log output, I set a breakpoint in the perl debugger on the second line of the input file. (using $DB::single=1 if 0  != $INPUT_LINE_NUMBER)

      When I hit that breakpoint I stepped into the $watch_string->match_re() method and found that the regular expression that should have been cached was not there. It looks like DBIx::Class was re-creating objects every time.

      There are about 10 rows in the tblWatchString table, so I would not have thought that objects would be cleared as part of a memory saving or garbage collection system.

        It looks like DBIx::Class was re-creating objects every time.

        Oh, I missed that part. There's a much easier fix for that: move the retrieval of the DBIx::Class objects out of the loop:

            my @watchstrings = $schema->resultset('WatchString')->all;
            while( my $line = <INPUT> ) {
                for my $w (@watchstrings) {
                    ...
                }
            }
Re: How to cache a regular expression in a DBIx::Class object
by wind (Priest) on Apr 08, 2011 at 16:51 UTC

    Use ||= to initialize the cached value only when it is missing; otherwise the method just returns the stored regex.

        sub match_re {
            my $self = shift;
            return $self->{match_re} ||= do {
                my $match_string = $self->match_string;
                qr/\Q$match_string\E/;
            };
        }
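Per-object caching like this only helps while the same object is reused. If the row objects are re-created on every pass, as reported earlier in the thread, a cache held at class level and keyed by the string itself might survive that. The sketch below is an illustration under assumptions, not tested against DBIx::Class: `WatchString` is a plain stand-in class, and `%RE_CACHE` and `$COMPILES` are hypothetical names added for the demo.

```perl
package WatchString;    # stand-in for the real result class
use strict;
use warnings;

my %RE_CACHE;           # shared by every object of this class
our $COMPILES = 0;      # counts how often a regex is actually built

sub new          { my ($class, %args) = @_; return bless {%args}, $class }
sub match_string { $_[0]{match_string} }

sub match_re {
    my $self = shift;
    my $str  = $self->match_string;
    # Keyed by the string, so a changed string gets a fresh regex.
    return $RE_CACHE{$str} ||= do { $COMPILES++; qr/\Q$str\E/ };
}

package main;

my $hits = 0;
for ( 1 .. 3 ) {
    # A new object on every pass, mimicking rows being re-fetched.
    my $w = WatchString->new( match_string => 'error (fatal)' );
    $hits++ if 'vlc: error (fatal) here' =~ $w->match_re;
}
print "hits: $hits, compiled $WatchString::COMPILES time(s)\n";
# prints "hits: 3, compiled 1 time(s)"
```

Because the key is the string, an edited row is picked up on the next fetch without any invalidation step; the trade-off is that the hash keeps one entry for every distinct string ever seen.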