PerlMonks  

Database Search Approaches

by saberworks (Curate)
on Aug 23, 2005 at 20:11 UTC ( [id://486008]=perlquestion )

saberworks has asked for the wisdom of the Perl Monks concerning the following question:

Our software has hundreds of database tables, dozens of which I have been tasked with making "searchable." And by "searchable" I mean a "global" search box that will do text matching against an index that is built from specific fields in each of these tables.

The task is complicated by the fact that each table doesn't have a clearly defined url-mapping scheme (so matching a row in a table is difficult to translate into a URL where the information can be accessed directly). I considered using a sprintf() type format specifier, but I think that will be too limited in the long run.
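For reference, the sprintf() idea might look roughly like the sketch below. The table names, URL templates, and the url_for() helper are all made up for illustration. A format string works while a URL is a pure function of the key columns, and it stops being enough as soon as building the URL needs a lookup or a condition.

```perl
use strict;
use warnings;

# Hypothetical per-table URL templates; none of these names come from the real schema.
my %url_format = (
    folder      => '/folders/view?id=%d',
    family_tree => '/tree/%d/person/%d',    # two-field primary key
);

# Build a URL from a table name and its primary-key values.
sub url_for {
    my ($table, @key) = @_;
    my $format = $url_format{$table}
        or die "no URL format registered for table '$table'\n";
    return sprintf($format, @key);
}

my $folder_url = url_for('folder', 42);
my $person_url = url_for('family_tree', 7, 19);
print "$folder_url\n$person_url\n";
```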

So the plan now is to have a generic "Searchable" class which can handle the easy-to-capture tables, such as ones that have a direct mapping from primary keys to urls. Then there will be other "Searchable" subclasses which are responsible for separating their content into "documents" which can then be added to the search index. The documents must contain a title, a url, a summary, and a body in a well-defined format ("document" in this case is not a file, but rather an in-memory hash representation of this theoretical document). For example, if a table has a two-field primary key and three columns that we want searchable, the subclass will be responsible for turning that into a "document" which can be searched in a standard way.
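To make the "document" idea concrete, here is a minimal sketch of a base class and one subclass. Everything in it is an assumption for illustration: the validate() method, the column names (tree_id, person_id, name, birthplace, notes), the URL layout, and the rows being passed in rather than fetched from the database.

```perl
use strict;
use warnings;

# Hypothetical base class: every Searchable turns its rows into "documents".
package Searchable;

sub new {
    my ($class, %args) = @_;
    return bless { %args }, $class;
}

# Subclasses override this to return a list of document hashrefs.
sub get_list { die "subclass must implement get_list\n" }

# Check that a document carries the four required fields.
sub validate {
    my ($self, $doc) = @_;
    for my $field (qw(title url summary body)) {
        die "document is missing '$field'\n" unless defined $doc->{$field};
    }
    return $doc;
}

# Hypothetical subclass for a table with a two-field primary key
# (tree_id, person_id) and three searchable columns.
package Searchable::FamilyTree;
our @ISA = ('Searchable');

sub get_list {
    my ($self) = @_;
    my @docs;
    # Real code would read these rows from the database.
    for my $row (@{ $self->{rows} }) {
        push @docs, $self->validate({
            title   => $row->{name},
            url     => "/tree/$row->{tree_id}/person/$row->{person_id}",
            summary => $row->{birthplace},
            body    => join(' ', @{$row}{qw(name birthplace notes)}),
        });
    }
    return @docs;
}

package main;

my $source = Searchable::FamilyTree->new(rows => [
    { tree_id => 1, person_id => 2, name => 'Ada',
      birthplace => 'London', notes => 'mathematician' },
]);
my @docs = $source->get_list;
print "$docs[0]{title}: $docs[0]{url}\n";
```

The indexer then only ever sees the four-field hash, so it does not need to know how many key columns a table has.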

What I want is an easy way to drop in a subclass which will represent another table and have the Indexer script catch that and call its get_list() method automatically. This must run under strict and I'd like to not { no strict whatever; }, but I'm open to that if it's necessary. I've come up with something like:
    #!/usr/bin/perl -w
    use strict;

    use SearchIndexer;
    use Searchable::Generic;
    use Searchable::Folder;
    use Searchable::FamilyTree;
    # ... whatever else ...

    my @searchable = qw(Searchable::Generic Searchable::Folder Searchable::FamilyTree);

    my $indexer = SearchIndexer->new();
    foreach my $module (@searchable) {
        # class-method call through the package name held in $module
        $indexer->add($module->get_list());
    }
    $indexer->run();
So I'm wondering if anyone can come up with a better way to handle this. The above method will work, but I have to add each entry in two places: a "use module;" line in the "header" and an entry in the @searchable array. I want something that's more transparent, and yes, I'm open to restructuring this if someone can think of a better way to do it.

Replies are listed 'Best First'.
Re: Database Search Approaches
by Ven'Tatsu (Deacon) on Aug 23, 2005 at 21:17 UTC
    I would set up a factory method in Searchable that is configured externally to your program. Since you are already dealing with a database, that is the ideal place for the configuration. Set up a table with two columns (or more if you have a reason to): the first, the primary key, lists the names of the tables you want searchable; the second holds the name of the module that handles each table. The factory function would then be something like this:
        # called as Searchable->getTableHandler($table);
        sub getTableHandler {
            my $class = shift;
            my $table = shift;
            # get $handlerClass from db
            eval "use $handlerClass;";
            # deal with errors
            return $handlerClass->new($table);
        }
    The advantage of this is that you don't have to alter the main script any time you want to add a table, and there is only one place where new tables and modules need to be added. The downsides are that it uses string eval to pull in the module at run time, which can cause delays as the compiler fires up again, and that if someone gets malicious code into your configuration you could have a serious problem.
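One way to reduce the malicious-configuration risk is to check the class name against a strict pattern before it ever reaches string eval. The sketch below is only an illustration under assumptions: the %handler_for hash stands in for the configuration table, Searchable::Folder is defined inline so the example runs without extra files, and the "must live under Searchable::" rule is an invented policy. It uses package blocks, so it needs Perl 5.14 or later.

```perl
use strict;
use warnings;

# Stand-in for the two-column configuration table (table name => handler class).
my %handler_for = (
    folders => 'Searchable::Folder',
    evil    => 'Foo; system("echo pwned")',    # what a tampered row might look like
);

# Hypothetical handler defined inline so the sketch is self-contained.
package Searchable::Folder {
    sub new {
        my ($class, $table) = @_;
        return bless { table => $table }, $class;
    }
}

package Searchable {
    # Called as Searchable->getTableHandler($table)
    sub getTableHandler {
        my ($class, $table) = @_;
        my $handlerClass = $handler_for{$table}
            or die "no handler configured for table '$table'\n";

        # Accept only plain package names under Searchable::.
        $handlerClass =~ /\ASearchable(?:::\w+)+\z/
            or die "refusing suspicious handler class '$handlerClass'\n";

        # Load the module only if the class is not already available.
        unless ($handlerClass->can('new')) {
            eval "use $handlerClass; 1"
                or die "cannot load $handlerClass: $@";
        }
        return $handlerClass->new($table);
    }
}

my $handler = Searchable->getTableHandler('folders');
print ref($handler), " handles $handler->{table}\n";

my $ok = eval { Searchable->getTableHandler('evil'); 1 };
print "tampered entry rejected: $@" unless $ok;
```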

      I'm not sure why everyone keeps using string eval to load modules. Heck, even UNIVERSAL::require does it, which confounds the heck out of me.

          sub my_require {
              my $class = shift;
              (my $pm = $class) =~ s{::}{/}g;    # /g so every '::' is converted, not only the first
              $pm .= '.pm';
              require $pm;
              # optional:
              # $class->import();
          }
      Then it's as simple as eval { my_require($handlerClass); }, which eliminates the delays and risks of string eval. Of course, if someone can put a new module in your @INC and update your table, that someone could be malicious, but I don't think there's much that you can do about that.
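A runnable sketch of that block-eval usage, including the failure path, might look like this. The module Searchable::Demo::Folder is an invented name, written to a temporary directory only so the example runs on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Convert Some::Class to Some/Class.pm and require it; no string eval involved.
sub my_require {
    my $class = shift;
    (my $pm = $class) =~ s{::}{/}g;
    $pm .= '.pm';
    require $pm;
    return $class;
}

# Write a throwaway module so the sketch is self-contained.
my $lib = tempdir(CLEANUP => 1);
make_path("$lib/Searchable/Demo");
open my $fh, '>', "$lib/Searchable/Demo/Folder.pm"
    or die "cannot write module: $!";
print {$fh} "package Searchable::Demo::Folder;\n",
            "sub get_list { return ('doc1', 'doc2') }\n",
            "1;\n";
close $fh or die "cannot close module: $!";
unshift @INC, $lib;

for my $handlerClass ('Searchable::Demo::Folder', 'Searchable::Demo::Missing') {
    if (eval { my_require($handlerClass); 1 }) {
        my @docs  = $handlerClass->get_list;
        my $count = @docs;
        print "$handlerClass loaded, $count documents\n";
    }
    else {
        # $@ holds the "Can't locate ..." message from require
        print "$handlerClass failed to load: $@";
    }
}
```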

      In this case I could just query the filesystem directly for a list of Searchable/*.pm modules.
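A rough sketch of that filesystem approach follows, under some assumptions: every handler lives directly in a Searchable/ directory beneath one library path, each has a get_list() class method, and the two modules here are invented and written to a temporary directory so the example is self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Return the class name for every Searchable/*.pm found under $lib.
sub find_searchable {
    my ($lib) = @_;
    opendir my $dh, "$lib/Searchable"
        or die "cannot open $lib/Searchable: $!";
    my @entries = readdir $dh;
    closedir $dh;

    my @classes;
    for my $entry (sort @entries) {
        push @classes, "Searchable::$1" if $entry =~ /\A(\w+)\.pm\z/;
    }
    return @classes;
}

# Throwaway modules so the sketch runs on its own.
my $lib = tempdir(CLEANUP => 1);
make_path("$lib/Searchable");
for my $name (qw(Folder FamilyTree)) {
    open my $fh, '>', "$lib/Searchable/$name.pm"
        or die "cannot write $name.pm: $!";
    print {$fh} "package Searchable::$name;\n",
                "sub get_list { return ('$name document') }\n",
                "1;\n";
    close $fh or die "cannot close $name.pm: $!";
}
unshift @INC, $lib;

# Load each discovered class and collect its documents.
my @documents;
for my $class (find_searchable($lib)) {
    (my $file = "$class.pm") =~ s{::}{/}g;
    require $file;
    push @documents, $class->get_list;
}
print "$_\n" for @documents;
```

With this layout, dropping a new Searchable/Whatever.pm into the directory is the only step needed to register it.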
Re: Database Search Approaches
by shemp (Deacon) on Aug 23, 2005 at 20:59 UTC
    I don't particularly like this solution, but it worked for a simple test I ran:
        #!/usr/bin/perl
        use strict;
        use warnings;

        my @searchable;
        BEGIN {
            @searchable = qw(...);    # the list to include
            foreach my $module (@searchable) {
                eval "use $module";
                if ( $@ ) {
                    die "error using: $module\n";
                }
            }
        }
    It lets you specify each module only once. A potential problem arises if any of the modules needs an import list in its use statement, for example:
    use Params::Check qw(check);
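If import lists are needed, one possible workaround is to keep them next to the module names and call import() explicitly after a require. This is a sketch rather than a tested recipe for the modules in question: it uses the core modules List::Util and Scalar::Util purely as stand-ins, and the %imports hash is an invented structure.

```perl
use strict;
use warnings;

my %imports;
BEGIN {
    # Module name => import list; an empty list calls import() with no arguments.
    %imports = (
        'List::Util'   => [qw(sum max)],
        'Scalar::Util' => [qw(blessed)],
    );
    for my $module (sort keys %imports) {
        (my $file = "$module.pm") =~ s{::}{/}g;
        require $file;
        $module->import(@{ $imports{$module} });
    }
}

# The imported functions are available here because the BEGIN block
# ran before the rest of the file was compiled.
my $total   = sum(1, 2, 3);
my $largest = max(4, 9, 2);
my $class   = blessed(bless({}, 'Demo'));
print "$total $largest $class\n";
```

Because require takes a file path and import() is an ordinary method call, no string eval is involved.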

    I use the most powerful debugger available: print!

Node Type: perlquestion [id://486008]
Approved by ww