natestraight has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

Caveat: ABSOLUTE, COMPLETE n00b to perl. I'm only using it because it has the wonderful WordNet::Similarity module that I need for some textual analysis.

The problem:

I am running the aforementioned WordNet::Similarity module through a VBA script to calculate a ton of semantic relatedness formulas on documents in an excel sheet (they need to stay in an excel sheet, too, so I can't try mysql or anything like that).

The perl module works fine, except that it loads the massive WordNet lexicon every single time it runs, which would take too long to be practical for the literally hundreds of thousands of times that I need to run the module.

I need the absolute simplest way to keep the WordNet lexicon loaded as a perl variable while the Similarity module runs all the queries and calculates the formulae. I have been trying to follow a procedure that I found mentioned here, but to pretty much no avail. It boots me with a message of "Can't use an undefined value as a HASH reference in DBM/Deep/Hash.pm line 33."

Here are my scripts, can you offer any help? Again, I just need the absolute simplest, most sure-fire method for doing this. I just need read-only access to the lexicon, and only for about a minute or two while the other scripts run the formulas. I've seen modperl, but would prefer to not use a server, as its too much to configure for just this purpose.

Initialize script (to supposedly load the WordNet lexicon into a DBM::Deep-readable database):

use WordNet::QueryData; my $x= WordNet::QueryData->new(); delete $x->{data_fh}; use DBM::Deep; my $h=DBM::Deep->new("wordnet.db"); $h->{foo}=$x;


And the script to actually run the queries:

#! C:\Perl\bin -w use strict; use warnings; # Sample Perl program, showing how to use the # WordNet::Similarity measures. # WordNet::QueryData is required by all the # relatedness modules. use WordNet::QueryData; # 'use' each module that you wish to use. use WordNet::Similarity::wup; # Get the concepts. my $wps1 = shift; my $wps2 = shift; # Load WordNet::QueryData use DBM::Deep; my $h = DBM::Deep->new("wordnet.db")->{foo}; #convert the top level to regular hash my $wn = {}; %$wn=%$h; delete $wn->{data_fh}; #restore contents of $wn->{data_fh} bless $wn, 'WordNet::QueryData'; $wn->openData; # Create an object for each of the measures of # semantic relatedness. my $wup = WordNet::Similarity::wup->new($wn, "/config-files/config-wup +.conf"); # Find the relatedness of the concepts using each of # the measures. my $value = $wup->getRelatedness($wps1, $wps2);


The "WordNet::QueryData" module is the one that loads the lexicon.

Replies are listed 'Best First'.
Re: Absolute simplest way to keep a database variable persistent?
by GrandFather (Saint) on Oct 03, 2008 at 22:22 UTC

    The sticking point that blocks a direct implementation of the type of solution you would best like is that Perl blesses references to turn them into what amount to objects. WordNet::QueryData->new() returns a blessed object and that blessed state is what is difficult to persist in the manner you wish.

    There may be another solution though. Perl is pretty good at driving OLE. In fact VB can generally be translated to Perl without too much trouble (after a little experience anyway). If your VBA script doesn't need much UI then it may be possible to turn things inside out a little and drive Excel from Perl to get the work done. Win32::OLE is the place to start exploring this option.

    Another way you may be able to go about it is to have Perl manipulate your Excel files directly. Again, there is a fair amount of support for that in CPAN.


    Perl reduces RSI - it saves typing
      Ah.

      Unfortunately, I'd really like the work to be done inside Excel as the whole point of the project is to be dynamic (i.e. call the Perl script as part of a volatile VBA function), with the parameters changing based on user input.

      The Excel sheet contains data that is automatically refreshed daily from a ton of blog feeds, and the WordNet modules are going to be used to perform some semantic analysis based on selected user-specified blog posts, phrase lengths, etc.

      Nevertheless, it's not entirely impossible that this be done outside of Excel. I could dump the entire worksheet into a MySQL database or the like, but then I really lose a lot of the dynamic, auto-refresh capabilities that I'd like.

      I'll look into this option, though, as it sounds like it would at least work. Thanks.
      DBM::Deep stores the blessed-ness of the object. That's been there since before I took it over. It would be somewhat useless if it didn't. :-)

      My criteria for good software:
      1. Does it work?
      2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
        DBM::Deep seems to fail the first of your criteria for good software, though... unless something is screwy with my installation. *shrug*

        DBM::Deep sounds like it should do exactly what I want, but for the life of me I can't get it to work. I have copied line-for-line the script in the link in my original post that someone else was able to use for exactly the same task as my application, but it won't run. It gets killed during the cleanup stage with the error "Can't use an undefined value as a HASH reference in DBM/Deep/Hash.pm line 33."

        I have all the dependencies for DBM::Deep (and sub-dependencies) installed. I have made sure all of the versions are compatible. I don't have any other unnecessary Perl modules running. What's causing the "undefined value" error?

        Reading through some basic Perl tutorials online, I can see how the DBM::Deep scripts I copied should be working fine. I'm getting the sinking feeling, though, that I'm missing some simple change (like changing a *insert your local name here* type of reference in the example code to what it should actually be for my application; maybe the {foo} reference? I don't know) that should be obvious. It doesn't seem like the global destruction issue should be popping up and killing my scripts like it's doing.
      As of right now, I've turned things inside out the other way around.

      My VBA function now dynamically writes out the requisite perl script as one massive text file (that loads the lexicon once, then runs all of the individual queries in the same script so that the lexicon object persists obviously until the end of the script), and then calls that text file back into the shell inside VBA.

      So, right now I'm actually driving Perl from Excel. Very strange. It seems to me like a really ugly fix, but it works fairly well... except when it comes to recalculating anything, and the formula has to regenerate the text file and reload the lexicon (since it, as always, closed after the last dynamically-generated script finished).
Re: Absolute simplest way to keep a database variable persistent?
by Jenda (Abbot) on Oct 03, 2008 at 23:38 UTC

    Another option, though not free, would be to use ActiteState's PDK (namely PerlCtrl) to wrap up the Perl stuff as an OLE/COM/ActiveX object which you can then load once and use many times from the VBA.

      This sounds fantastic, but I'd probably just go out and buy a professional semantic analysis tool if I was going to go the non-free route. Are there any other options for wrapping Perl scripts?
        One way to wrap a perl program is to use a web server. Create a server that takes an input of two strings and return the relatedness. The server can keep WordNet loaded between requests.

        The server could be something like Apache and mod_perl, or it could be something like POE::Component::Server::HTTP.

        I use the Cygwin environment and the POE modules to interface to other programs this way. However, I wasn't able to get POE working well until a learned a fair amount of Perl. Maybe you will get lucky and find a POE example that is very close to what you need.

        This might not be the 'absolute simplest' to develop, but after you get it working your code will be quite simple!

        It should work perfectly the first time! - toma
Re: Absolute simplest way to keep a database variable persistent?
by clwolfe (Scribe) on Oct 08, 2008 at 16:18 UTC
    How about IPC (inter-process communications) using STDIN and STDOUT?

    Create a long-running perl script that starts up, loads your massive library, then waits for input on STDIN. Devise a simple "protocol" to communicate to the perl API from VBA, then outputs the results to STDOUT. All this in a while(1) loop.

    From VBA, when the spreadsheet opens, start up the perl process and get handles on its input and output. Then feed it function calls and read the results.

    I have no idea how easy/possible it is to do that in VBA, but it's not unreasonable. You'll also need to be aware of deadlocking issues, etc.

    This approach would give you the benefits of a long-running, compile-once perl instance, without the learning curve of apache+modperl.

      This sounds like the best solution yet.

      I hate to have to remind you that I'm a complete n00b, however. (Though I am learning quite a bit trying to figure out a good method on my own.)

      Can you show me what a basic long-running script such as this would look like? Particularly the "waits for input on STDIN" section? Thanks.
        Here's the basics:
        use strict; # This turns off output buffering, so output is immediate. $| = 1; # Load Wordnet use WordNet::QueryData; # Adjust location as needed my $wordnet = WordNet::QueryData->new("wordnet.db"); # This reads lines from STDIN. If there is no input, # it waits. while (my $input = <>) { # Remove the newline from the end chomp $input; # Here we have to think about a protocol. # I suggest something of the form command;arg1;arg2 my ($command, $arg1, $arg2) = split ';', $input; my @results; # Dispatch based on command if ($command eq 'quit') { exit; } elsif ($command eq 'word') { # Do a WordNet queryWord @results = $wordnet->queryWord($arg1, $arg2); } elsif ($command eq 'sense') { @results = $wordnet->querySense($arg1, $arg2); } else { die "Unknown command $command"; } # OK, send the results back out over STDOUT my $line = join ',', @results; print $line . "\n"; }
        Now then, that's the Perl side. You can test it from the command line like this (if you saved it in wordnet-ipc.pl):
        perl.exe wordnet-ipc.pl
        It should appear to "hang", but really it's just waiting on your input. Try typing sense;run#v. To exit, type exit. The part I'm still fuzzy on is how to start up a process from VBA and get stream objects on it's input and output. I'm on a linux box now, so I don't have access to my VBA docs. I'll try to have a shot at it when I get home. --Clinton

        To apply a nomenclature that may help understand clwolfe's idea consider the Perl application as a server and your VBA code as a client. The overall process is a little like using OLE to allow one application (your VBA code) to connect to another application (the Perl code).

        In fact it may be that using OLE is exactly the solution you are looking for. It is not a 'free' option however. ActiveState provide tools for doing the heavy lifting as part of their pro package. See OLE automation server and Creating a COM/OLE server for some discussion of this option.


        Perl reduces RSI - it saves typing