sandcrawler has asked for the wisdom of the Perl Monks concerning the following question:

Good day Monks!

I'd probably try to do this with threads, but the system I'm working on isn't mine and its Perl isn't compiled with threads. With that said, here's what I'm trying to do. I can't be as explicit as I'd like, as my employer would frown upon it, so I'll use a weather site as an analogy :)

My script reads a webpage and retrieves a list of CSV data, each row being a "record", and this is inserted into a list of lists I'll call @weather.

We'll say each record is a city, zip code, and current time. Extended data is kept on a separate page referenced by zip code.

To get the extended data I read @weather, get the zip-code, and call a webpage that has the extended details. I append the extended details onto the record.

After I grab each set of records I'm pushing this to a database.

I have to do this every 5 minutes. Done serially it takes too long, but it does work.

Since it already works serially I'll just give a skeleton of what I've got when trying to do this in parallel and hopefully this is enough to troubleshoot.

#!/usr/bin/perl
use strict;
use LWP;
use HTTP::Request::Common;
use WWW::Mechanize;
use XML::Simple;
use Data::Dumper;
use Encode;
use Parallel::ForkManager;
use IPC::Shareable;

my $glue = 'data';
my %options = (
    create    => 'yes',
    exclusive => 0,
    mode      => 0600,
    destroy   => 'yes',
);

my @weather;
my $shm = tie @weather, 'IPC::Shareable', $glue, { %options }
    or die "Could not create shm\n";

# Fetch the snapshot
my $mech = WWW::Mechanize->new;
$mech->get("http://mainurl.com/weather.jsp");
$mech->form_name('snapshotform');
$mech->field( 'userid', 'foo' );
my $snapshot = $mech->submit();

# split the snapshot into an array
my @tmp = split( "\n", $snapshot->content );

# put the CSV data into a list of lists
for (@tmp) {
    push @weather, [ split( ",", encode( "UTF-8", "$_" ) ) ];
}

## Now that I have basic weather data I need to look at that list and
## retrieve the extended data.
my ( $temp, $humid );
my $pf = new Parallel::ForkManager(3);

for ( my $i = 0; $i < scalar(@weather); $i++ ) {
    my $pid = $pf->start and next;

    $mech->get( "http://someurl.com/weather.jsp?zipcode=" . $weather[$i][1] );
    my $html = $mech->content( format => "text" );

    ## The data comes back in tables upon embedded tables and I've found
    ## it easiest to just regex the values I need.
    unless ( ($temp)  = ( $html =~ /some (regex)/ ) ) { $temp  = "NULL" }
    unless ( ($humid) = ( $html =~ /some (regex)/ ) ) { $humid = "NULL" }

    $shm->shlock;
    push( @{ $weather[$i] }, "$temp", "$humid" );
    $shm->shunlock;

    $pf->finish;
}
$pf->wait_all_children;

## More code that pushes the data to a database.
###EOF###


I think that's the skeleton of what I'm doing. If I do it serially, I can dump @weather and see data like
$var1 = [ SomeTown, 12345, 13:00, 65, 80 ]
$var2 = [ SomeOtherTown, 23456, 13:00, 72, 45 ]
and so on...
I found in another node that the children will not have write access to @weather. That matched what I saw: each $weather[] had City, Zip, Time but no temp or humidity.

I found another page in Japanese that appeared to be demonstrating what I'm trying to do but I can't read the comment the author made.

Using ipcs I see that the shm is being used.

I'm not sure what I'm doing wrong. It's probably something trivial since I'm new to IPC and Perl.

Thanks in advance!

Kevin

Re: Parallel::ForkManager and IPC::Shareable
by derby (Abbot) on Apr 30, 2009 at 23:21 UTC

    Hmmm ... the destroy => 'yes' may be the issue ... from the IPC::Shareable docs:

    Use this option with care. In particular you should not use this option in a program that will fork after binding the data.
    This is one of the reasons I tend to use Cache::FastMmap for this type of thing nowadays.

    -derby
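
For comparison, here is a minimal sketch of a different approach that sidesteps shared memory entirely. It is not what derby suggests and it does not use IPC::Shareable or Cache::FastMmap. Each child writes its result to a pipe and only the parent modifies @weather. It uses core Perl only. fetch_extended() and its %fake lookup table are hypothetical stand-ins for the real page fetch and regexes, and unlike Parallel::ForkManager(3) this sketch does not cap the number of simultaneous children.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the per-zip-code fetch and regex extraction.
my %fake = ( 12345 => [ 65, 80 ], 23456 => [ 72, 45 ] );

sub fetch_extended {
    my ($zip) = @_;
    return @{ $fake{$zip} || [ 'NULL', 'NULL' ] };
}

my @weather = (
    [ 'SomeTown',      12345, '13:00' ],
    [ 'SomeOtherTown', 23456, '13:00' ],
);

my %reader_for;    # child pid => read end of that child's pipe

for my $i ( 0 .. $#weather ) {
    pipe( my $reader, my $writer ) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ( $pid == 0 ) {
        # Child: do the slow work, report one line back, then exit.
        close $reader;
        my ( $temp, $humid ) = fetch_extended( $weather[$i][1] );
        print {$writer} join( ',', $i, $temp, $humid ), "\n";
        close $writer;
        exit 0;
    }

    # Parent: keep only the read end for this child.
    close $writer;
    $reader_for{$pid} = $reader;
}

# Parent collects each child's line and is the only writer to @weather.
for my $pid ( keys %reader_for ) {
    my $fh   = $reader_for{$pid};
    my $line = <$fh>;
    close $fh;
    waitpid( $pid, 0 );
    next unless defined $line;
    chomp $line;
    my ( $i, $temp, $humid ) = split /,/, $line;
    push @{ $weather[$i] }, $temp, $humid;
}

print join( ',', @$_ ), "\n" for @weather;
```

Because the child's copy of @weather is never written to, the question of which process can see the change does not arise. The cost is that each result has to be serialized into a line of text and parsed again in the parent.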