
Ok,

Simplified, short version:
How do I multithread read access to a single file? (using fork)

Long version: 8)
perl 5.005, sun solaris and linux (rh7)

I have a requirement to parse a flat file and load it into an RDBMS. Perl will need to scrub the data, and the files can be large (a couple million records). I thought, heck, let's multithread this thing! How hard could it be? Here's some sample code that I thought would work... (copied from memory and commented)

--- Code snippet

#!/usr/bin/perl

package SMF::Threader;  # Used for IPC between processes.

# Create a SMF::Threader object for memory sharing
sub new {
  my($class)=shift;
  my $self;

  open($self->{filehandle},"test.dat") || die $!;

  bless $self, $class;
}

# Every process will have to wait her turn to get a record
sub lock {
  my($self, $pid)=@_;
  push(@{$self->{waits}},$pid);

  until (${$self->{waits}}[0] == $pid) {
    ; #waiting for my turn
  };
  1;
}

# Release the next process
sub unlock {
  my $self=shift;

  shift (@{$self->{waits}});
  1;
}

# Get a record from the filehandle
sub fetch {
  my($self, $pid)=@_;

  $self->lock($pid); 

  # Can anyone tell me how to combine these next 2 lines?
  # <$self->{filehandle}> is a syntax problem
  my $fh=$self->{filehandle}; 
  my $row=<$fh>;

  $self->unlock;

  return $row;  # undef at end of file
}
1;

package main;

use POSIX;

my $new=new SMF::Threader;

for (1..2) {  # Fork 2 processes

  unless (fork) {
    open(OUT, ">".$$.".out") || die $!;

    while(my $record=$new->fetch($$) ) {
      # Record format is "0000000000abcdefg..xyz"
      my($num,$alpha)=unpack("a10 a26",$record);
      print OUT $record unless length($alpha) == 26;
    }

    close OUT;
    exit;
  }

  sleep 1; # I don't think this is necessary because of my locking method,
           # But.. Just in case.

}

my $child;

do {

  $child = waitpid(-1,POSIX::WNOHANG); # Is WNOHANG not exported??

} until $child == -1;

exit;

--- Code snippet

My assumption was that if I build $new (SMF::Threader) in the parent and use it in each child, it would create a memory segment shareable between the processes. Is that true? The problem is that the processes don't always get a complete record (the record separator is a newline). What am I overlooking? Am I going to have to use a semaphore to keep track of the locks? I think I will still build that into SMF::Threader (named something different), as I might have a reason to reuse it for database read access (MUCH LATER). 8) Any problems you see with that? (CORBA? Definitely overkill, I think.)
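One way to test the shared-memory assumption is a minimal sketch like the one below. The names %shared and child_change_visible_in_parent are hypothetical and not part of the code above. As far as I understand fork, each child works on its own copy of the parent's data, so I would expect the parent to see none of what the child pushed.

```perl
#!/usr/bin/perl
use strict;

$| = 1;  # unbuffered STDOUT, so forked children never re-flush pending output

# Hypothetical check: does a change made in a forked child show up in the parent?
sub child_change_visible_in_parent {
    my %shared = (waits => []);          # stand-in for the object's lock queue

    my $pid = fork;
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        push @{ $shared{waits} }, $$;    # modifies the child's private copy only
        exit 0;
    }

    waitpid($pid, 0);                    # wait until the child has finished
    return scalar @{ $shared{waits} };   # number of entries the parent sees
}

print "entries seen by parent: ", child_change_visible_in_parent(), "\n";
```

If this prints 0, the waits list in SMF::Threader is private to each process, and the lock/unlock pair cannot coordinate the children.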

All help will be greatly appreciated!

Shawn M Ferris
Oracle DBA - Time Warner Telecom

In reply to Threading read access to a filedescriptor by smferris
