Ok,

Simplified, short version:
How do I multithread read access to a single file? (using fork)

Long version: 8)
Perl 5.005 on Sun Solaris and Linux (Red Hat 7)

I have a requirement to parse and load a flat file into an RDBMS. Perl will need to scrub the data, and the files can be large (a couple of million records). I thought, heck, let's multithread this thing! How hard could it be? Here's some sample code that I thought would work (copied from memory and commented):

--- Code snipplet
#!/usr/bin/perl

package SMF::Threader; # Used for IPC between processes.

# Create a SMF::Threader object for memory sharing
sub new {
    my($class)=shift;
    my $self;
    open($self->{filehandle},"test.dat") || die $!;
    bless $self, $class;
}

# Every process will have to wait her turn to get a record
sub lock {
    my($self, $pid)=@_;
    push($self->{waits},$pid);
    until (${$self->{waits}}[0] == $pid) {
        ; #waiting for my turn
    };
    1;
}

# Release the next process
sub unlock {
    my $self=shift;
    shift (@{$self->{waits}});
    1;
}

# Get a record from the filehandle
sub fetch {
    my($self, $pid)=@_;
    $self->lock($pid);
    # Can anyone tell me how to combine these next 2 lines?
    # <$self->{filehandle}> is a syntax problem
    my $fh=$self->{filehandle};
    my $row=<$fh>;
    $self->unlock;
    return $row or undef;
}

1;

package main;
use POSIX;

my $new=new SMF::Threader;

for (1..2) { # Fork 2 processes
    unless (fork) {
        open(OUT, ">".$$.".out");
        while(my $record=$new->fetch($$) ) {
            # Record format is "0000000000abcdefg..xyz"
            my($num,$alpha)=unpack("a10 a26",$record);
            print $record unless length($alpha) == 26;
        }
        close OUT;
        exit;
    }
    sleep 1; # I don't think this is necessary because of my locking method,
             # But.. Just in case.
}

my $child;
do {
    $child = waitpid(-1,POSIX::WNOHANG); # Is WNOHANG not exported??
} until $child == -1;
exit;
--- Code snipplet
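
On the question in the comments about combining the two lines in fetch: the angle-bracket operator only treats a bareword or a simple scalar as a filehandle, so <$self->{filehandle}> is parsed as a glob pattern. A minimal sketch of one way around it, assuming the built-in readline function is available in the Perl in use. The helper name fetch_line and the throwaway data file are hypothetical and only there to make the example self-contained.

```perl
#!/usr/bin/perl
use strict;

# Hypothetical helper: read one record from a filehandle stored in a hash.
# readline() accepts an arbitrary expression, unlike the <...> operator.
sub fetch_line {
    my ($self) = @_;
    return scalar readline($self->{filehandle});
}

# Usage sketch with a throwaway data file (the name is illustrative only).
my $file = "fetch_line_demo.$$";
open(OUT, ">$file") || die "write $file: $!";
print OUT "0000000001abcdefghijklmnopqrstuvwxyz\n";
print OUT "0000000002abcdefghijklmnopqrstuvwxyz\n";
close(OUT);

open(FH, $file) || die "read $file: $!";
my $self = { filehandle => \*FH };   # glob reference held in the hash

my @rows;
while (defined(my $row = fetch_line($self))) {
    push @rows, $row;
}
close(FH);
unlink $file;

print scalar(@rows), " records read\n";
```

Testing the return value with defined() also keeps the loop from stopping early on a record that happens to evaluate as false, such as a final line containing only "0" with no trailing newline.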

My assumption was that if I build $new (SMF::Threader) in the parent and use it in each child, it would create a memory segment shared between the processes. Is that true? The problem is that the processes don't always get a complete record (the record separator is a newline). What am I overlooking? Am I going to have to use a semaphore to keep track of the locks?

I think I will still build that into SMF::Threader (named something different), as I might have a reason to reuse it for database read access (MUCH LATER). 8) Any problems you see with that? (CORBA? Definitely overkill, I think.)
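
The shared-memory assumption can be checked with a small experiment. This is a minimal sketch, not part of the code above: after fork each process works on its own copy of the parent's data, so a change made in the child does not show up in the parent. The variable @waits here is a stand-in for the waits queue in SMF::Threader. What forked processes do share is the underlying file descriptor and its read offset, while each process keeps its own input buffer, which may be why buffered reads hand out partial records.

```perl
#!/usr/bin/perl
use strict;

# Data built in the parent before forking (stand-in for $self->{waits}).
my @waits = ();

my $pid = fork;
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: this push changes only the child's private copy.
    push @waits, $$;
    exit 0;
}

# Parent: wait for the child to finish, then inspect its own copy.
waitpid($pid, 0);
my $seen = scalar @waits;
print "parent sees $seen entries\n";   # the child's push is not visible here
```

If the parent reports zero entries, a lock queue kept in an ordinary Perl data structure cannot coordinate the children, and some operating-system mechanism (a lock file, flock, or a semaphore) would be needed instead.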

All help will be greatly appreciated!

Shawn M Ferris
Oracle DBA - Time Warner Telecom


In reply to Threading read access to a filedescriptor by smferris
