Hello Monks,

I seek your wisdom in the following matter. I am developing a program that reads large input files (up to 2 GB each, at most 5 files, all containing the same type of data) and splits them into 18 different parts, with each piece of information going to its relevant output file.

At present it is a non-threaded process, and I am thinking about making it multi-threaded to speed it up. However, when I measured the potential benefit with the test program below (it reads the input files, four in this test, and writes slightly manipulated contents to output files), the threaded version turned out to be slower.

Env: Perl 5.8.8 build 824, Win XP

Code:

#!/usr/bin/perl

use threads;
use Benchmark qw(:all) ;

my $line_var :shared = 0;

sub main_func {
    my ($tid, $in_fh , $out_fh, $start , $stop) = @_;
    # Synchronised block
    ++$line_var;

    while (<$in_fh>) {
        print $out_fh " LineVar.. $line_var\t" . $_ ;
    }
    return $tid;
}

sub super_main {

    open (OUTFH1 , "< out1.txt");
    open (OUTFH2 , "< out2.txt");
    open (OUTFH3 , "< out3.txt");
    open (OUTFH4 , "< out4.txt");
    open (OUTFHO1 , "> outO1.txt");
    open (OUTFHO2 , "> outO2.txt");
    open (OUTFHO3 , "> outO3.txt");
    open (OUTFHO4 , "> outO4.txt");

    $thr1 = threads->create(\&main_func , '1', OUTFH1,OUTFHO1, '1' , '1000000');
    $thr2 = threads->create(\&main_func , '2', OUTFH2,OUTFHO2, '1000000' , '2000000');
    $thr3 = threads->create(\&main_func , '3', OUTFH3,OUTFHO3,'2000000' , '3000000');
    $thr4 = threads->create(\&main_func , '4', OUTFH4,OUTFHO4,'3000000' , '4000000');

    $tid1 = $thr1->join();
    $tid2 = $thr2->join();
    $tid3 = $thr3->join();
    $tid4 = $thr4->join();
}

sub main_func2 {
    my $line_var2 = 0;
    open (OUTFHO5 , "> outO5.txt");
    my ($tid, $out_fh ,$start , $stop) = (5,OUTFHO5,'1' , '4000000');
    for ($i = 1 ; $i<5; $i++) {
        open (INFH , "< out$i.txt");
        while (<INFH>) {
            $line_var2++;
            print $out_fh " Line.. $line_var2\t" . $_ ;
        }
    }
    return $tid;
}



#timethese ( 20, 
#            {'before' => \&main_func2 ,
#            'after'  => \&super_main }
#            );

cmpthese ( 20, 
            {'before' => \&main_func2 ,
            'after'  => \&super_main }
            );

Both timethese and cmpthese show poorer performance for 'after':

---------- Perl ----------
       s/iter  after before
after    10.9     --   -18%
before   8.99    22%     --

Output completed (12 min 45 sec consumed) - Normal Termination

So the question is: have I made a mistake in the program, or is threading simply not that beneficial here?
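
In case the way the test program above sets up its handles and its counter matters, here is a rough sketch of a variant of the threaded worker that could be benchmarked instead. It rests on assumptions of mine rather than anything measured: that threads::shared has to be loaded for $line_var to really be shared, that the "Synchronised block" needs an explicit lock, and that each thread is better off opening its own lexical filehandles. The names copy_file and run_threaded are made up for this illustration, and the worker prints its own counter value on every line instead of re-reading the shared one.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;

# Shared counter. Assumption: without threads::shared loaded, each thread
# would only see its own private copy of this variable.
my $line_var :shared = 0;

# Hypothetical worker: opens its own lexical handles inside the thread
# instead of receiving bareword handles from the parent.
sub copy_file {
    my ($tid, $in_path, $out_path) = @_;

    my $my_count;
    {
        lock($line_var);              # the synchronised block
        $my_count = ++$line_var;
    }                                 # lock is released at end of scope

    open(my $in_fh,  '<', $in_path)  or die "Cannot read $in_path: $!";
    open(my $out_fh, '>', $out_path) or die "Cannot write $out_path: $!";
    while (my $line = <$in_fh>) {
        print {$out_fh} " LineVar.. $my_count\t$line";
    }
    close($out_fh) or die "Cannot close $out_path: $!";
    close($in_fh);
    return $tid;
}

# Hypothetical driver: one thread per [input, output] pair, then join all.
sub run_threaded {
    my @pairs = @_;
    my $tid   = 0;
    my @threads;
    for my $pair (@pairs) {
        push @threads, threads->create(\&copy_file, ++$tid, $pair->[0], $pair->[1]);
    }
    return map { $_->join() } @threads;
}
```

With the file names from the test above it would be called as run_threaded(map { ["out$_.txt", "outO$_.txt"] } 1 .. 4); and could take the place of super_main in the cmpthese call. I have not timed this variant, so whether it changes the picture is exactly what I would like to understand.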

In reply to Threads Doubt by Anonymous Monk