Hi Perl gurus, I ran into a problem while trying to make a Perl script multithreaded. In my script, I need to read two lists of files (for example, list1 has 7 files and list2 has 5 files) and collect some information from them into a hash. If there are 12 files, I create 12 threads, one per file, and they all put their information into the same shared hash. Each file has ~21 million lines, so I wanted to use multiple threads to speed the script up. However, the multithreaded version runs much slower than the single-threaded one. I know there must be something wrong in my script, but I can't find where. I hope you can help me. Thanks very much. Here is my script:
#!/usr/bin/perl -w
use strict;
use warnings;
use threads;
use threads::shared;
use Statistics::R;    # using R to do t.test

if (@ARGV != 3) {
    print STDERR "Usage: test_mt.pl input_list1 input_list2 output\n";
    exit(0);
}

my ($inf1, $inf2, $outf) = @ARGV;
chomp(my @inf1 = `ls $inf1`);    # get all of files in list1 (newlines stripped)
chomp(my @inf2 = `ls $inf2`);    # get all of files in list2 (newlines stripped)
my %hash :shared;

# Read one file and store column 3 under the key "col1_col2" in the shared hash
sub read_file {
    my $inf = shift;
    $hash{$inf} = &share({});
    open(my $in, '<', $inf) or die "cannot open $inf: $!\n";
    while (<$in>) {
        if ($_ =~ /\w/) {
            chomp;
            my @info = split(/\t/, $_);
            # lock(%hash);
            $hash{$inf}{$info[0]."_".$info[1]} = $info[2];
        }
    }
    close $in;
}

# One thread per input file
my @threads;
my $thread_count = 0;
for (my $i = 0; $i <= $#inf1; $i++) {
    my $t = threads->create(\&read_file, $inf1[$i]);
    push(@threads, $t);
    $thread_count++;
}
for (my $i = 0; $i <= $#inf2; $i++) {
    my $t = threads->create(\&read_file, $inf2[$i]);
    push(@threads, $t);
    $thread_count++;
}
print STDERR "Total threads to read the files: $thread_count\n";
$_->join foreach @threads;
sleep 1;

open(my $out, '>', $outf) or die "cannot open $outf: $!\n";

# Create a communication bridge with R and start R
my $R = Statistics::R->new();

### below is to do some R related calculation based on the hash; the problem should exist above :)
open(my $in, '<', $inf1[0]) or die "cannot open $inf1[0]: $!\n";
while (<$in>) {
    if ($_ =~ /\w/) {
        chomp;
        my @info = split(/\t/, $_);
        my $cpg_score;
        my $total1;
        my $total2;
        my @list1;
        my @list2;
        foreach my $sample (@inf1) {
            $cpg_score .= "\t".$hash{$sample}{$info[0]."_".$info[1]};
            $total1 += $hash{$sample}{$info[0]."_".$info[1]};
            push(@list1, $hash{$sample}{$info[0]."_".$info[1]});
        }
        foreach my $sample (@inf2) {
            $cpg_score .= "\t".$hash{$sample}{$info[0]."_".$info[1]};
            $total2 += $hash{$sample}{$info[0]."_".$info[1]};
            push(@list2, $hash{$sample}{$info[0]."_".$info[1]});
        }
        # @sample1/@sample2 were never declared; the group sizes are @inf1/@inf2
        if (abs($total1/@inf1 - $total2/@inf2) >= 0.2) {
            my $mean1 = sprintf("%.2f", $total1/@inf1);
            my $mean2 = sprintf("%.2f", $total2/@inf2);
            my $list1 = join ",", @list1;
            my $list2 = join ",", @list2;
            ### Run R commands
            $R->run(qq`x <- t.test(c($list1), c($list2))`);
            my $p_value = $R->get('x$p.value');
            print $out "$info[0]\t$info[1]$cpg_score\t$mean1\t$mean2\t$p_value\n";
        }
    }
}
$R->stop();
close $in;
close $out;
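
To help narrow down where the time goes, here is a small self-contained sketch that times the same kind of hash writes that read_file() does, once into an ordinary hash and once into a nested threads::shared hash. Everything in it is an assumption made for illustration: the helper names fill_plain and fill_shared, the "chr1_N" keys, the 0.5 values and the entry count are all made up, it needs a perl built with thread support, and it does not reproduce the real files or the 12 threads. It only gives a rough idea of the per-write cost of each kind of hash on a given machine.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;
use Time::HiRes qw(time);

# Hypothetical helper: fill an ordinary (unshared) hash with $n entries
# and return how many keys it ends up holding.
sub fill_plain {
    my ($n) = @_;
    my %h;
    $h{"chr1_$_"} = 0.5 for 1 .. $n;
    return scalar keys %h;
}

# Hypothetical helper: fill a nested shared hash, the same shape that
# read_file() builds under $hash{$inf}, and return its key count.
sub fill_shared {
    my ($n) = @_;
    my %h :shared;
    $h{sample} = shared_clone({});
    $h{sample}{"chr1_$_"} = 0.5 for 1 .. $n;
    return scalar keys %{ $h{sample} };
}

my $n = 200_000;    # arbitrary size, chosen only for illustration

# Time each variant once and print the elapsed wall-clock time.
for my $case ( [ plain => \&fill_plain ], [ shared => \&fill_shared ] ) {
    my ($name, $code) = @$case;
    my $start = time;
    my $count = $code->($n);
    printf "%-6s %d keys in %.2f s\n", $name, $count, time - $start;
}
```

If the shared variant turns out to be much slower here as well, that would suggest the shared hash itself is worth looking at, rather than the file reading.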

In reply to problem of my multithreading perl script by qingfengzealot
