Re: Reading from file in threaded code is slow

Thanks for all your comments. I am sorry I was too terse in my description. Allow me to explain a bit more.

Firstly, my production code is CPU bound; the example above is not. My single threaded production code on my 4 core box runs at 100% of one core. I can run the program four times and get the CPU up to about 70-80% over all 4 cores but if one program has a fault I need to kill the other three programs and manually tidy up before I can restart. I thought a single program running four threads might be a little bit faster, but much cleaner to manage when things go wrong.

As I said the example is just intended to show a problem I have run in to. I was expecting the output of the example program to show the processing times for the individual files to go up to say 3 or 4 seconds, and the overall execution time of the three *_process calls to remain about the same. Instead the thread_process is taking 20 times longer to read the same 24MB while the CPU remains idle.

If you want to try this at home, the following code can create suitably named and sized files suitable for testing.

#!/usr/bin/perl
# - Creating some sample 4MB files for testing threads.
use strict;
use warnings;

my @files = qw( 
    SyslogR_20110615_134243.txt SyslogR_20110615_162146.txt 
    SyslogR_20110620_090237.txt SyslogR_20110620_090308.txt 
    SyslogR_20110620_092240.txt SyslogR_20110620_092328.txt
);

for my $file (@files) {
    open my $OUT, ">" , $file or die "Cannot write to $file : $!";
    my $line = "123456789 "x 25;
    for (my $i= 0; $i< 4e6/250; $i++ ) {
        print $OUT $i, " ", $line, "\n";
    }
    close $OUT;
}
[download]

Comment on Re: Reading from file in threaded code is slow Download Code

Replies are listed 'Best First'.
Re^2: Reading from file in threaded code is slow by BrowserUk (Patriarch) on Jul 09, 2011 at 01:00 UTC
Firstly, my production code is CPU bound; the example above is not. My single threaded production code on my 4 core box runs at 100% of one core. Really? Cos when I run a single process reading 10 x 4MB files each consisting of 16,000 lines on my 4-core system, it reads and counts all 40MB/160,000 lines in ~1/3 of a second: c:\test>dir junk?.dat Volume in drive C has no label. Volume Serial Number is 8C78-4B42 Directory of c:\test 09/07/2011 01:39 4,032,000 junk0.dat 09/07/2011 01:39 4,032,000 junk1.dat 09/07/2011 01:39 4,032,000 junk2.dat 09/07/2011 01:39 4,032,000 junk3.dat 09/07/2011 01:39 4,032,000 junk4.dat 09/07/2011 01:39 4,032,000 junk5.dat 09/07/2011 01:39 4,032,000 junk6.dat 09/07/2011 01:39 4,032,000 junk7.dat 09/07/2011 01:39 4,032,000 junk8.dat 09/07/2011 01:39 4,032,000 junk9.dat 10 File(s) 40,320,000 bytes 0 Dir(s) 205,414,293,504 bytes free c:\test>head junk0.dat 1234567890123456789012345678901234567890123456789012345678901234567890 +123456789012345678901234567890123456789012345678901234567890123456789 +012345678901234567890123456789012345678901234567890123456789012345678 +901234567890123456789012345678901234567890 1234567890123456789012345678901234567890123456789012345678901234567890 +123456789012345678901234567890123456789012345678901234567890123456789 +012345678901234567890123456789012345678901234567890123456789012345678 +901234567890123456789012345678901234567890 1234567890123456789012345678901234567890123456789012345678901234567890 +123456789012345678901234567890123456789012345678901234567890123456789 +012345678901234567890123456789012345678901234567890123456789012345678 +901234567890123456789012345678901234567890 1234567890123456789012345678901234567890123456789012345678901234567890 +123456789012345678901234567890123456789012345678901234567890123456789 +012345678901234567890123456789012345678901234567890123456789012345678 +901234567890123456789012345678901234567890 1234567890123456789012345678901234567890123456789012345678901234567890 +123456789012345678901234567890123456789012345678901234567890123456789 +012345678901234567890123456789012345678901234567890123456789012345678 +901234567890123456789012345678901234567890 1234567890123456789012345678901234567890123456789012345678901234567890 +123456789012345678901234567890123456789012345678901234567890123456789 +012345678901234567890123456789012345678901234567890123456789012345678 +901234567890123456789012345678901234567890 1234567890123456789012345678901234567890123456789012345678901234567890 +123456789012345678901234567890123456789012345678901234567890123456789 +012345678901234567890123456789012345678901234567890123456789012345678 +901234567890123456789012345678901234567890 1234567890123456789012345678901234567890123456789012345678901234567890 +123456789012345678901234567890123456789012345678901234567890123456789 +012345678901234567890123456789012345678901234567890123456789012345678 +901234567890123456789012345678901234567890 1234567890123456789012345678901234567890123456789012345678901234567890 +123456789012345678901234567890123456789012345678901234567890123456789 +012345678901234567890123456789012345678901234567890123456789012345678 +901234567890123456789012345678901234567890 1234567890123456789012345678901234567890123456789012345678901234567890 +123456789012345678901234567890123456789012345678901234567890123456789 +012345678901234567890123456789012345678901234567890123456789012345678 +901234567890123456789012345678901234567890 c:\test>wc -l junk0.dat 16000 junk0.dat c:\test>perl -MTime::HiRes=time -e"BEGIN{ @ARGV = map glob, @ARGV; print time}" -nle"$n++ }{ print $n, ' ', time" junk?.dat 1310172599.189 160000 1310172599.43053 [download] And the cpu usage is 2.7% for less than 1/2 a second. It requires more than that to start a single thread. So, if you are going to make a case for threading this insignificant piece of code, you are going to have to show something far more convincing than you have so far. Seeing is believing. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply] [d/l]

Replies are listed 'Best First'.

Re^2: Reading from file in threaded code is slow
by BrowserUk (Patriarch) on Jul 09, 2011 at 01:00 UTC

Firstly, my production code is CPU bound; the example above is not. My single threaded production code on my 4 core box runs at 100% of one core.

Really? Cos when I run a single process reading 10 x 4MB files each consisting of 16,000 lines on my 4-core system, it reads and counts all 40MB/160,000 lines in ~1/3 of a second:

c:\test>dir junk?.dat
 Volume in drive C has no label.
 Volume Serial Number is 8C78-4B42

 Directory of c:\test

09/07/2011  01:39         4,032,000 junk0.dat
09/07/2011  01:39         4,032,000 junk1.dat
09/07/2011  01:39         4,032,000 junk2.dat
09/07/2011  01:39         4,032,000 junk3.dat
09/07/2011  01:39         4,032,000 junk4.dat
09/07/2011  01:39         4,032,000 junk5.dat
09/07/2011  01:39         4,032,000 junk6.dat
09/07/2011  01:39         4,032,000 junk7.dat
09/07/2011  01:39         4,032,000 junk8.dat
09/07/2011  01:39         4,032,000 junk9.dat
              10 File(s)     40,320,000 bytes
               0 Dir(s)  205,414,293,504 bytes free

c:\test>head junk0.dat
1234567890123456789012345678901234567890123456789012345678901234567890
+123456789012345678901234567890123456789012345678901234567890123456789
+012345678901234567890123456789012345678901234567890123456789012345678
+901234567890123456789012345678901234567890
1234567890123456789012345678901234567890123456789012345678901234567890
+123456789012345678901234567890123456789012345678901234567890123456789
+012345678901234567890123456789012345678901234567890123456789012345678
+901234567890123456789012345678901234567890
1234567890123456789012345678901234567890123456789012345678901234567890
+123456789012345678901234567890123456789012345678901234567890123456789
+012345678901234567890123456789012345678901234567890123456789012345678
+901234567890123456789012345678901234567890
1234567890123456789012345678901234567890123456789012345678901234567890
+123456789012345678901234567890123456789012345678901234567890123456789
+012345678901234567890123456789012345678901234567890123456789012345678
+901234567890123456789012345678901234567890
1234567890123456789012345678901234567890123456789012345678901234567890
+123456789012345678901234567890123456789012345678901234567890123456789
+012345678901234567890123456789012345678901234567890123456789012345678
+901234567890123456789012345678901234567890
1234567890123456789012345678901234567890123456789012345678901234567890
+123456789012345678901234567890123456789012345678901234567890123456789
+012345678901234567890123456789012345678901234567890123456789012345678
+901234567890123456789012345678901234567890
1234567890123456789012345678901234567890123456789012345678901234567890
+123456789012345678901234567890123456789012345678901234567890123456789
+012345678901234567890123456789012345678901234567890123456789012345678
+901234567890123456789012345678901234567890
1234567890123456789012345678901234567890123456789012345678901234567890
+123456789012345678901234567890123456789012345678901234567890123456789
+012345678901234567890123456789012345678901234567890123456789012345678
+901234567890123456789012345678901234567890
1234567890123456789012345678901234567890123456789012345678901234567890
+123456789012345678901234567890123456789012345678901234567890123456789
+012345678901234567890123456789012345678901234567890123456789012345678
+901234567890123456789012345678901234567890
1234567890123456789012345678901234567890123456789012345678901234567890
+123456789012345678901234567890123456789012345678901234567890123456789
+012345678901234567890123456789012345678901234567890123456789012345678
+901234567890123456789012345678901234567890

c:\test>wc -l junk0.dat
  16000 junk0.dat

c:\test>perl -MTime::HiRes=time 
-e"BEGIN{ @ARGV = map glob, @ARGV; print time}" 
-nle"$n++ }{ print $n, ' ', time" junk?.dat

1310172599.189
160000 1310172599.43053
[download]

And the cpu usage is 2.7% for less than 1/2 a second. It requires more than that to start a single thread.

So, if you are going to make a case for threading this insignificant piece of code, you are going to have to show something far more convincing than you have so far. Seeing is believing.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

[reply]
[d/l]