amcglinchy has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I have been trying to speed up a CPU-bound log-file analysis program using threads. However, the resulting program is even slower (five times!) than my single-threaded version. I have isolated the problem to the simple example below. I must be doing something wrong. Any suggestions on how to get back to sub-second file reads?

For reference, the source files are about 4 MB to 10 MB, with line lengths of around 250 bytes. I have only tested this with Perl 5.10.1 under Cygwin.

#!/usr/bin/perl
# - Example threading code reading from multiple files being slow
use strict;
use warnings;
use threads;
use Thread::Queue;

my @files = qw(
    SyslogR_20110615_134243.txt
    SyslogR_20110615_162146.txt
    SyslogR_20110620_090237.txt
    SyslogR_20110620_090308.txt
    SyslogR_20110620_092240.txt
    SyslogR_20110620_092328.txt
);

non_thread_process( \@files );

my $PARALLEL = 1;
thread_process( \@files );

$PARALLEL = 4;
thread_process( \@files );

sub non_thread_process {
    my $args = shift;
    for my $file (@$args) {
        process( $file, "Non Thread" );
    }
}

sub thread_process {
    my $args = shift;
    my @threads;
    my $queue = Thread::Queue->new();
    for my $p ( 1 .. $PARALLEL ) {
        push @threads, threads->create( sub {
            while ( my $file = $queue->dequeue ) {
                process( $file, "Thread $p of $PARALLEL" );
            }
        });
    }
    $queue->enqueue( @$args, (undef) x $PARALLEL );
    $_->join for @threads;
    return 1;
}

sub process {
    my ( $file, $msg ) = @_;
    my $start = time;
    open my $IN, '<', $file or die "Cannot read file $file : $!";
    while (<$IN>) {
        my @x = split / /;    # Do something
    }
    close $IN;
    my $end = time;
    printf "$msg: Process of $file complete in %ds\n", $end - $start;
}
__END__
$ perl -v

This is perl, v5.10.1 (*) built for i686-cygwin-thread-multi-64int
(with 13 registered patches, see perl -V for more detail)

Copyright 1987-2009, Larry Wall

Perl may be copied only under the terms of either the Artistic License
or the GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl".  If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.

$ perl ./example.pl
Non Thread: Process of SyslogR_20110615_134243.txt complete in 0s
Non Thread: Process of SyslogR_20110615_162146.txt complete in 1s
Non Thread: Process of SyslogR_20110620_090237.txt complete in 0s
Non Thread: Process of SyslogR_20110620_090308.txt complete in 1s
Non Thread: Process of SyslogR_20110620_092240.txt complete in 0s
Non Thread: Process of SyslogR_20110620_092328.txt complete in 1s
Thread 1 of 1: Process of SyslogR_20110615_134243.txt complete in 0s
Thread 1 of 1: Process of SyslogR_20110615_162146.txt complete in 1s
Thread 1 of 1: Process of SyslogR_20110620_090237.txt complete in 0s
Thread 1 of 1: Process of SyslogR_20110620_090308.txt complete in 0s
Thread 1 of 1: Process of SyslogR_20110620_092240.txt complete in 1s
Thread 1 of 1: Process of SyslogR_20110620_092328.txt complete in 0s
Thread 4 of 4: Process of SyslogR_20110620_090308.txt complete in 11s
Thread 1 of 4: Process of SyslogR_20110615_134243.txt complete in 16s
Thread 3 of 4: Process of SyslogR_20110620_090237.txt complete in 24s
Thread 2 of 4: Process of SyslogR_20110615_162146.txt complete in 29s
Thread 1 of 4: Process of SyslogR_20110620_092328.txt complete in 14s
Thread 4 of 4: Process of SyslogR_20110620_092240.txt complete in 19s

Re: Reading from file in threaded code is slow
by BrowserUk (Patriarch) on Jul 08, 2011 at 18:42 UTC
    I have been trying to speed up a CPU-bound log-file analysis program using threads.... I have isolated the problem to the simple example below.

    You claim your processing is CPU-bound, but your example does nothing except split the lines, which will never take longer than reading them from disk.
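    A quick way to check such a claim: time a read-only pass against a read-and-split pass over the same file; if the two numbers are close, the job is IO-bound. A minimal sketch (the default file name is a placeholder):

    # Minimal sketch: compare a read-only pass with a read+split pass.
    # Run it twice so both passes come from the OS file cache rather
    # than the disk. "some.log" is a placeholder name.
    use strict;
    use warnings;
    use Time::HiRes qw(time);

    my $file = shift // 'some.log';
    for my $pass ( 'read only', 'read+split' ) {
        open my $IN, '<', $file or die "Cannot read $file : $!";
        my $start = time;
        while (<$IN>) {
            if ( $pass eq 'read+split' ) {
                my @x = split / /;
            }
        }
        close $IN;
        printf "%-10s: %.3fs\n", $pass, time - $start;
    }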

    I'm going to assume that $PARALLEL = 4; means you have a 4-core system. Well done for limiting yourself to 4 threads.

    But I'm also going to assume that you have only one disk. Trying to read four separate files (on the same disk) concurrently will always be slower than reading those same four files sequentially. That's true whether you use 4 threads, 4 processes, or even 4 entirely different computers.

    Here's why: disks have only one read head, and the majority of the cost of disk I/O is moving that head to the right place on the disk to read the next chunk of data.

    When you read the four files sequentially, many chunks of each file will be located contiguously on the same track of the disk, so they can be read without moving the disk head -- often within a single rotation of the platter.

    But when you read four files concurrently, the disk head has to dance back and forth across the disk to reach each new chunk. Those continuous, repetitive head movements absolutely kill throughput.

    If your processing were truly CPU-bound -- i.e. if processing a record took significantly longer than reading it, and simply splitting it certainly does not -- then there would be some scope for improving throughput with threading.

    Better (more performant) alternatives include:

    • Distributing the files across multiple drives and using one thread per drive (see the sketch after this list).
    • Using hardware-RAIDed drives with intelligent controllers.
    • Placing the files on SSDs -- though this will still favour serial rather than parallel access if all the files live on the same unit.
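    A minimal sketch of the first alternative, assuming Windows-style drive-letter prefixes for the file-to-spindle mapping (the process() stub below stands in for real per-file work):

    # Sketch: one worker thread per drive, so each spindle sees a purely
    # sequential read pattern. The drive-letter parsing is an assumption.
    use strict;
    use warnings;
    use threads;
    use Thread::Queue;

    my %by_drive;
    for my $file (@ARGV) {
        my $drive = $file =~ /^(\w:)/ ? $1 : '.';
        push @{ $by_drive{$drive} }, $file;
    }

    my @threads;
    for my $drive ( keys %by_drive ) {
        # queue pre-loaded with this drive's files, plus an end marker
        my $queue = Thread::Queue->new( @{ $by_drive{$drive} }, undef );
        push @threads, threads->create( sub {
            while ( defined( my $file = $queue->dequeue ) ) {
                process( $file, "Drive $drive" );
            }
        });
    }
    $_->join for @threads;

    sub process { my ( $file, $msg ) = @_; print "$msg: $file\n" }    # stand-in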

    Assessment gets more complicated if you have NAS or SAN drives using RAIDed SSDs.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Reading from file in threaded code is slow
by Tanktalus (Canon) on Jul 08, 2011 at 17:36 UTC

    Two words stand out in your post: CPU-bound and threads. So I have to ask: are you running this on a system with 4 (or more) CPUs? If not, you're not going to see any improvement. With a good mix of CPU- and I/O-bound work you could, since the CPU can be working while the hard disk is reading, but I'd expect the optimal number of threads on a single-CPU system to be 2, not 4. I notice, for example, that with your sample code the speed drops dramatically if I run with more threads than I have cores. Mind you, I don't have any huge log files to run it against, so it's still fast.
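    For what it's worth, the worker count can be derived from the machine rather than hard-coded. A minimal sketch (Sys::CPU is a CPAN module that may not be installed; NUMBER_OF_PROCESSORS is an environment variable Windows sets, which Cygwin inherits):

    # Sketch: pick a worker count from the hardware rather than guessing.
    use strict;
    use warnings;

    my $cores = eval { require Sys::CPU; Sys::CPU::cpu_count() }
        || $ENV{NUMBER_OF_PROCESSORS}    # set by Windows / Cygwin
        || 1;                            # safe fallback
    print "Would start $cores worker threads\n";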

Re: Reading from file in threaded code is slow
by amcglinchy (Novice) on Jul 09, 2011 at 01:10 UTC

    Further to this I have had a bit of a play on a few more perls and OSes.

    The code runs quickly on

    • This is perl 5, version 12, subversion 3 (v5.12.3) built for MSWin32-x86-multi-thread
    • This is perl, v5.8.8 built for i386-linux-thread-multi

    On both of these, the code runs at about 1 sec/file, 4 seconds total (although 5.8.8 spat out some "Scalars leaked: 1" warnings).

    The code runs slowly on

    • This is perl, v5.10.0 built for cygwin-thread-multi-64int
    • This is perl, v5.10.1 built for cygwin-thread-multi-64int
    • This is perl 5, version 14, subversion 1 (v5.14.1) built for cygwin-thread-multi-64int

    The 5.10.x builds take 7 to 17 s/file, while the shiny new 5.14.1 takes 4 to 8 s/file. So it looks like this is a problem with Cygwin only. :-(

    It would be nice if someone else could confirm my observations. The next stop on Monday will be ActivePerl 5.14.1, but I would rather keep the code on Cygwin if I could.

    Suggestions?

      So it looks like this is a problem with Cygwin only.

      That makes some sense. When I briefly played with Cygwin close to a decade ago, I noted that it was very sluggish for many things. It wouldn't surprise me to hear that it has never been tuned for a multi-core, multi-threaded environment.

      You might get better responses by reporting your findings on the Cygwin lists, as this doesn't seem to be a "perl problem" per se.

Re: Reading from file in threaded code is slow
by Marshall (Canon) on Jul 08, 2011 at 20:30 UTC
    update...

    Oops. I tried to retract this post after discovering that I had misread what was happening, but I wasn't fast enough on the trigger... Sorry, I lost the original (wrong) comment... but it was wrong anyway.

    Update again: here was the original post:

    ---
    I think there is a flaw in your benchmark. Many file-system resources are allocated per process, not per thread. It looks to me like your threaded version has a single process that's trying to open the same file 4 times and read from 4 different file positions? This can potentially cause big trouble and contention.

    I suggest recoding the benchmark so that no file is open more than once in the same process. In this case, that means each thread needs to work through an independent list of files.

    ---
    I was mistaken -- the OP's code already does what I suggested (the overall process never has the same file open twice; each thread is independent). See BrowserUk's response.

      It looks to me like your threaded version has a single process that's trying to open the same file 4 times and read from 4 different file positions?

      In the routine thread_process(), the list of files is pushed onto a queue. Each of the threads then pulls (different) filenames from that queue:

      sub thread_process {
          my $args = shift;
          my @threads;
          my $queue = Thread::Queue->new();
          for my $p ( 1 .. $PARALLEL ) {
              push @threads, threads->create( sub {
                  while ( my $file = $queue->dequeue ) {
                      process( $file, "Thread $p of $PARALLEL" );
                  }
              });
          }
          $queue->enqueue( @$args, (undef) x $PARALLEL );
          $_->join for @threads;
          return 1;
      }

      So no. They are not reading from the same file.


Re: Reading from file in threaded code is slow
by sundialsvc4 (Abbot) on Jul 08, 2011 at 20:45 UTC

    I am not surprised to see this. If your program is fundamentally “reading from a file,” then it is by definition an I/O-bound task. Adding threads will never help that situation, because at the end of the day you have a stream of disk-read requests all going to the same device.

    By adding threads to the mix, you make matters considerably worse, because now the pattern of read requests is scattered all over the platter. Seek time, by far the longest component of disk-related delays, is thrust front-and-center... unnecessarily.

    The best strategy is to open the file for sequential access and read it sequentially, and to do the same with any output file. The operating system's natural file-buffering heuristics then take over: the OS will “read ahead” and “lazy write.” (As long as you present the OS with that predictable pattern, that is all you need to do.)
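    A minimal sketch of one way to present that pattern, with the 1 MB chunk size chosen arbitrarily: read in large fixed-size chunks and reassemble the lines yourself.

    # Sketch: sequential 1MB reads give the OS an unmistakably sequential
    # pattern for its read-ahead. A chunk may end mid-line, so the tail
    # is carried over to the next iteration.
    use strict;
    use warnings;

    my $file = shift or die "usage: $0 file\n";
    open my $IN, '<', $file or die "Cannot read $file : $!";
    binmode $IN;
    my $tail = '';
    while ( sysread( $IN, my $buf, 1 << 20 ) ) {    # 1MB per sysread
        $buf = $tail . $buf;
        my @lines = split /\n/, $buf, -1;
        $tail = pop @lines;                         # partial line at chunk end
        # process @lines here
    }
    # $tail holds the final line if the file lacks a trailing newline
    close $IN;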

Re: Reading from file in threaded code is slow
by amcglinchy (Novice) on Jul 09, 2011 at 00:23 UTC

    Thanks for all your comments. I am sorry I was too terse in my description. Allow me to explain a bit more.

    Firstly, my production code is CPU-bound; the example above is not. My single-threaded production code runs at 100% of one core on my 4-core box. I can run four copies of the program and get overall CPU up to about 70-80% across all 4 cores, but if one copy hits a fault I need to kill the other three and tidy up manually before I can restart. I thought a single program running four threads might be a little faster, and much cleaner to manage when things go wrong.

    As I said, the example is just intended to show a problem I have run into. I was expecting the per-file processing times in the output to go up to, say, 3 or 4 seconds, and the overall execution time of the three *_process calls to stay about the same. Instead, thread_process takes 20 times longer to read the same 24 MB while the CPU remains idle.

    If you want to try this at home, the following code creates suitably named and sized files for testing.

    #!/usr/bin/perl
    # - Creating some sample 4MB files for testing threads.
    use strict;
    use warnings;

    my @files = qw(
        SyslogR_20110615_134243.txt
        SyslogR_20110615_162146.txt
        SyslogR_20110620_090237.txt
        SyslogR_20110620_090308.txt
        SyslogR_20110620_092240.txt
        SyslogR_20110620_092328.txt
    );

    for my $file (@files) {
        open my $OUT, ">", $file or die "Cannot write to $file : $!";
        my $line = "123456789 " x 25;
        for ( my $i = 0; $i < 4e6 / 250; $i++ ) {
            print $OUT $i, " ", $line, "\n";
        }
        close $OUT;
    }
      Firstly, my production code is CPU-bound; the example above is not. My single-threaded production code runs at 100% of one core on my 4-core box.

      Really? 'Cos when I run a single process reading 10 x 4MB files, each consisting of 16,000 lines, on my 4-core system, it reads and counts all 40MB / 160,000 lines in ~1/3 of a second:

      c:\test>dir junk?.dat
       Volume in drive C has no label.
       Volume Serial Number is 8C78-4B42

       Directory of c:\test

      09/07/2011  01:39         4,032,000 junk0.dat
      09/07/2011  01:39         4,032,000 junk1.dat
      09/07/2011  01:39         4,032,000 junk2.dat
      09/07/2011  01:39         4,032,000 junk3.dat
      09/07/2011  01:39         4,032,000 junk4.dat
      09/07/2011  01:39         4,032,000 junk5.dat
      09/07/2011  01:39         4,032,000 junk6.dat
      09/07/2011  01:39         4,032,000 junk7.dat
      09/07/2011  01:39         4,032,000 junk8.dat
      09/07/2011  01:39         4,032,000 junk9.dat
                    10 File(s)     40,320,000 bytes
                     0 Dir(s)  205,414,293,504 bytes free

      c:\test>head junk0.dat
      1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
      (head prints the same 250-character line 10 times)

      c:\test>wc -l junk0.dat
      16000 junk0.dat

      c:\test>perl -MTime::HiRes=time -e"BEGIN{ @ARGV = map glob, @ARGV; print time}" -nle"$n++ }{ print $n, ' ', time" junk?.dat
      1310172599.189
      160000 1310172599.43053

      And the CPU usage is 2.7% for less than half a second; it takes more than that just to start a single thread.

      So, if you are going to make a case for threading this insignificant piece of code, you are going to have to show something far more convincing than you have so far. Seeing is believing.


Re: Reading from file in threaded code is slow
by Anonymous Monk on Jul 09, 2011 at 04:25 UTC

    As other monks have shown, the primary problem with your example code is disk contention. But you also say that your real code is CPU-bound. What I would do is handle all the file I/O in one thread and hand the data off to multiple worker threads to process, as sketched below.
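    A minimal sketch of that split, using Thread::Queue as in the original code (the batch size of 1,000 lines is a guess; batching matters because every enqueue copies data into shared memory, which is itself expensive under ithreads):

    # Sketch: one reader thread keeps the disk access sequential; N workers
    # do the CPU-bound part. Lines travel in batches to cut queue overhead.
    use strict;
    use warnings;
    use threads;
    use Thread::Queue;

    my $WORKERS = 4;
    my $queue   = Thread::Queue->new();

    my @workers = map {
        threads->create( sub {
            while ( defined( my $batch = $queue->dequeue ) ) {
                for my $line (@$batch) {
                    my @x = split / /, $line;    # the real work goes here
                }
            }
        });
    } 1 .. $WORKERS;

    for my $file (@ARGV) {
        open my $IN, '<', $file or die "Cannot read $file : $!";
        my @batch;
        while ( my $line = <$IN> ) {
            push @batch, $line;
            $queue->enqueue( [ splice @batch ] ) if @batch >= 1000;
        }
        $queue->enqueue( [ splice @batch ] ) if @batch;
        close $IN;
    }
    $queue->enqueue( (undef) x $WORKERS );    # one end marker per worker
    $_->join for @workers;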

Re: Reading from file in threaded code is slow
by amcglinchy (Novice) on Jul 11, 2011 at 00:11 UTC

    It turns out this defect has nothing to do with file I/O at all.

    The following code reproduces the effect. I have raised a bug ticket via perlbug and with the packager of Perl for Cygwin.

    ## Test code showing that split is slow when used in multiple threads in Cygwin Perl
    use strict;
    use warnings;
    use threads;
    use Thread::Queue;
    use Time::HiRes ();

    my $string     = "12356789 " x 25;
    my $TASK_COUNT = 10;

    print( (`$^X -v`)[1] );

    thread_process(1);
    thread_process(4);

    sub thread_process {
        my $parallel = shift;
        my $start    = Time::HiRes::time();
        my @threads;
        my $queue = Thread::Queue->new();
        for my $p ( 1 .. $parallel ) {
            push @threads, threads->create( sub {
                while ( my $task_no = $queue->dequeue ) {
                    for my $n ( 1 .. 10_000 ) {
                        my @x = split / /, $string;    # Split
                    }
                }
            });
        }
        $queue->enqueue( 1 .. $TASK_COUNT, (undef) x $parallel );
        $_->join for @threads;
        my $end = Time::HiRes::time();
        printf "Processing %d tasks in %d threads completed in %fsecs\n",
            $TASK_COUNT, $parallel, $end - $start;
    }
    __END__
    # Here is an example of the output on my dual-core XP machine

    $ ~/localperl/bin/perl.exe example.split.pl
    This is perl 5, version 14, subversion 1 (v5.14.1) built for cygwin-thread-multi-64int
    Processing 10 tasks in 1 threads completed in 2.640625secs
    Processing 10 tasks in 4 threads completed in 12.468637secs

    $ perl example.split.pl
    This is perl, v5.10.0 built for cygwin-thread-multi-64int
    Processing 10 tasks in 1 threads completed in 4.578125secs
    Processing 10 tasks in 4 threads completed in 25.437345secs

    # The defect does not occur with ActivePerl. Instead the expected slightly faster result occurs

    $ c:/perl/bin/perl example.split.pl
    This is perl 5, version 12, subversion 3 (v5.12.3) built for MSWin32-x86-multi-thread
    Processing 10 tasks in 1 threads completed in 2.062500secs
    Processing 10 tasks in 4 threads completed in 1.671790secs
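    Until that bug is fixed, one workaround is to keep Cygwin but swap ithreads for processes. A minimal sketch using the CPAN module Parallel::ForkManager (whether it is installed on your Cygwin perl is an assumption):

    # Sketch: the same fan-out with forked processes instead of ithreads.
    # Each child is a separate interpreter, so the threaded-split slowdown
    # cannot apply.
    use strict;
    use warnings;
    use Parallel::ForkManager;

    my $pm = Parallel::ForkManager->new(4);    # at most 4 children at once
    for my $file (@ARGV) {
        $pm->start and next;                   # parent: schedule next file
        # child: do the CPU-bound work on $file here
        $pm->finish;                           # child exits
    }
    $pm->wait_all_children;

    Fork is not cheap on Cygwin either, but for a handful of long-running CPU-bound children the one-off startup cost should be negligible.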