Re: faster with threads?

by Abigail-II (Bishop)
on Jun 09, 2004 at 11:01 UTC [id://362685]


in reply to Re: faster with threads?
in thread faster with threads?

Why do all that stuff with high and low marks, and yielding threads, if the OS can do it for you (and probably a lot more efficiently)? Just use three processes and pipes. Writing to a pipe whose buffer is full will block - causing the process to give up its timeslice - and so will reading from a pipe whose buffer is empty.
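
As a rough illustration of the three-process idea, here is a minimal sketch using only `pipe` and `fork`. The function name `run_pipeline` and the comma-splitting "work" are hypothetical stand-ins, not anything from the original poster's program; the point is only that the OS blocks a writer on a full pipe and a reader on an empty one, with no explicit flow control in the code.

```perl
use strict;
use warnings;

$| = 1;   # flush STDOUT eagerly so forked children never repeat buffered output

# Hypothetical three-stage pipeline: reader -> processor -> consumer.
# Returns the lines that reach the final stage (which runs in the parent).
sub run_pipeline {
    my @input = @_;

    pipe(my $r1, my $w1) or die "pipe: $!";   # stage 1 -> stage 2
    pipe(my $r2, my $w2) or die "pipe: $!";   # stage 2 -> stage 3

    # Stage 1: stands in for the process that reads the raw input.
    my $pid1 = fork();
    die "fork: $!" unless defined $pid1;
    if ($pid1 == 0) {
        close $_ for $r1, $r2, $w2;
        print {$w1} "$_\n" for @input;   # blocks if the pipe buffer is full
        close $w1;
        exit 0;
    }

    # Stage 2: stands in for the CPU-bound processing step.
    my $pid2 = fork();
    die "fork: $!" unless defined $pid2;
    if ($pid2 == 0) {
        close $_ for $w1, $r2;
        while (my $line = <$r1>) {       # blocks if the pipe is empty
            chomp $line;
            print {$w2} join('|', split /,/, $line), "\n";
        }
        close $w2;
        exit 0;
    }

    # Stage 3: stands in for the process that would write to the DB.
    close $_ for $r1, $w1, $w2;
    my @out;
    while (my $line = <$r2>) {
        chomp $line;
        push @out, $line;
    }
    close $r2;
    waitpid($_, 0) for $pid1, $pid2;
    return @out;
}

print "$_\n" for run_pipeline('a,b,c', 'd,e,f');
```

Each stage closes the pipe ends it does not use; otherwise the downstream reader would never see end-of-file.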

Abigail

Re^2: faster with threads?
by BrowserUk (Patriarch) on Jun 09, 2004 at 12:08 UTC

    The idea is to ensure that the CPU-bound part of the process never has to wait for data, and so uses as many of the timeslices available to it as possible.

    The water marks allow you to easily tailor the threading to maximise throughput.

    Using the queues makes it easy to have more than one thread processing the slow part(s) of the processing. Each thread is identical, you just start more of them. They all read their input from the same queue. You don't get this easy flexibility using pipes.

    If the processing of the data is the bottleneck you start two threads for that. If outputting to the DB is the bottleneck, have two threads doing that.
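
    A minimal sketch of the "identical workers on one queue" arrangement, assuming a threads-enabled perl. The helper name process_with_workers and the uc() call standing in for the slow step are hypothetical; scaling up is just a matter of changing the worker count.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical helper: run $n identical worker threads over @items.
# All workers read from the same input queue.
sub process_with_workers {
    my ($n, @items) = @_;
    my $in  = Thread::Queue->new;
    my $out = Thread::Queue->new;

    my @workers = map {
        threads->create(sub {
            # Each worker pulls from the shared queue until it sees undef.
            while (defined(my $item = $in->dequeue)) {
                $out->enqueue(uc $item);   # stand-in for the slow work
            }
        });
    } 1 .. $n;

    $in->enqueue(@items);
    $in->enqueue(undef) for @workers;      # one terminator per worker
    $_->join for @workers;

    # All workers have finished, so a non-blocking drain is enough.
    my @results;
    while (defined(my $r = $out->dequeue_nb)) {
        push @results, $r;
    }
    my @sorted = sort @results;            # arrival order is not deterministic
    return @sorted;
}

print join(' ', process_with_workers(2, qw(alpha beta gamma))), "\n";
```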

    If the DB is running in the same box (with 2 CPUs), then it will likely dominate one of them and all the threads will basically share the other. If the DB is on a different box, then the CPU-bound thread may dominate one processor and the IO/DB threads share the other.

    The yielding should rarely come into play once you get the right watermark levels established, but it acts as a safeguard for situations where either the IO or the DB slows up, for example when someone does a grep on the disk or hits the DB with a heavy query. It prevents the queue from filling memory whilst the processing at the other end is blocked.
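
    One way the water marks might look in code, as a sketch only: the function name feed_with_watermarks and the mark values are assumptions for illustration, and the consumer here merely counts items. The producer stops feeding once the queue reaches the high mark and yields its timeslice until the consumer has drained it to the low mark.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical producer: enqueue @items, but never let the queue grow past
# $high; once it gets there, yield until the consumer drains it to $low.
# Returns the deepest queue length the producer observed after an enqueue.
sub feed_with_watermarks {
    my ($q, $high, $low, @items) = @_;
    my $peak = 0;
    for my $item (@items) {
        if ($q->pending >= $high) {
            threads->yield while $q->pending > $low;
        }
        $q->enqueue($item);
        my $depth = $q->pending;
        $peak = $depth if $depth > $peak;
    }
    return $peak;
}

my $q = Thread::Queue->new;

# Stand-in consumer: counts items until it sees the undef terminator.
my $consumer = threads->create(sub {
    my $count = 0;
    $count++ while defined $q->dequeue;
    return $count;
});

my $peak = feed_with_watermarks($q, 100, 20, 1 .. 1000);
$q->enqueue(undef);
my $count = $consumer->join;
print "consumed $count items, peak queue depth $peak\n";
```

    With a single producer the queue depth cannot exceed the high mark, which is what keeps memory bounded when the far end stalls.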

    The reasons I would try threads are:

    1. I'm more familiar with the threading model (forking is only threading under the covers, and without the control, where I live).
    2. I think that IPC through shared memory is more convenient and easier to program than through the flat stream of a pipe.
    3. You can share structured data using threads. I'm not yet certain if it is up to large scale production use, but it is much improved in 5.8.3.
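
    To illustrate point 3, here is a small sketch using threads::shared. The function name split_shared and the record layout are hypothetical. A worker thread splits a line and stores the resulting array in a shared hash, and the parent reads the nested structure directly, with no serialisation step.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

# Hypothetical helper: split $line in a worker thread and hand the fields
# back to the caller through a shared hash.
sub split_shared {
    my ($line) = @_;
    my %record :shared;

    my $worker = threads->create(sub {
        # Nested data must itself be shared before it can be stored
        # in a shared container.
        my @fields :shared = split /,/, $line;
        lock(%record);
        $record{fields} = \@fields;
        $record{count}  = scalar @fields;
    });
    $worker->join;

    return \%record;
}

my $rec = split_shared('a,b,c');
print "$rec->{count} fields: ", join('|', @{ $rec->{fields} }), "\n";
```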

    This final point is quite important with the OP's application. Basically he is reading lines, splitting them into chunks, and then throwing them into a DB. The DB IO is quite likely to be the slowest part of the overall processing.

    If having split the lines into chunks, he then has to serialise those chunks to pass them through a pipe to the DB process, he hasn't gained anything by splitting out the DB process.

    He would then have to deserialise it and the serialisation/deserialisation is likely to take much the same amount of time as the splitting, which negates the reason for having a separate process for the DB IO.
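
    For comparison, this is roughly what passing already-split chunks through a pipe involves. It is a sketch under assumptions: the function name split_and_ship is hypothetical, Storable's freeze and thaw stand in for whatever serialisation would actually be chosen, and a 4-byte length prefix is one simple way to frame records on the stream.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

$| = 1;   # flush STDOUT eagerly so the forked child never repeats buffered output

# Hypothetical helper: a child process splits each line, serialises the
# chunks and writes them to a pipe; the parent (standing in for the DB
# process) reads and deserialises them.
sub split_and_ship {
    my @lines = @_;
    pipe(my $r, my $w) or die "pipe: $!";
    binmode $_ for $r, $w;

    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {
        close $r;
        for my $line (@lines) {
            my $frozen = freeze([ split /,/, $line ]);        # serialise
            print {$w} pack('N', length $frozen), $frozen;    # length-prefixed
        }
        close $w;
        exit 0;
    }

    close $w;
    my @records;
    while (read($r, my $header, 4) == 4) {
        my $len = unpack 'N', $header;
        read($r, my $frozen, $len) == $len or die "short read on pipe";
        push @records, thaw($frozen);                         # deserialise
    }
    close $r;
    waitpid $pid, 0;
    return @records;
}

my @records = split_and_ship('a,b,c', 'd,e');
print join('|', @$_), "\n" for @records;
```

    The freeze and thaw calls are the extra work being discussed: each record is walked once to flatten it and once more to rebuild it.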

    I can't honestly say whether my thoughts would result in faster overall processing. There are too many factors involved. I don't have a dual processor machine to test on. There are many details that the OP hasn't supplied: where is the DB? How much indexing is on the DB? Is the DB shared with other applications? etc.

    Until someone actually tries some of this stuff using threads, nobody knows how it will stand up. Until recently, memory leaks prevented any worthwhile testing. With 5.8.3, that seems to be getting much better to the point where it is now worth trying stuff out again.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
