thezog has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm still a Perl newbie and am writing an app using Gtk2 which needs to perform multiple file copies at the same time (1 source, n destinations). I'm not having any problems with the actual copy function, but rather with correctly spawning the copies in parallel. Threads that are neither detached nor joined use up all of the RAM after a few runs, but detaching or joining them crashes the app. The threads don't contain or need to touch any Gtk object. I guess threads aren't the way to go. I've never used POE; should I be looking at it? Any advice or direction would be fantastic, as I've struggled for days with this issue. Using perl 5.8.8. Thanks!!

Replies are listed 'Best First'.
Re: Gtk2 w/ concurrent tasks
by jethro (Monsignor) on May 17, 2008 at 04:39 UTC
    First of all: Do you really need the copy to be in parallel? If the files are stored on the same hard disk it won't be faster doing the writes in parallel.

    Do you only copy the files or are you processing the data? Simple copying is best delegated to 'cp', i.e.  system("cp $from $to")
    If you process the data, do it line by line or chunk by chunk and write to all the files in parallel. No need to start threads at all.

    If you still want to do it in parallel, you might just use fork(). See also http://www.perlmonks.org/?node_id=686182, a similar question just a few days ago.
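
    A minimal sketch of the fork() route, assuming a Unix-like system. The sub name parallel_copy and the command-line handling are made up for illustration; File::Copy and POSIX are core modules. Each child does one copy and reports success through its exit status, and the parent reaps every child so no zombies pile up.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Copy qw(copy);
use POSIX ();

# Copy $from to every path in @to, one forked child per destination.
# Returns a hash of destination => 1 (success) or 0 (failure).
# parallel_copy is a hypothetical helper name, not from the thread.
sub parallel_copy {
    my ($from, @to) = @_;
    my (%dest_for, %ok);

    for my $to (@to) {
        my $pid = fork();
        if (!defined $pid) {            # fork failed: record it and move on
            warn "fork failed for $to: $!\n";
            $ok{$to} = 0;
            next;
        }
        if ($pid == 0) {
            # Child: do one copy and report via the exit status.
            # POSIX::_exit skips END blocks and destructors inherited
            # from the parent (which matters inside a GUI program).
            POSIX::_exit( copy($from, $to) ? 0 : 1 );
        }
        $dest_for{$pid} = $to;          # parent: remember who copies what
    }

    # Reap every child and translate its exit status.
    while (%dest_for) {
        my $pid = wait();
        last if $pid == -1;
        my $to = delete $dest_for{$pid};
        next unless defined $to;
        $ok{$to} = ($? == 0) ? 1 : 0;
    }
    $ok{$_} = 0 for values %dest_for;   # anything we could not reap
    return %ok;
}

# Usage: script.pl SOURCE DEST1 [DEST2 ...]
if (@ARGV >= 2) {
    my %ok = parallel_copy(@ARGV);
    printf "%-6s %s\n", ($ok{$_} ? 'ok' : 'FAILED'), $_ for sort keys %ok;
}
```

    The blocking wait() loop is fine for a command-line test; inside a GUI it would freeze the interface, so there you would more likely reap children from a timer or a SIGCHLD handler.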

    You might provide some more detail next time and preferably some relevant parts of your code. Otherwise the monks have to guess a lot and write answers to questions you never asked.

Re: Gtk2 w/ concurrent tasks
by mr_mischief (Monsignor) on May 17, 2008 at 10:37 UTC
    I'm guessing this is the same GTK2 app that's copying data that several monks discussed with you in the CB? If so, let me see if I can remember some more background to fill in the readers of this thread.

    You have one HD from which you're copying, and you're copying to up to 49 others. The target drives are on seven seven-port USB hubs plugged into a single seven-port USB hub which is plugged into a single USB port on the PC-class machine. You want to speed up the copying, but for some reason you're not going to be using IO::AIO as was suggested. You can't use dedicated drive duplication equipment because your software is supposed to be part of a package product with the machines and USB hubs.

    If that's about right, that background information may help the monks come up with useful suggestions sooner rather than spending a bunch of time figuring out your specific situation. It's always good to let the people helping you know if you're dealing with particular unusual circumstances and constraints so that they don't waste their time and can help you and other Seekers of Perl Wisdom better with less burnout.

Re: Gtk2 w/ concurrent tasks
by zentara (Cardinal) on May 17, 2008 at 13:32 UTC
    This is probably simpler advice than the file gurus have given you, but here goes.....

    If you are filling your ram after a few runs, you are not running the threads properly. Crashes shouldn't happen if joined properly, but you may need to reuse your threads.

    Although it is probably the same idea as IO::AIO, you could open multiple pipes for writing with opens. This is untested, but should open 7 fh's in parallel. If you can get this to work, then you can integrate it into a Gtk2 program with a single thread. 7 fh's will probably be faster than 7 threads, because the 7-threaded process will all share the same execution pointer since they are in the same pid. If you integrate it into Gtk2, you can set up Glib::IO->add_watch (fileno $fh, qw/out/, \&watch_callback, $fh); to watch for errors and completion of the writes. But if I were you, I would get it working from the command line first, before adding gui complexities.

    It may be that to improve writing speed to each filehandle, you need to spawn a piped open to the file location, and "cat the_file" to each location. That would relieve your Gtk2 app from the actual copying, by handing it off to cat. So instead of writing to the filehandles, you could fork 7 times and exec "cat $infile > $outfile".

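    #!/usr/bin/perl
    use warnings;
    use strict;

    my $infile = shift || $0;
    my %pids;

    # Open one output filehandle per destination (plain files here).
    # The file names are placeholders; substitute the real target paths.
    for (1..7) {
        $pids{$_}{'file_location'} = "copy_$_.out";
        open( $pids{$_}{'FH'}, '>', $pids{$_}{'file_location'} )
            or die "$pids{$_}{'file_location'}: $!\n";
    }

    # Read the source once and write each line to every destination.
    open( my $in, '<', $infile ) or die "$!\n";
    while (<$in>) {
        my $line = $_;
        for (1..7) {
            print { $pids{$_}{'FH'} } $line;
        }
    }
    close $in;
    print "done\n";

    for (1..7) {
        close $pids{$_}{'FH'};
    }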

    I'm not really a human, but I play one on earth. Cogito ergo sum a bum
Re: Gtk2 w/ concurrent tasks
by jethro (Monsignor) on May 17, 2008 at 13:42 UTC
    If mr_mischief is correct, I would say the simple but effective system("cp $from $to &") might be possible. BUT:

    A single USB port on the PC has a transfer rate of about 40 Mbyte/s (source: Wikipedia). The slowest hard drives I could find in a PC magazine's hard drive tests, notebook drives included, had a write transfer rate of at least 29 Mbyte/s. Only solid-state disks got lower; the slowest was down to 17 Mbyte/s.

    Now these were recent tests; older hard drives would be somewhat slower. But unless you specifically went looking for slow drives, yours will probably be faster than this minimum.

    A single drive can therefore already use most of the bandwidth of the one USB port, so without buying an internal PCI USB card with a few ports, the speedup you get is minimal to nonexistent.
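
    The arithmetic behind that estimate can be sketched as follows, using the 40 Mbyte/s and 29 Mbyte/s figures quoted above. The model is a deliberate simplification and an assumption of this sketch: all drives share one link, and hub or protocol overhead is ignored. The helper name aggregate_rate is made up.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Total write throughput (Mbyte/s) when $n drives share one USB link:
# the drives cannot write faster than the link feeding them.
# Simplified model; hub and protocol overhead are ignored.
sub aggregate_rate {
    my ($bus, $drive, $n) = @_;
    return min( $bus, $drive * $n );
}

my ($bus, $drive) = (40, 29);    # figures quoted in the post above
my $single = aggregate_rate( $bus, $drive, 1 );

for my $n (1, 2, 7, 49) {
    my $total = aggregate_rate( $bus, $drive, $n );
    printf "%2d drive(s): %2d Mbyte/s total, %5.2f Mbyte/s each, %.2fx vs. one at a time\n",
        $n, $total, $total / $n, $total / $single;
}
```

    Under this model the total never exceeds the 40 Mbyte/s of the shared port, so copying in parallel gains at most about 40/29, or roughly 1.4 times, over copying to one drive after another, however many drives are attached.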

Re: Gtk2 w/ concurrent tasks
by BrowserUk (Patriarch) on May 18, 2008 at 05:39 UTC

    First, if you haven't already, please produce a simple testcase demonstrating the failure and raise a bug report against threads. If the testcase demonstrates the problem with the latest cpan version of that module in conjunction with 5.10 so much the greater chance of getting some action on it.

    Next a few possibilities.

    • If your threading design uses a pool approach and you can afford to leave your threads dormant after they have finished their useful lives, then neither detach nor join them; let them lie fallow until the GUI is closed.

      Then use POSIX::_exit to bypass the global destruction phase and your code will end cleanly (and more quickly).

    • If that's not acceptable, then separate the GUI from the threaded code.

      Have the gui run as a separate script that starts the threaded code via piped open

      use Gtk2;
      ...
      my $pid = open my $kid, "threadScript |" or die $!;
      ...

      The kid can provide trace output to its stdout and the GUI can use the GTK2 equivalent of Tk::Fileevent to track and display that status.

    • Revert to 5.8.6 and the version of threads that was distributed with it. It was possible, though it required care, to make GTK2 and threads coexist with that combination. The secret was in ensuring that all your non-GTK2 threads were spawned before you require'd (not use'd) any GTK modules.

      Unfortunately, despite all the good stuff the dual-lifing of threads has brought, the correction of this type of memory pool management problem that used to be routinely tracked down and corrected by the wizards on p5p, is now attributed to the threads module and so tends to slip under the radar of those experts.
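
    For the first option above (leaving the threads dormant and skipping global destruction), the exit path might look like this sketch. The helper name quit_fast is hypothetical. One thing to watch: POSIX::_exit does not flush stdio buffers, so anything still buffered has to be flushed by hand first.

```perl
use strict;
use warnings;
use POSIX ();
use IO::Handle;

# Leave the process immediately, skipping END blocks, object
# destructors and Perl's global destruction phase.
# quit_fast is a hypothetical helper name.
sub quit_fast {
    my ($status) = @_;
    # _exit bypasses the normal buffer flushing, so do it explicitly.
    STDOUT->flush;
    STDERR->flush;
    POSIX::_exit($status);
}
```

    In the application this would be called from whatever handler runs when the main window is closed, instead of a plain exit.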

    For more useful help than this, you would need to post some code showing what you are doing.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      I spent a few days trying various things, including Thread::Pool. Although the workers were shown to finish and get removed from the pool, memory usage continued to grow with every copy job, and a $pool->join statement crashed the app just like any other thread join. I ended up going with the separate script which handles the threads and copies. It works great! Now I just need to finish getting the IPC::Open2 stuff working so I can retrieve the success/failure results back from the copy processes, umount the drives, notify the user, etc. It's sooo nice to finally move past this problem and close in on getting the application completed. Thanks for all the suggestions and help everyone!
        I ended up going with the separate script which handles the threads and copies. It works great!

        Glad to have helped. Perhaps you could return the favour.

        Did you produce a minimal testcase that demonstrated the problem and raise a bug report against threads? If not, could you please do so, because it won't get fixed unless they know the problem exists.

        Late addition

        I just went looking for the GTK equivalent of Tk::fileevent and noticed something that I don't recall seeing before. Maybe you missed it too? In the pod for the top-level GTK2 package which is generally just an index to the rest of the documentation, so you may well have skipped over it, there is this:

        use Gtk2 qw/-init -threads-init/;

        -threads-init

        Equivalent to Gtk2::Gdk::Threads->init, called to initialze/enable gdk's thread safety mechanisms so that gdk can be accessed from multiple threads when used in conjunction with Gtk2::Gdk::Threads->enter and Gtk2::Gdk::Threads->leave. If invoked as Gtk2::Gdk::Threads->init it should be done before Gtk2->init is called, if done by "use Gtk2 -init -threads-init" order does not matter.

        Which kinda sounds like it might be a solution to your original problem? You may be happy with your current route, but if you haven't already seen and tried this option, it would be worth trying before raising the bug report I encouraged you to file.

        I'd try this myself, but I remember installing GTK2 was non-trivial the last time I did it, and I then broke my Perl/GTK2 bindings by installing OCaml/GTK2 bindings. Since I don't use Perl/GTK2 for anything, I've never got around to trying to fix it, and don't really have the motivation to go through the pain again.

        Anyway, if you've tried it and it didn't help, please raise the bug-report. If you try it now and it fixes your problem, then a) it gives you choices; b) please report back here so that anyone coming along later can see the solution.

        End of late addition

        Now I just need to finish getting the IPC::Open2 stuff working so I can retrieve the success/failure results back from the copy processes, umount the drives, notify the user, etc.

        Do you need bi-directional comms after the child processes are running? Or just to pass some startup information on the spawn and retrieve status/results?

        If you can get away with the latter, then passing the startup info as command line arguments and retrieving ongoing status and results from stdout using a simple piped open

        my $pid = open $IOkids[ @IOkids ], qq[ theKidScript arg1 arg2 arg3 |] or die ...;
        push @kidPids, $pid;
        ... ## Give the filehandle to GTKs equivalent of Tks fileevent api

        Might be easier than IPC::Open2 which I've had problems with in the past.
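
        A sketch of that piped-open approach, under stated assumptions: a Unix-like system, and each kid prints newline-terminated status lines and signals success through its exit status. The helper names spawn_kids and collect_status are made up, and the kids here are throwaway perl one-liners standing in for the real copy script. IO::Select stands in for the GUI event loop; in the Gtk2 app the same filehandles would instead be handed to the Glib::IO->add_watch call mentioned earlier in the thread.

```perl
use strict;
use warnings;
use IO::Select;

# Start one child per command (each an array ref) with a read pipe on
# its stdout. Returns a hash keyed by the stringified filehandle.
sub spawn_kids {
    my @cmds = @_;
    my %kid;
    for my $cmd (@cmds) {
        my $pid = open( my $fh, '-|', @$cmd )
            or die "cannot start @$cmd: $!";
        $kid{$fh} = { pid => $pid, fh => $fh };
    }
    return %kid;
}

# Collect status lines from all kids as they arrive.
# Returns ( { pid => [lines] }, { pid => exit status } ).
sub collect_status {
    my %kid = @_;
    my $sel = IO::Select->new( map { $_->{fh} } values %kid );
    my ( %lines, %exit );

    while ( $sel->count ) {
        for my $fh ( $sel->can_read ) {
            my $k = $kid{$fh};
            # sysread, not <$fh>: buffered reads do not mix with select.
            my $n = sysread( $fh, my $chunk, 4096 );
            if ($n) {
                $k->{buf} .= $chunk;
                push @{ $lines{ $k->{pid} } }, $1
                    while $k->{buf} =~ s/\A([^\n]*)\n//;
                next;
            }
            # EOF or read error: closing a piped handle waits for the
            # kid and sets $?.
            $sel->remove($fh);
            close $fh;
            $exit{ $k->{pid} } = $? >> 8;
        }
    }
    return ( \%lines, \%exit );
}

# Demo with stand-in kids that just report one line each.
my %kid = spawn_kids(
    map { [ $^X, '-e', 'print "copy $ARGV[0] finished\n"', $_ ] } 1 .. 3
);
my ( $lines, $exit ) = collect_status(%kid);
for my $pid ( sort { $a <=> $b } keys %$exit ) {
    print "kid $pid exited with $exit->{$pid}: @{ $lines->{$pid} || [] }\n";
}
```

        Passing the command as a list avoids the shell, so destination paths containing spaces or metacharacters need no quoting.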

