fletcher_the_dog has asked for the wisdom of the Perl Monks concerning the following question:

I am running a perl script from a linux machine over some files on a file server using a Samba mount. When I run this script over 60,000 files on my local machine it takes about 6 minutes. When I run it over the files on the Samba mount it takes over two hours. I thought there might be something wrong with the Samba mount, but other non-perl programs experience no slowdowns. Does anyone have any idea why perl would have this problem? This isn't the first time I have had this problem, and I really need to know how to get around it.

Replies are listed 'Best First'.
Re: Perl with a Samba mount
by Beatnik (Parson) on Jan 03, 2003 at 15:10 UTC
    IMHO you'll always come across performance bumps when using network drives. How can you tell for sure that non-perl programs don't have that problem? Are they accessing those 60,000 files as well? For Perl, a file is a file, no creed or col.. uhm, no extension or filesystem really makes a difference. That's all stuff the underlying OS has to handle.

    Of course, chances are someone from p5p will now smack me right in the face and deny every little detail of my node :) but that's how it makes the most sense to me.

    Greetz
    Beatnik
    ... Quidquid perl dictum sit, altum viditur.
      "IMHO you'll always come across performance bumps when using network drives. How can you tell for sure that non-perl programs don't have that problem? Are they accessing those 60,000 files as well? "
      Yes, in fact they run over the exact same 60,000 files. There is a slight performance bump of ~20 seconds, from 13m14s to 13m35s. Compare this to a 2+ hour increase.
        I assume those non-perl programs are binaries? In some recent testing I did on image manipulation, a pure C solution linked against libgd was about 50 times faster than a perl solution using GD.pm. Perl scripts, by nature, are slower than compiled binaries: they have to be interpreted, and there are entire layers of logic between the code you write and what the system eventually executes. It all depends on how your code is written :) Maybe it's time to grab Devel::DProf and have a field day tweaking :)

        Greetz
        Beatnik
        ... Quidquid perl dictum sit, altum viditur.
Re: Perl with a Samba mount
by waswas-fng (Curate) on Jan 03, 2003 at 19:04 UTC
    When you say: When I run this script over 60,000 files on my local machine it takes about 6 minutes. When I run it over the files on the Samba mount it takes over two hours. What are you doing to these files? Are you just looking at file names? Are you stat'ing them multiple times for size, mod date, etc.? Are you slurping the data? Show us some code. Off the top of my head I can think of a few things that can go wrong here. For instance, grabbing multiple items such as mod time and size can make multiple requests over the network for each file unless you do it correctly, and if you are slurping files to process them, *welp*, the network is going to be way slower than a local hard disk. Give me more info on what you are doing and I can give you some pointers.

    -Waswas
      I am slurping the files in, rearranging some stuff, and then writing them back out. Sorry I can't really give more details than that. Through some experimentation I have found that writing things back out is where the big slowdown is. I have found that if I "buffer" my write-outs by writing out a bunch of files at a time it speeds things up ~5 percent. I know that it is the network that is slowing things down. I am wondering if there are any workarounds to make it faster over the network, or to make my reads and writes over the network in bigger blocks so I don't have to make as many network calls.
        in short:

        read and write in 32k or 64k blocks (to try to get SMB to go into burst mode).
        stat all needed stat vars in one call.

        Even with this your bottleneck is the network -- CIFS is not the fastest network protocol under the sun...

        -Waswas
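        To make the block-read and single-stat advice above concrete, here is a rough sketch. The sub names and the 64k block size are made up for illustration; adjust to your own setup:

```perl
use strict;
use warnings;

# Copy one file in 64k chunks via sysread/syswrite, so SMB can satisfy
# each request with one large transfer instead of many small round trips.
sub copy_in_blocks {
    my ( $src, $dst ) = @_;
    my $blocksize = 65536;    # 64k; try 32k as well
    open my $in,  '<:raw', $src or die "Can't read $src: $!";
    open my $out, '>:raw', $dst or die "Can't write $dst: $!";
    my $buf;
    while ( my $n = sysread( $in, $buf, $blocksize ) ) {
        syswrite( $out, $buf, $n ) or die "Can't write $dst: $!";
    }
    close $out or die "Can't close $dst: $!";
    close $in;
}

# Pull every field you need out of ONE stat() call, instead of testing
# -s, -M, -e separately (each file test can be another network request).
sub size_and_mtime {
    my ($file) = @_;
    return ( stat $file )[ 7, 9 ];    # size in bytes, mtime
}
```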
Re: Perl with a Samba mount
by Fletch (Bishop) on Jan 03, 2003 at 17:18 UTC

    In addition to profiling from the perl side, consider using strace -t to get a timestamped picture of what system calls are being made when and see if any latencies are readily apparent there (watch open() and stat() calls in particular).

Re: Perl with a Samba mount
by tachyon (Chancellor) on Jan 04, 2003 at 11:13 UTC

    If the network is the issue why not just run the script locally on the Win32 box rather than via the Samba mount?

    Given that you are doing intensive file I/O, I don't understand why you think Perl operations using local disk access (probably at least 20-30 MegaBytes/sec) should not be a hell of a lot faster than over your network (probably 20-30 MegaBits/sec in the real world over 100 base T). In fact I would be AMAZED if it did not take at least 10 times longer over the network - it is simply a function of data transfer rates.
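    A quick back-of-envelope run of those numbers, assuming a hypothetical 100 KB average file size (the node doesn't say; read plus write doubles the traffic):

```perl
use strict;
use warnings;

my $files      = 60_000;
my $bytes_each = 100_000;                    # hypothetical average file size
my $total      = $files * $bytes_each * 2;   # each file is read AND written
my $disk_bps   = 25_000_000;                 # ~25 MegaBytes/sec local disk
my $net_bps    = 25 / 8 * 1_000_000;         # ~25 MegaBits/sec over 100 base T

printf "local: %.0f min, network: %.0f min\n",
    $total / $disk_bps / 60,
    $total / $net_bps / 60;
# prints "local: 8 min, network: 64 min"
```

    The byte/bit distinction alone accounts for most of an order of magnitude, which is roughly the gap reported in the original question.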

    As has been noted by others, without code this is about all I can offer.

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

      Here is some code. Both the input and output directories are on the samba mount. Functions 1 and 2 do some string manipulation, but don't touch the file system. The $list_file is a file that starts with a path to a directory, followed by a list of file names. Thank you for your input.
      use File::Spec::Functions 'catfile';

      my $list_file = $ARGV[0];
      my $outdir    = $ARGV[1];

      open LIST, $list_file or die "Can't open $list_file: $!";
      chomp( my $path = <LIST> );
      while ( my $file_name = <LIST> ) {
          chomp $file_name;
          local $/ = undef;    # slurp mode for this iteration only
          open INFILE, catfile( $path, $file_name ) or die "Can't read $file_name: $!";
          my $text = <INFILE>;
          close INFILE;
          my $string = function1($text);
          function2($text);
          my $bindex = index( $text, "<XML_TAG>" );
          $text = substr( $text, 0, $bindex )
                . "<XML_TAG>\n" . $string . "\n" . "</XML_TAG>\n"
                . substr( $text, $bindex );
          open OUT, '>' . catfile( $outdir, $file_name ) or die "Can't write $file_name: $!";
          print OUT $text;
          close OUT;
      }