in reply to Creating very long strings from text files (DNA sequences)

Perhaps your builds of perl 5.6 and 5.8 were done with different compilers. Compare the output of perl -V from both versions of perl and look for differences.

If you are not able to build perl, you can probably get someone to try your exact code and data on another platform, such as linux. Devel::SmallProf can help track down the differences.

You may also be running into some sort of system problem such as disk fragmentation.

It should work perfectly the first time! - toma

Re^2: Creating very long strings from text files (DNA sequences)
by bobychan (Initiate) on Jun 26, 2004 at 19:32 UTC
    Hi Toma, I'm using ActivePerl, on a Windows XP desktop - Athlon AMD 2500+ with a 7200 RPM 60 GB hard drive and 512 MB PC2700 RAM (that's the one taking a long time with 5.8) versus a Windows XP laptop - Athlon AMD Mobile 2500+ with a 4200 RPM 20 GB hard drive and 768 MB PC2100 (5.6)
    This is perl, v5.6.1 built for MSWin32-x86-multi-thread
    Binary build 635 provided by ActiveState Corp. Built 15:34:21 Feb 4 2003

    This is perl, v5.8.4 built for MSWin32-x86-multi-thread
    Binary build 810 provided by ActiveState Corp. Built Jun 1 2004 11:52:21

    perl -V shows that optimization flags are the same for both binary builds...

      I tried the code in your original post on my linux box using perl 5.6.1 and 5.8.0. For the same fruitfly file that BrowserUK mentioned, the times were 7.9 seconds and 21.2 seconds. So I am seeing a perl 5.8 slowdown of about 2.7 times, not the 10 times that you are seeing. However, my 5.8.0 version was built with a later C compiler and uses more optimization. My machine is a 2.5GHz Pentium.

      It would be interesting to verify that ActiveState perl is so much slower in 5.8 on the exact same machine and configuration.
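      One way to make that comparison is to run the same small timing script under each interpreter on the same machine. The following is only a sketch: the synthetic input and the line count are assumptions standing in for the real sequence file, and it times the per-line whitespace stripping pattern rather than the original poster's exact code. It uses only the core Benchmark module, so it should run under both 5.6 and 5.8.

```perl
use strict;
use warnings;
use Benchmark qw(timeit timestr);

# Synthetic input (an assumption): 50,000 lines of 16 bases with spaces.
my @lines = ("ACGT ACGT ACGT ACGT\n") x 50_000;

my $seq = '';

# Time one pass that strips whitespace line by line and appends
# each cleaned line to a growing string.
my $t = timeit(1, sub {
    $seq = '';
    for my $line (@lines) {
        (my $copy = $line) =~ s/\s//g;
        $seq .= $copy;
    }
});

# $] holds the numeric version of the running interpreter.
printf "perl %s: built %d characters in %s\n",
    $], length($seq), timestr($t);
```

      Running it once with each perl binary gives two lines that can be compared directly, which takes hardware and disk differences out of the picture.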

      Perhaps someone knows of a perl for windows that is faster than the one from ActiveState?

      UPDATE:
      Here is some code that is a bit faster, about 13 seconds in perl5.8 on my machine, and 5 seconds in perl5.6. I think this code works, but it will need more testing to be sure. The basic idea is to work on the whole sequence when possible, such as when removing whitespace. Also, don't use the hash for intermediate results. Instead, use a scratch variable and store it in the hash when the result is complete.

      Doing this makes the logic a bit twisted, but you can probably straighten it out with some more thought.

      while (<SEQS>) {
          chomp;
          if (/\>\s*(.+)$/) {
              if ($seq_name ne '') {
                  $seqs =~ s/\s//g;
                  $seqs_hash{$seq_name} = $seqs;
                  $seqs = '';
              }
              $seq_name = $1;
          }
          else {
              $seqs .= $_;
          }
      }
      $seqs =~ s/\s//g;
      $seqs_hash{$seq_name} = $seqs if ($seq_name ne '');

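      For anyone who wants to try the idea without the original data file, here is a self-contained sketch of the same approach. The read_seqs helper, the sample data, and the in-memory filehandle are all hypothetical and not part of the original script. The header pattern is also anchored at the start of the line, which is an assumption about the file format. In-memory filehandles need perl 5.8 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: reads FASTA-style records from a filehandle
# and returns a hash reference of name => sequence.
sub read_seqs {
    my ($fh) = @_;
    my %seqs_hash;
    my $seq_name = '';
    my $seq      = '';    # scratch buffer for the record being read

    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>\s*(.+)$/) {
            # New header: finish the previous record first.
            if ($seq_name ne '') {
                $seq =~ s/\s//g;    # strip whitespace once per record
                $seqs_hash{$seq_name} = $seq;
            }
            $seq_name = $1;
            $seq      = '';
        }
        else {
            $seq .= $line;          # accumulate in the scratch buffer
        }
    }

    # Store the last record, which has no following header.
    if ($seq_name ne '') {
        $seq =~ s/\s//g;
        $seqs_hash{$seq_name} = $seq;
    }
    return \%seqs_hash;
}

# Sample data (made up for illustration), read via an in-memory filehandle.
my $data = ">seq1 test\nACGT AC\nGG TT\n>seq2\nTTAA\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $seqs = read_seqs($fh);
close $fh;

print "$_: $seqs->{$_}\n" for sort keys %$seqs;
```

      The hash is written once per record instead of once per line, and the whitespace substitution runs over the whole sequence in a single pass.
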
      It should work perfectly the first time! - toma

      Perl 5.8.x is slower than perl 5.6.x at string manipulation.

      That's old news. If you only wish to complain about that, fair enough, but it is unlikely to change anything.

      You do have the option to follow the 5.6.2 upgrade path, provided you can live without the new functionality that is in 5.8.x. But if you need that functionality, then you have to appreciate that it comes with a cost.

      There are at least 2 replies in this thread that offer practical steps for mitigating your problem.


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail
      "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon