Hey everyone,

I am trying to rename the files in a very large dataset, approx. 260,000 files. The script below, which is possibly quite amateurish as I am no Perl expert, performs the required task but is slow. I was wondering if anyone had any suggestions on how to increase its performance.

The reason for naming one file after another is that they are matching FASTA and QUALITY files for genetic sequencing reactions, and it is preferable that they have the same name in different directories. The FASTA files were renamed using information contained inside the files themselves, but this is not possible with the quality files.

The usage of the script is:

./qual_renamer.pl fasta ./QUAL qual

Any ideas on how to increase the performance would be appreciated.

Thanks,
wayne
#! /usr/bin/perl -w
use strict;
use Data::Dumper;

my $fasta_file = $ARGV[0];
my $directory  = $ARGV[1];
my $qual_file  = $ARGV[2];

# Map keys such as "45829460.f" to the FASTA file name they were taken from.
sub make_file_hash {
    my ($file) = @_;
    my %file_hash;
    open(FH, "< $file") or die "Cannot open $file: $!";
    while (my $line = <FH>) {
        chomp($line);
        my @parts = split(/\//, $line);
        my $name = $parts[3];
        if ($name =~ /(^.{3})([0-9]{4})(F|P|R)/) {
            my $key = $2 . "." . lc($3);
            $file_hash{$key} = $name;
        }
        elsif ($name =~ /(^.{2})([0-9]{8})(F|P|R)/) {
            my $key = $2 . "." . lc($3);
            $file_hash{$key} = $name;
        }
        else {
            print $name . " is crap\n";
        }
    }
    return %file_hash;
}

# Move one quality file to $dir under the name of its matching FASTA file.
sub rename {
    my ($file, $hash_ref, $dir) = @_;
    my %hash = %$hash_ref;
    #print Dumper(%hash);
    # The dot must be escaped and apply to f, r and p alike;
    # (.f|r|p) only ever produced usable keys for ".f" files.
    if ($file =~ /([0-9]{4,8})(\.[frp])(.*)/) {
        my $key = $1 . $2;
        if (exists $hash{$key}) {
            my $filename = $hash{$key};
            `mv $file $dir/$filename`;
        }
        else {
            print "no key $key\n";
        }
    }
}

my %tmp_hash = &make_file_hash($fasta_file);

open(FH2, "< $qual_file") or die "Cannot open $qual_file: $!";
while (my $file = <FH2>) {
    chomp($file);
    &rename($file, \%tmp_hash, $directory);
}

Sample lines from the fasta list (first argument):

./FASTA/LO-leaves_drought/LO45829460F
./FASTA/LO-leaves_drought/LO45815650F
./FASTA/LO-leaves_drought/LO45852136R
./FASTA/LO-leaves_drought/LO45987989F
./FASTA/LO-leaves_drought/LO45959830F
./FASTA/LO-leaves_drought/LO45982398F
./FASTA/LO-leaves_drought/LO45990585F
./FASTA/LO-leaves_drought/LO46021000F
./FASTA/LO-leaves_drought/LO45815528R
./FASTA/LO-leaves_drought/LO45994910F
./FASTA/LO-leaves_drought/LO45925816F
./FASTA/LO-leaves_drought/LO45938935F
./FASTA/LO-leaves_drought/LO45782339F
./FASTA/LO-leaves_drought/LO46006257F
./FASTA/LO-leaves_drought/LO46018733F
./FASTA/LO-leaves_drought/LO45815795F
./FASTA/LO-leaves_drought/LO46006601F
./FASTA/LO-leaves_drought/LO45893249F
./FASTA/LO-leaves_drought/LO45953120F
./FASTA/LO-leaves_drought/LO46053350F
./FASTA/LO-leaves_drought/LO45978413F
./FASTA/LO-leaves_drought/LO45866607F
./FASTA/LO-leaves_drought/LO46017397F

Sample lines from the qual list (third argument):

./QUAL/48743192.f_a09_1.fasta.qual
./QUAL/51455741.f_b20_1.fasta.qual
./QUAL/42595949.f_n02_1.fasta.qual
./QUAL/51293480.f_g04_2.fasta.qual
./QUAL/42188856.f_h02_2.fasta.qual
./QUAL/51477163.f_m12_2.fasta.qual
./QUAL/42219670.f_d02_1.fasta.qual
./QUAL/46911125.f_p06_1.fasta.qual
./QUAL/44656731.f_c24_1.fasta.qual
./QUAL/BNP3104.p.scf.qual
./QUAL/42063951.f_p22_1.fasta.qual
./QUAL/42939137.f_j20_1.fasta.qual
./QUAL/42824374.f_k16_2.fasta.qual
./QUAL/49426321.f_h08_2.fasta.qual
./QUAL/48869367.r_k07_2.fasta.qual
./QUAL/48637192.f_h15_1.fasta.qual
./QUAL/45574303.f_d17_1.fasta.qual
./QUAL/46189823.f_n05_2.fasta.qual
./QUAL/BNP0977.p.scf.qual
./QUAL/49408758.f_i10_1.fasta.qual
./QUAL/50040300.f_a20_1.fasta.qual
./QUAL/48690145.f_l05_2.fasta.qual
./QUAL/46242796.f_g23_1.fasta.qual

In reply to performance problems with renaming multiple files using another file name by mav3067
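
Two things in the posted script look like the probable cost, though I have not profiled it: the rename sub copies the whole hash (`my %hash = %$hash_ref;`) once per quality file, and the backticks start a shell plus an `mv` process for every file. Below is a minimal sketch of the same matching that looks keys up through the reference and uses Perl's built-in `rename`. The sub name `rename_qual_files` and the use of the last path component instead of `$parts[3]` are my own assumptions, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a map from keys such as "45829460.f" to FASTA names such as
# "LO45829460F". Returns a hash reference so nothing is copied later.
sub make_file_hash {
    my ($list_file) = @_;
    my %name_for;
    open(my $fh, '<', $list_file) or die "Cannot open $list_file: $!";
    while (my $line = <$fh>) {
        chomp $line;
        # Assumption: the file name is the last path component.
        my ($name) = $line =~ m{([^/]+)\z};
        next unless defined $name;
        if (   $name =~ /^.{3}([0-9]{4})([FPR])/
            || $name =~ /^.{2}([0-9]{8})([FPR])/) {
            $name_for{ "$1." . lc $2 } = $name;
        }
        else {
            warn "unrecognised FASTA name: $name\n";
        }
    }
    close $fh;
    return \%name_for;
}

# Rename every listed quality file that has a matching FASTA name.
# Returns the number of files renamed and the number without a key.
sub rename_qual_files {
    my ($qual_list, $name_for, $dir) = @_;
    my ($renamed, $missing) = (0, 0);
    open(my $fh, '<', $qual_list) or die "Cannot open $qual_list: $!";
    while (my $file = <$fh>) {
        chomp $file;
        my ($base) = $file =~ m{([^/]+)\z};
        next unless defined $base && $base =~ /([0-9]{4,8})(\.[frp])/;
        my $key = $1 . $2;
        my $new = $name_for->{$key};      # lookup via the reference, no copy
        if (defined $new) {
            # Built-in rename: no shell and no external mv process.
            if (rename $file, "$dir/$new") { $renamed++ }
            else { warn "rename $file failed: $!\n" }
        }
        else {
            $missing++;
        }
    }
    close $fh;
    return ($renamed, $missing);
}

# Run only when given the same three arguments as the original script.
if (@ARGV == 3) {
    my ($fasta_list, $dir, $qual_list) = @ARGV;
    my $name_for = make_file_hash($fasta_list);
    my ($renamed, $missing) = rename_qual_files($qual_list, $name_for, $dir);
    print "renamed $renamed, no key for $missing\n";
}
```

It is called the same way, e.g. `./qual_renamer.pl fasta ./QUAL qual`. The built-in `rename` does not move files across filesystems; if source and target are on different ones, `move` from File::Copy is the usual fallback.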
