in reply to BASH vs Perl performance

If your statement that most of the time is spent in step 3 is true, I disagree with the other answers; you can expect to see a fair improvement from doing the renaming with perl, unless the Sun E-250 has lightning-fast fork & exec.

Keep in mind that there's no reason not to mix bash and perl for different parts.

Might be helpful for you to show that part of your bash script.

Re^2: BASH vs Perl performance
by jcoxen (Deacon) on Aug 10, 2004 at 21:01 UTC
    Here's the busy part of the code. I've left out the preliminary setup and the final smbclient copy commands.
    cd RAR
    echo "Getting RAR files from North Server"
    wget -q ftp://username:password@xxx.xxx.xxx.xxx/*.rar;type=i
    echo "Getting RAR files from South Server"
    wget -q ftp://username:password@xxx.xxx.xxx.xxx/*.rar;type=i
    echo "Unpacking RAR files"
    unrar -o+ -inul e *raw
    echo "Removing .rar files"
    rm *.rar
    echo "Renaming files and moving them to directories by type"
    for src in *; do
        type=$(echo $src | sed -e "s/^.*_//" | sed -e "s/.report//")
        tgt=$(echo $src | sed -e "s/\(^.*\)\.\(.*_.*\)/\2/")
        echo "Moving $src to ../$type/$tgt"
        mv $src ../$type/$tgt
    done
    cd ../CRS
    echo "Processing CRS files"
    for src in *.report; do
        # Set some variables
        type=$(echo $src | sed -e "s/^........//" | sed -e "s/\(^...\).*/\1/")
        dest=$(echo $src | sed -e "s/report/csv/")
        tid=$(echo $src | sed -e "s/_.*$//")
        echo "Src= $src"
        echo "Type=$type"
        echo "Dest=$dest"
        echo "TID= $tid"
        echo ""
        # Check to see if this is a Flashwave
        if [ $type = "FOS" ] \
        || [ $type = "FOT" ] \
        || [ $type = "FOU" ] ; then
            # If it is a Flashwave...
            echo "This is a flashwave"
            cat $src | sed -e "/FILL,0,$/d" > /tmp/sedtemp
        else
            # If this is NOT a Flashwave...
            echo "This is NOT a flashwave"
            cat $src | sed -e "/FILL,0,$/d" | sed -n "/,[1-2][,-].*,$/ {
                h
                N
                s/^.*,\(.*\),$/\1/
                H
                x
                s/\n//g
                p
            }
            /,[0-9]\{1,2\}.*,$/ p" > /tmp/sedtemp
        fi
        # Find each Port ID section and duplicate it using , instead of -
        cat /tmp/sedtemp | uniq |
        # Special case - no dash in Port ID, just a single number
        #sed -e 's/\(,[0-9]\{1,2\}\)$/\1,,,,,/' |
        # Cleanup caused by special case
        #sed -e 's/,,,,,\([0-9]\{1,2\}\)$/,,,,\1,,,,,/' |
        # Special case - no dash in Port ID, just a single number
        sed -e 's/\(,[0-9]\{1,2\}\),$/\1,,,,,/' |
        # Main Port ID reformat
        sed -e 's/,\([0-9]\{0,2\}\)-\([0-9]\{0,2\}\)-\{,1\}\([0-9]\{0,2\}\)-\{,1\}\([0-9]\{0,2\}\)-\{,1\}\([0-9]\{0,2\}\)/,\1-\2-\3-\4-\5-,\1,\2,\3,\4,\5/g' |
        # Delete multiple dashes
        sed -e 's/--/-/g' |
        # Do it again just to make sure
        sed -e 's/--/-/g' |
        # Delete trailing dashes
        sed -e 's/-,/,/g' > $dest
        # Update the tidlist
        echo "Updating tidlist"
        echo $tid >> ../tidlist.txt
        echo ""
    done
      Things like this:
      tid=$(echo $src | sed -e "s/_.*$//")
      I would change to this:
      tid=${src%%_*}
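The same substitution can be applied to the other variables set at the top of the CRS loop. This is only a sketch: the file name below is made up for illustration, and it assumes "report" appears in the name only as the extension. It sticks to POSIX expansions so it should behave the same in ksh and bash.

```sh
#!/bin/sh
# Hypothetical file name, invented for this example only.
src="SITE0001FOS_20040810.report"

# Old: type=$(echo $src | sed -e "s/^........//" | sed -e "s/\(^...\).*/\1/")
rest=${src#????????}           # drop the first 8 characters
type=${rest%"${rest#???}"}     # keep only the first 3 of what is left

# Old: dest=$(echo $src | sed -e "s/report/csv/")
dest=${src%.report}.csv        # swap the .report extension for .csv

# Old: tid=$(echo $src | sed -e "s/_.*$//")
tid=${src%%_*}                 # drop everything from the first underscore on

printf 'Type=%s\nDest=%s\nTID=%s\n' "$type" "$dest" "$tid"
```

Each of these runs inside the shell itself, so the loop no longer starts an echo and one or two sed processes per variable per file.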
      Often the 'echo ... | sed ...' lines can be replaced with shell parameter expansion, which avoids starting extra processes and can speed up some scripts. Overall, though, I don't know if it'll do much for you. Your big sed pipe could be put into one sed command, and your deletion of multiple dashes looks wrong (especially since you do it twice). Do you want this instead: s/--*/-/g? And as merlyn might point out, you have a few useless uses of cat. You could use either input redirection or specify the file on the first command. In keeping with your current style, this works in ksh; I don't know about bash:
      <file \
      sed s/this/that/g
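To make the "one sed command" suggestion concrete, here is a rough sketch with made-up sample data standing in for /tmp/sedtemp. It chains the simpler expressions from the script as multiple -e options and lets uniq read the file directly instead of going through cat. The main Port ID expression is left out for brevity; it would presumably slot in as one more -e between the first and second expressions.

```sh
#!/bin/sh
# Sample data invented for illustration; the real input is /tmp/sedtemp.
tmp=$(mktemp)
printf '%s\n' 'A,1-2-----3-,x' 'A,1-2-----3-,x' 'B,7,' > "$tmp"

# Expressions, applied to each line in order:
#   1. special case: single-number Port ID gets five trailing commas
#   2. squeeze any run of dashes down to one dash
#   3. delete a dash that sits directly before a comma
result=$(uniq "$tmp" | sed \
    -e 's/\(,[0-9]\{1,2\}\),$/\1,,,,,/' \
    -e 's/--*/-/g' \
    -e 's/-,/,/g')

rm -f "$tmp"
printf '%s\n' "$result"
```

The sample line has five dashes in a row on purpose: running s/--/-/g twice would still leave two of them behind, while s/--*/-/g collapses a run of any length in one pass.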
      There are cases when sed is faster than perl, and the other way around. Last time I compared, it seemed that when I used a lot of character classes (e.g., [0-9]), perl tended to be faster. Update: or maybe it was when I could replace things like [0-9] in sed with \d in perl, or needed case-insensitive matches, which you can't do with the old standard sed.