in reply to Making commond for large number of files

Although you CAN use simple greps to get things, I would advise using Bio::SeqIO (http://www.bioperl.org/wiki/HOWTO:SeqIO). It understands the Standard Chromatogram Format (SCF) and can grab everything you will ever need.

use Bio::SeqIO;

$in  = Bio::SeqIO->new(-file => 'p122.gel', -format => 'SCF');
$seq = $in->next_seq();

# if you want to look at the various comment field data:
while ( ($key, $val) = each %{$seq->names()} ) {
    print "$key\t$val\n";
}

If this is just your Computer Science homework, then you should do proper investigation first (i.e. learn), instead of asking us. For example, the SCF format is:

SCF Done: E(RHF) = -113.873389817 A.U. after 11 cycles
/SCF Done.*=\s?([\-\d\.]*\d+)/

Perl's regular expressions are very precise, so if the line instead looked like:

SCF Done= -113.873389817.
/SCF Done=\s?([\-\d\.]*\d+)/

then, in the eyes of Perl, it would be totally different and would need a different pattern. In each example, the first part is the line from the SCF file, and the second is a Perl pattern match that will grab the value. You can see the differences...

perl -ne '$V{$1}=$ARGV if /SCF Done.*=\s*([\-\d\.]*\d+)/; END{@_=sort {$a<=>$b} keys %V;print "Lowest value is $_[0] in file $V{$_[0]}\n"}' *.log
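
For what it's worth, here is a longer sketch of the same idea as a small script. It reuses the pattern from the one-liner and compares the energies numerically with `<=>`, since a plain `sort` compares strings. The sub names (`scf_energy`, `lowest_energy_file`) and the two in-memory "log files" are made up for illustration; in real use you would read the lines from your *.log files instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the last "SCF Done" energy found in a list of log lines,
# or undef if none of the lines match.
sub scf_energy {
    my @lines = @_;
    my $energy;
    for my $line (@lines) {
        # Same pattern as the one-liner above
        $energy = $1 if $line =~ /SCF Done.*=\s*([\-\d\.]*\d+)/;
    }
    return $energy;
}

# Given a hash ref of name => energy, return the name with the lowest energy.
sub lowest_energy_file {
    my ($energy_of) = @_;
    my ($lowest) = sort { $energy_of->{$a} <=> $energy_of->{$b} } keys %$energy_of;
    return $lowest;
}

# Hypothetical in-memory "files" standing in for real *.log contents
my %log_lines = (
    'a.log' => [ ' SCF Done:  E(RHF) =  -113.873389817     A.U. after   11 cycles' ],
    'b.log' => [ ' SCF Done:  E(RHF) =  -113.900000000     A.U. after    9 cycles' ],
);

my %energy_of = map { $_ => scf_energy(@{ $log_lines{$_} }) } keys %log_lines;
my $lowest    = lowest_energy_file(\%energy_of);
print "Lowest value is $energy_of{$lowest} in file $lowest\n";
```

Keeping the energy in a hash keyed by file name (rather than the other way round, as in the one-liner) also avoids losing a file when two files happen to have the same energy.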

So, the big point is still: we can help, but only if you make an effort and show us what you have up to this point. Do you have a loop? Which variables are you going to use? Will you use a single or a double pass over the files? How do you think that 3rd column should be retrieved? Show us some code first.

Re^2: Making commond for large number of files
by acrobat118 (Initiate) on Apr 18, 2015 at 05:18 UTC

    I am a chemistry student and have just started working on computational chemistry. I am a new user of Perl and have just started learning it. I want to complete the following code. This is what I have done so far.

    #!/usr/bin/perl
    # Write a single output file with
    # Bond_length Delocalization_range EDR
    # from each calculation here
    use strict;

    my $ELowSoFar = 0.0;    # The lowest energy found so far
    my $FileLowSoFar = '';  # The file containing the lowest energy so far

    foreach my $files (<*log>){ # Loop over all of the files
        my $E=`grep "SCF Done" $files|awk "{print \\\$5}"`; chomp($E); # Find the energy in this file

        # Check if the energy in this file is LOWER than the lowest energy so far
        # If it is, then it is the NEW lowest energy so far
        # and the file containing is the new FileLowSoFar

        print "File $files has energy $E and the lowest energy so far is $ELowSoFar\n";
    }
    print "The lowest total energy was $ELowSoFar\n";
    print "This was in file $FileLowSoFar\n";

      Ok, great, that looks like a great start. I was checking http://nbo6.chem.wisc.edu/tut_del.htm and I assume now that EDu is the calculated energy of the Delocalization (some seem to call it deletion) in atomic units. And what you need to calculate is the Energy Delocalization Range. How about this:

      #!/usr/bin/perl
      # Write a single output file with
      # Bond_length Delocalization_range EDR
      # from each calculation here
      use strict;

      my $ELowSoFar = undef;  # The lowest energy found so far
      my $FileLowSoFar = '';  # The file containing the lowest energy so far
      my %FILE2SCF;           # $FILE2SCF{"FILENAME"} = Energy

      foreach my $file (<*log>){ # Loop over all of the files
          my $E=`grep "SCF Done" $file|awk "{print \\\$5}"`; chomp($E); # Find the energy in this file

          # Check if the energy in this file is LOWER than the lowest energy so far
          if (!defined $ELowSoFar || $ELowSoFar>$E){
              # If it is, then it is the NEW lowest energy so far
              $ELowSoFar = $E;
              # and the file containing is the new FileLowSoFar
              $FileLowSoFar = $file;
          };

          # store the energy of the file for later use
          $FILE2SCF{$file} = $E;

          print "File $file has energy $E and the lowest energy so far is $ELowSoFar\n";
      }
      print "The lowest total energy was $ELowSoFar\n";
      print "This was in file $FileLowSoFar\n";

      # Now calculate Delocalization_range for each files
      my $minimalvalue = $FILE2SCF{$FileLowSoFar};
      for my $file (sort keys %FILE2SCF){
          my $currentvalue = $FILE2SCF{$file};
          print "$file: Energy Delocalization Range=". ($currentvalue-$minimalvalue)."\n";
      }

        Thank you so much for your help and patience. So nice of you. I have learnt something, and this is what I want.

        This code is now working for me and it has solved my big problem. I shall learn slowly.

        But I am still in a little trouble. Each file contains the Electron Delocalization Range. It is at the end of each file. Here is a sample:

        Index  Exponent        U               <EDRA>          <EDRB>
          1    0.50000000E+02  0.14142136E+00  0.18121396E-01  0.18121396E-01
          2    0.35714286E+02  0.16733201E+00  0.23255352E-01  0.23255352E-01
          3    0.25510204E+02  0.19798990E+00  0.29810613E-01  0.29810613E-01
          4    0.18221574E+02  0.23426481E+00  0.38155765E-01  0.38155765E-01
          5    0.13015410E+02  0.27718586E+00  0.48736939E-01  0.48736939E-01
          6    0.92967216E+01  0.32797073E+00  0.62081416E-01  0.62081416E-01
          7    0.66405154E+01  0.38806020E+00  0.78791719E-01  0.78791719E-01
          8    0.47432253E+01  0.45915902E+00  0.99523282E-01  0.99523282E-01
          9    0.33880181E+01  0.54328428E+00  0.12493692E+00  0.12493692E+00
         10    0.24200129E+01  0.64282263E+00  0.15561685E+00  0.15561685E+00
         11    0.17285807E+01  0.76059799E+00  0.19194780E+00  0.19194780E+00
         12    0.12347005E+01  0.89995168E+00  0.23395297E+00  0.23395297E+00
         13    0.88192890E+00  0.10648372E+01  0.28110962E+00  0.28110962E+00
         14    0.62994922E+00  0.12599324E+01  0.33218196E+00  0.33218196E+00
         15    0.44996373E+00  0.14907721E+01  0.38513805E+00  0.38513805E+00
         16    0.32140266E+00  0.17639053E+01  0.43723655E+00  0.43723655E+00
         17    0.22957333E+00  0.20870809E+01  0.48535182E+00  0.48535182E+00
         18    0.16398095E+00  0.24694674E+01  0.52651679E+00  0.52651679E+00
         19    0.11712925E+00  0.29219133E+01  0.55846889E+00  0.55846889E+00
         20    0.83663750E-01  0.34572544E+01  0.57979031E+00  0.57979031E+00
         21    0.59759821E-01  0.40906786E+01  0.58943084E+00  0.58943084E+00
         22    0.42685587E-01  0.48401561E+01  0.58613183E+00  0.58613183E+00
         23    0.30489705E-01  0.57269500E+01  0.56862373E+00  0.56862373E+00
         24    0.21778361E-01  0.67762186E+01  0.53666404E+00  0.53666404E+00
         25    0.15555972E-01  0.80177300E+01  0.49197001E+00  0.49197001E+00
         26    0.11111408E-01  0.94867060E+01  0.43819183E+00  0.43819183E+00
         27    0.79367203E-02  0.11224822E+02  0.37998198E+00  0.37998198E+00
         28    0.56690859E-02  0.13281388E+02  0.32182467E+00  0.32182467E+00
         29    0.40493471E-02  0.15714751E+02  0.26720453E+00  0.26720453E+00
         30    0.28923908E-02  0.18593944E+02  0.21830006E+00  0.21830006E+00
         31    0.20659934E-02  0.22000651E+02  0.17608839E+00  0.17608839E+00
         32    0.14757096E-02  0.26031521E+02  0.14065210E+00  0.14065210E+00
         33    0.10540783E-02  0.30800911E+02  0.11151704E+00  0.11151704E+00
         34    0.75291305E-03  0.36444130E+02  0.87930132E-01  0.87930132E-01
         35    0.53779504E-03  0.43121276E+02  0.69050633E-01  0.69050633E-01

        I have been using U and the sum of <EDRA> and <EDRB>. I am writing the results to a .txt file. Here is the code I was using.

        use strict;
        # Find the lowest-energy geometry
        # Prepare array EDRvars0 containing the EDR at each u from that geometry

        open(F,">results.txt") or die "Cannot open results.txt: $!";
        print F "# Bond_length Delocalization_length EDR \n";

        # Loop over all log files
        foreach my $f (<*log>){
            my $c=`grep -c "Normal term" $f`; chomp($c); # Avoid files that don't converge
            if($c>0){

                # Find the bond length. We assume this is built into the filename
                my $R = $f; $R=~s/\.log$//; $R=~s/.*_//;

                # Find the U values
                my $Ustr = `grep -A37 "EDR alpha" $f | tail -n35|awk "{print \\\$3}"`;
                my @Uvars = split(/\n/,$Ustr); # Convert them into an array
                my $NU = scalar(@Uvars); # That array has $NU elements

                # Find the <EDR(u)> and sum alpha and beta
                my $EDRstr = `grep -A37 "EDR alpha" $f | tail -n35|awk "{print \\\$4+\\\$5}"`;
                my @EDRvars = split(/\n/,$EDRstr);

                # Print the outputs
                foreach my $i(0..$NU-1){
                    print F sprintf("%8.3E %12.6E %12.6E\n",$R,$Uvars[$i],$EDRvars[$i]);
                }
            }
        }
        close(F);

        Now I want the difference in EDRs, i.e. Delta<EDR>, of each file relative to the minimum-energy file: {(<EDRA>+<EDRB>) of each file} - {(<EDRA>+<EDRB>) of the file with minimum energy}. The file with minimum energy is the one found by the code you helped me write. I want the differences written to the same text file.
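
        A minimal sketch of that subtraction, assuming the summed <EDRA>+<EDRB> values of each file have already been collected into one array per file (as @EDRvars is in the script above) and that every file has the same number of U points. The sub name `delta_edr`, the hash `%edr_of`, the file names, and the three-point sample numbers are all hypothetical:

```perl
use strict;
use warnings;

# Subtract the reference curve (the lowest-energy file) from one file's
# curve, point by point. Both arguments are array refs holding the
# <EDRA>+<EDRB> sums at each U.
sub delta_edr {
    my ($edr, $edr_ref) = @_;
    die "curves differ in length\n" unless @$edr == @$edr_ref;
    return map { $edr->[$_] - $edr_ref->[$_] } 0 .. $#$edr;
}

# Hypothetical data: file name => summed EDR at each U (3 points only)
my %edr_of = (
    'mol_1.0.log' => [ 0.25, 0.50, 0.75 ],
    'mol_1.5.log' => [ 0.50, 1.00, 1.25 ],
);
my $FileLowSoFar = 'mol_1.0.log';   # assumed to come from the energy search

for my $file (sort keys %edr_of) {
    my @delta = delta_edr($edr_of{$file}, $edr_of{$FileLowSoFar});
    printf "%s: %s\n", $file, join(' ', map { sprintf '%12.6E', $_ } @delta);
}
```

        This implies two passes: one to find the minimum-energy file and store every file's EDR array, and a second to print the differences. The reference file itself then gives zeros in every row.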