Re^4: Making commond for large number of files

Thank you so much for your help and patience; it is very kind of you. I have learnt something, and this is what I wanted.

This code is now working for me and it has solved my big problem. I shall learn slowly.

But I still have a small problem. Each file contains the Electron Delocalization Range (EDR) table at its end. Here is a sample:

Index Exponent U  <EDRA>  <EDRB>
    1   0.50000000E+02   0.14142136E+00   0.18121396E-01   0.18121396E-01
    2   0.35714286E+02   0.16733201E+00   0.23255352E-01   0.23255352E-01
    3   0.25510204E+02   0.19798990E+00   0.29810613E-01   0.29810613E-01
    4   0.18221574E+02   0.23426481E+00   0.38155765E-01   0.38155765E-01
    5   0.13015410E+02   0.27718586E+00   0.48736939E-01   0.48736939E-01
    6   0.92967216E+01   0.32797073E+00   0.62081416E-01   0.62081416E-01
    7   0.66405154E+01   0.38806020E+00   0.78791719E-01   0.78791719E-01
    8   0.47432253E+01   0.45915902E+00   0.99523282E-01   0.99523282E-01
    9   0.33880181E+01   0.54328428E+00   0.12493692E+00   0.12493692E+00
   10   0.24200129E+01   0.64282263E+00   0.15561685E+00   0.15561685E+00
   11   0.17285807E+01   0.76059799E+00   0.19194780E+00   0.19194780E+00
   12   0.12347005E+01   0.89995168E+00   0.23395297E+00   0.23395297E+00
   13   0.88192890E+00   0.10648372E+01   0.28110962E+00   0.28110962E+00
   14   0.62994922E+00   0.12599324E+01   0.33218196E+00   0.33218196E+00
   15   0.44996373E+00   0.14907721E+01   0.38513805E+00   0.38513805E+00
   16   0.32140266E+00   0.17639053E+01   0.43723655E+00   0.43723655E+00
   17   0.22957333E+00   0.20870809E+01   0.48535182E+00   0.48535182E+00
   18   0.16398095E+00   0.24694674E+01   0.52651679E+00   0.52651679E+00
   19   0.11712925E+00   0.29219133E+01   0.55846889E+00   0.55846889E+00
   20   0.83663750E-01   0.34572544E+01   0.57979031E+00   0.57979031E+00
   21   0.59759821E-01   0.40906786E+01   0.58943084E+00   0.58943084E+00
   22   0.42685587E-01   0.48401561E+01   0.58613183E+00   0.58613183E+00
   23   0.30489705E-01   0.57269500E+01   0.56862373E+00   0.56862373E+00
   24   0.21778361E-01   0.67762186E+01   0.53666404E+00   0.53666404E+00
   25   0.15555972E-01   0.80177300E+01   0.49197001E+00   0.49197001E+00
   26   0.11111408E-01   0.94867060E+01   0.43819183E+00   0.43819183E+00
   27   0.79367203E-02   0.11224822E+02   0.37998198E+00   0.37998198E+00
   28   0.56690859E-02   0.13281388E+02   0.32182467E+00   0.32182467E+00
   29   0.40493471E-02   0.15714751E+02   0.26720453E+00   0.26720453E+00
   30   0.28923908E-02   0.18593944E+02   0.21830006E+00   0.21830006E+00
   31   0.20659934E-02   0.22000651E+02   0.17608839E+00   0.17608839E+00
   32   0.14757096E-02   0.26031521E+02   0.14065210E+00   0.14065210E+00
   33   0.10540783E-02   0.30800911E+02   0.11151704E+00   0.11151704E+00
   34   0.75291305E-03   0.36444130E+02   0.87930132E-01   0.87930132E-01
   35   0.53779504E-03   0.43121276E+02   0.69050633E-01   0.69050633E-01

I have been using U and the sum of <EDRA> and <EDRB>, and writing the results to a .txt file. Here is the code I was using:

use strict;
use warnings;

# Find the lowest-energy geometry

# Prepare array EDRvars0 containing the EDR at each u from that geometry

open(my $out, '>', 'results.txt') or die "Cannot open results.txt: $!";
print $out "# Bond_length Delocalization_length EDR\n";

# Loop over all log files
foreach my $f (<*log>) {
    my $c = `grep -c "Normal term" $f`; chomp($c); # Avoid files that don't converge
    if ($c > 0) {

        # Find the bond length. We assume this is built into the file name
        my $R = $f;
        $R =~ s/\.log$//;   # escape the dot so only a literal ".log" suffix is removed
        $R =~ s/.*_//;

        # Find the U values
        my $Ustr = `grep -A37 "EDR alpha" $f | tail -n35 | awk "{print \\\$3}"`;
        my @Uvars = split(/\n/, $Ustr); # Convert them into an array
        my $NU = scalar(@Uvars); # That array has $NU elements

        # Find the <EDR(u)> and sum alpha and beta
        my $EDRstr = `grep -A37 "EDR alpha" $f | tail -n35 | awk "{print \\\$4+\\\$5}"`;
        my @EDRvars = split(/\n/, $EDRstr);

        # Print the outputs
        foreach my $i (0..$NU-1) {
            printf $out "%8.3E %12.6E %12.6E\n", $R, $Uvars[$i], $EDRvars[$i];
        }

    }
}
close($out);

Now I want the difference in EDR, i.e. Delta<EDR>, between each file and the minimum-energy file: {(<EDRA>+<EDRB>) of each file} - {(<EDRA>+<EDRB>) of the file with minimum energy}. The minimum-energy file is the one identified by the code you helped me write. I want the difference written to the same text file.


Replies are listed 'Best First'.
Re^5: Making commond for large number of files by GotToBTru (Prior) on Apr 19, 2015 at 23:10 UTC
I note that the data you are interested in are in the last part of the file. Using reverse <$fh> is a pretty clunky way to read the file backwards, but it works. If you can install File::ReadBackwards, it will be nicer, I think. A slightly nicer version of your program:

foreach my $f (<*log>) {
    open my $fh, '<', $f or die "Cannot open $f: $!";
    my @lines = reverse <$fh>;
    close $fh;
    next if ((shift @lines) !~ /Normal termination/);

    # Find the bond length. We assume this is built into the file name
    my $R = $f;
    $R =~ s/\.log$//;
    $R =~ s/.*_//;

    my ($u, $edra, $edrb, @EDRvars, @Uvars);
    foreach my $line (@lines) {
        last if ($line =~ m/^\s*Index/);    # header reached: the whole table has been read
        next unless ($line =~ m/^\s*\d/);   # skip anything that is not a data row
        (undef, undef, $u, $edra, $edrb) = unpack('A7A17A17A17A17', $line);
        # The lines arrive last-to-first, so unshift keeps the file's row order
        unshift @Uvars, $u;
        unshift @EDRvars, $edra + $edrb;
    }

    # Print the outputs
    foreach my $i (0..$#Uvars) {
        printf("%8.3E %12.6E %12.6E\n", $R, $Uvars[$i], $EDRvars[$i]);
    }
}

Dum Spiro Spero
Re^5: Making commond for large number of files by FreeBeerReekingMonk (Deacon) on Apr 18, 2015 at 22:20 UTC
One way I can see to do this is to pass the minimum-energy filename to the script. Here it is in untested code, assuming you do not store the output from the minimum-energy run:

./leg.pl `mineng.pl | grep "This was in file" | awk '{print $5}'`

And then in leg.pl:

my $minfilename = shift; # = $ARGV[0];
:
:
#somewhere in the loop:
if ($f eq $minfilename) {
    #store values to subtract them later
}

However, if you DO save that output under a known filename, let's call it min.txt, then this also works (note the \$5, so that Perl does not interpolate its own $5 inside the backticks, and the chomp, so that the name compares equal to $f):

chomp(my $minfilename = `grep "This was in file" min.txt | awk '{print \$5}'`);

And go from there, testing against $f to get its values.
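The filename lookup can also be done without grep and awk. The following is only a sketch: it assumes, as the awk command above implies, that the summary file contains a line with "This was in file" whose fifth whitespace-separated field is the filename. The helper name min_energy_filename is made up for illustration, and unlike the pipeline it stops at the first matching line.

```perl
use strict;
use warnings;

# Hypothetical helper: return the fifth whitespace-separated field of the
# first line containing "This was in file", or undef if there is no such line.
sub min_energy_filename {
    my ($summary_file) = @_;
    open my $fh, '<', $summary_file or die "Cannot open $summary_file: $!";
    my $name;
    while (my $line = <$fh>) {
        next unless $line =~ /This was in file/;
        my @fields = split ' ', $line;   # same field splitting as awk's default
        $name = $fields[4];              # awk's $5 is index 4 in Perl
        last;
    }
    close $fh;
    return $name;
}
```

Called as my $minfilename = min_energy_filename('min.txt');, the result carries no trailing newline, so it can be compared to $f directly.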
Re^6: Making commond for large number of files by acrobat118 (Initiate) on Apr 18, 2015 at 22:44 UTC
Thank you for your help. Unfortunately, it is not working for me: it only stores and prints the energy values. I do not understand how to code this for the <EDR> values.
Re^5: Making commond for large number of files by GotToBTru (Prior) on Apr 19, 2015 at 22:53 UTC
There are 35 EDRA and EDRB values in each file, it appears. For each line of the 199 other files, do you need to subtract the EDRA+EDRB value from the corresponding line of the file with minimum energy?

File with minimum energy
1 10 20 30 40
2 20 30 45 50

Another file
1 10 11 50 50
2 20 21 60 60

Do you want:
R 11 30
R 21 5

30 = (50 + 50) - (30 + 40)
5 = (60 + 60) - (45 + 50)

I just put 'R' since I don't know what that value should be.

Dum Spiro Spero
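The arithmetic in this example can be checked with a few lines of Perl. This is a minimal sketch using only the toy numbers above; the function name edr_deltas is invented for illustration, rows are paired by position, and the EDRA+EDRB sum is returned alongside the difference in case that column is wanted too.

```perl
use strict;
use warnings;

# Each row: index, exponent, U, EDRA, EDRB (the toy numbers from the example).
my @min_rows   = ([1, 10, 20, 30, 40], [2, 20, 30, 45, 50]);
my @other_rows = ([1, 10, 11, 50, 50], [2, 20, 21, 60, 60]);

# For each row return [U, EDRA+EDRB, Delta EDR], pairing rows by position.
sub edr_deltas {
    my ($rows, $ref_rows) = @_;
    die "Row counts differ\n" unless @$rows == @$ref_rows;
    my @result;
    for my $i (0 .. $#$rows) {
        my $sum     = $rows->[$i][3] + $rows->[$i][4];
        my $ref_sum = $ref_rows->[$i][3] + $ref_rows->[$i][4];
        push @result, [$rows->[$i][2], $sum, $sum - $ref_sum];
    }
    return @result;
}

# Prints "R 11 100 30" and "R 21 120 5"; 'R' is a placeholder, as in the example.
print join(' ', 'R', @$_), "\n" for edr_deltas(\@other_rows, \@min_rows);
```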
Re^6: Making commond for large number of files by acrobat118 (Initiate) on Apr 20, 2015 at 01:06 UTC
I apologize for the inconvenience, and thank you very much for your help. This is exactly what I want, with one small addition:

File with minimum energy
1 10 20 30 40
2 20 30 45 50

Another file
1 10 11 50 50
2 20 21 60 60

I want:
R 11 100 30
R 21 120 5

30 = (50 + 50) - (30 + 40)
5 = (60 + 60) - (45 + 50)

where
100 = 50 + 50
120 = 60 + 60

To explain in more detail: the code I posted above gives me an output file with the bond length R, the delocalization range U, and the sum of EDRA and EDRB ($4+$5). The output looks like this:

R U EDR (that is, EDRA+EDRB)

Now I want to add another column, Delta EDR, that gives the difference between the EDR of each file and the EDR of the lowest-energy file, like this:

R U EDR (that is, EDRA+EDRB) Delta EDR (that is, {(<EDRA>+<EDRB>) of each file} - {(<EDRA>+<EDRB>) of the file with minimum energy})

In other words, I want to edit the code above so that the output has a fourth column containing {(<EDRA>+<EDRB>) of each file} - {(<EDRA>+<EDRB>) of the file with minimum energy}. I look forward to your kind reply.
Re^7: Making commond for large number of files by GotToBTru (Prior) on Apr 22, 2015 at 14:53 UTC
I'm not really here to provide other people with programs. I rewrote your first program, and you disregarded my suggestions on how to replace the various grep and awk commands with Perl.

It looks like you need to make two passes through your .log files. On the first pass, look for the "SCF Done:" line and pull out the value. When you see a value lower than any you have seen before (you are after the minimum energy), remember it and the file name, and build up an array like @EDRvars to record the sum of the EDRA and EDRB values; maybe call it @bigEDRvars. You will use this array in the second part of the program to subtract from the values you get from the other log files.

The second time through the file list, add the following to your foreach loop:

next if $f eq $big_file;

where $big_file contains the name of the file you found in the first loop. The rest of the program can be pretty much as I have already provided, with one addition to the printf, where you subtract the $bigEDRvars[$i] value from $EDRvars[$i].

When you get that done, if it still doesn't work, post the code here and we can make suggestions.

Dum Spiro Spero
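For readers following along, the two passes described above might look roughly like the sketch below. It is not the code from this thread. It takes the lowest SCF energy as the reference, as the question asks. It assumes an energy line shaped like " SCF Done:  E(RHF) =  -1.1175  A.U. ..." and uses the last such line in each file, and it assumes the table starts after a line beginning with "Index", with five fields per row and the same number of rows in every file. The names scf_energy, edr_table and write_delta_edr are hypothetical.

```perl
use strict;
use warnings;

# Last "SCF Done" energy in a log file, or undef if there is none.
# The pattern is an assumption about the log format, not taken from the thread.
sub scf_energy {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $energy;
    while (my $line = <$fh>) {
        $energy = $1 if $line =~ /SCF Done:\s+E\([^)]+\)\s*=\s*(-?\d+\.\d+)/;
    }
    close $fh;
    return $energy;
}

# U values and EDRA+EDRB sums from the table that follows the "Index" header.
sub edr_table {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my (@u, @edr);
    my $in_table = 0;
    while (my $line = <$fh>) {
        if ($line =~ /^\s*Index\b/) {   # a new table starts: forget earlier rows
            @u   = ();
            @edr = ();
            $in_table = 1;
            next;
        }
        next unless $in_table;
        my @f = split ' ', $line;       # index, exponent, U, EDRA, EDRB
        if (@f == 5 && $f[0] =~ /^\d+$/) {
            push @u,   $f[2];
            push @edr, $f[3] + $f[4];
        }
        else {
            $in_table = 0;              # first non-row line ends the table
        }
    }
    close $fh;
    return (\@u, \@edr);
}

# Write R, U, EDR and Delta EDR for every log except the reference one.
# Returns the name of the minimum-energy file.
sub write_delta_edr {
    my ($out_file, @logs) = @_;

    # Pass 1: find the file with the lowest SCF energy.
    my ($min_file, $min_energy);
    for my $f (@logs) {
        my $e = scf_energy($f);
        next unless defined $e;
        ($min_file, $min_energy) = ($f, $e)
            if !defined $min_energy || $e < $min_energy;
    }
    die "No SCF energy found in any log file\n" unless defined $min_file;
    my (undef, $ref_edr) = edr_table($min_file);

    # Pass 2: subtract the reference sums row by row.
    open my $out, '>', $out_file or die "Cannot write $out_file: $!";
    print {$out} "# Bond_length Delocalization_length EDR Delta_EDR\n";
    for my $f (@logs) {
        next if $f eq $min_file;
        (my $R = $f) =~ s/\.log$//;     # bond length is assumed to be in the name
        $R =~ s/.*_//;
        my ($u, $edr) = edr_table($f);
        for my $i (0 .. $#$u) {
            printf {$out} "%8.3E %12.6E %12.6E %12.6E\n",
                $R, $u->[$i], $edr->[$i], $edr->[$i] - $ref_edr->[$i];
        }
    }
    close $out;
    return $min_file;
}
```

A call such as write_delta_edr('results.txt', glob('*log')); would then produce the four-column file. The "Normal termination" check from the earlier programs is left out here for brevity and would need to be added back.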