acrobat118 has asked for the wisdom of the Perl Monks concerning the following question:

Dear fellows, I am a new user of Perl and I need your help with a problem. I have 200 files with the extension .log. Each file contains a lot of text and information, but I am only concerned with one term in each file, of the form SCF Done = some value of energy. The value of the energy is different in each file. I want to write a Perl script which will:

1. Find the file among the 200 files which contains the lowest value of energy, i.e. the lowest value of SCF Done.

2. From a column (say column $3) of the file found in step 1, individually subtract column $3 of each of the remaining 199 files.

3. Report the result of step 2 for each file separately, in a single column in a single text file.

Please help me, as I have been struggling for a week. Anxiously waiting for your reply.

  • Comment on Making a command for a large number of files

Replies are listed 'Best First'.
Re: Making a command for a large number of files
by FreeBeerReekingMonk (Deacon) on Apr 17, 2015 at 19:13 UTC

    Although you CAN use simple greps to get things, I would advise using http://www.bioperl.org/wiki/HOWTO:SeqIO instead: it understands the Standard Chromatogram Format (SCF) and can grab exactly what you will ever need.

    use Bio::SeqIO;
    my $in  = Bio::SeqIO->new( -file => 'p122.gel', -format => 'SCF' );
    my $seq = $in->next_seq();
    # if you want to look at the various comment field data:
    while ( my ($key, $val) = each %{ $seq->names() } ) {
        print "$key\t$val\n";
    }

    If this is just your Computer Science homework, then you should do proper investigation first (i.e. learn), instead of asking us. For example, the SCF format is:

    SCF Done: E(RHF) = -113.873389817 A.U. after 11 cycles
    /SCF Done.*=\s?([\-\d\.]*\d+)/

    Perl's regular expressions are very precise; hence the suggestion. If, on the other hand, the line looked like this:

    SCF Done= -113.873389817
    /SCF Done=\s?([\-\d\.]*\d+)/

    that would, in the eyes of Perl, be something totally different. In each pair above, the first line is a line from the SCF file, and the second is a Perl pattern match that grabs the value. You can see the differences...

    perl -ne '$V{$1}=$ARGV if /SCF Done.*=\s*([\-\d\.]*\d+)/; END{@_=sort {$a<=>$b} keys %V; print "Lowest value is $_[0] in file $V{$_[0]}\n"}' *.log

    So, the big question is still: we can help, but only if you make an effort and show us what you have up to this point. Do you have a loop? Which variables are you going to use? Will you use a single or a double pass over the files? How do you think that third column should be retrieved? Show us some code first.

      I am a chemistry student and have just started working on computational chemistry. I am a new user of Perl and have just started learning it. I want to complete the following code; this is what I have done so far.

      #!/usr/bin/perl
      # Write a single output file with
      #   Bond_length Delocalization_range EDR
      # from each calculation here
      use strict;

      my $ELowSoFar    = 0.0;  # The lowest energy found so far
      my $FileLowSoFar = '';   # The file containing the lowest energy so far

      foreach my $files (<*log>){  # Loop over all of the files
          my $E = `grep "SCF Done" $files|awk "{print \\\$5}"`;
          chomp($E);               # Find the energy in this file
          # Check if the energy in this file is LOWER than the lowest energy so far
          # If it is, then it is the NEW lowest energy so far
          # and the file containing it is the new FileLowSoFar
          print "File $files has energy $E and the lowest energy so far is $ELowSoFar\n";
      }
      print "The lowest total energy was $ELowSoFar\n";
      print "This was in file $FileLowSoFar\n";

        OK, that looks like a great start. I was checking http://nbo6.chem.wisc.edu/tut_del.htm and I now assume that EDu is the calculated energy of the delocalization (some seem to call it deletion) in atomic units, and that what you need to calculate is the Energy Delocalization Range. How about this:

        #!/usr/bin/perl
        # Write a single output file with
        #   Bond_length Delocalization_range EDR
        # from each calculation here
        use strict;

        my $ELowSoFar    = undef;  # The lowest energy found so far
        my $FileLowSoFar = '';     # The file containing the lowest energy so far
        my %FILE2SCF;              # $FILE2SCF{"FILENAME"} = Energy

        foreach my $file (<*log>){  # Loop over all of the files
            my $E = `grep "SCF Done" $file|awk "{print \\\$5}"`;
            chomp($E);              # Find the energy in this file
            # Check if the energy in this file is LOWER than the lowest energy so far
            if (!defined $ELowSoFar || $ELowSoFar > $E){
                # If it is, then it is the NEW lowest energy so far
                $ELowSoFar = $E;
                # and the file containing it is the new FileLowSoFar
                $FileLowSoFar = $file;
            }
            # store the energy of the file for later use
            $FILE2SCF{$file} = $E;
            print "File $file has energy $E and the lowest energy so far is $ELowSoFar\n";
        }
        print "The lowest total energy was $ELowSoFar\n";
        print "This was in file $FileLowSoFar\n";

        # Now calculate Delocalization_range for each file
        my $minimalvalue = $FILE2SCF{$FileLowSoFar};
        for my $file (sort keys %FILE2SCF){
            my $currentvalue = $FILE2SCF{$file};
            print "$file: Energy Delocalization Range=" . ($currentvalue - $minimalvalue) . "\n";
        }
Re: Making a command for a large number of files
by vinoth.ree (Monsignor) on Apr 17, 2015 at 10:24 UTC
    Hi,

    Here are some of ideas,

    1. Search for the string 'SCF Done=\d+' in each file. For how to find a string in a file, see the node how can i search a text file for a string and print every occurence of that string. Do this for all of the files and hold the data in a hash like the one below,

    my %SCF_Done = (
        'FileName1.txt' => { energy_value => 123, line => "line from each file" },
    );

    From here you can do whatever you want: sort on energy_value to find the lowest value of energy, then split the stored line, get each column, do the subtraction on $3, and save the result to a file.


    All is well. I learn by answering your questions...

      I have been successful in looping over all the files with grep: `grep "SCF Done" $files|awk "{print \\\$5}"`. It has given me the SCF Done value from each file. Now I can search by myself, among the SCF Done list of the 200 files, for the one with the lowest "SCF Done", and I have found such a file. But the problem is that I want to use that file in the next step. Each file has a column with an electron delocalization parameter; it is a long column of numbers, and its header is (EDu). I want to code the following:

      1. Find the file among the 200 files which has the lowest SCF Done.

      2. Then I want to use the (EDu) column of this minimum SCF Done file from step 1 to get the difference with the (EDu) of each of the remaining 199 files, and I want the result in a text file. For example, if the (EDu) of the minimum SCF Done file is (EDu)min, I want (EDu)min - (EDu)1, (EDu)min - (EDu)2, (EDu)min - (EDu)3, ..., where 1, 2, 3, ... represent the names of all the files in the directory.

      Here I am sending the code which was given to me by my supervisor, and which I am required to finalize. Please complete it if possible.

      use strict;

      my $SCFDoneLowSoFar = 0.0;  # The lowest energy found so far
      my $FileLowSoFar    = '';   # The file containing the lowest energy so far

      foreach my $files (<*log>){  # Loop over all of the files
          my $E = `grep "SCF Done" $files|awk "{print \\\$5}"`;
          chomp($E);               # Find the energy in this file
          # Check if the energy in this file is LOWER than the lowest energy so far
          # If it is, then it is the NEW lowest energy so far
          # and the file containing it is the new FileLowSoFar
          print "File $files has energy $E and the lowest energy so far is $SCFDoneLowSoFar\n";
      }
      print "The lowest total energy was $SCFDoneLowSoFar\n";
      print "This was in file $FileLowSoFar\n";

      I have been struggling for more than a week.

Re: Making a command for a large number of files
by hippo (Archbishop) on Apr 17, 2015 at 08:44 UTC
    But I am only concerned with one term in each file, of the form SCF Done = some value of energy.

    grep first. Then you only have 1 file to deal with rather than 200. Simpler now?

Re: Making a command for a large number of files
by GotToBTru (Prior) on Apr 17, 2015 at 14:37 UTC

    What do the input lines look like? We know one line in the file contains "SCF Done = 999" but then you talk about column 3. What does the rest of that line look like? Is it comma or space or tab delimited? Or is column 3 in a different line than the "SCF Done" line?

    Assuming file 2 contained the lowest SCF, do you want something like the following (where 99 stands in for the actual column 3 value minus the lowest SCF)?

    file1    99
    file3    99
    ...
    file200  99
    Dum Spiro Spero

      Thanks for your reply. Each file contains information in numbers and letters. There are two things which I need from each file. One is the SCF Done, that is, an energy like SCF Done = 0.56846. The second is a column with an electron delocalization parameter; it is a long column of numbers, and its header is (EDu). I want to: 1. Find which file among the 200 files has the lowest SCF Done. 2. Then use the (EDu) column of this minimum SCF Done file from step 1 to get the difference with the (EDu) of each of the remaining 199 files, with the result in a text file. For example, if the (EDu) of the minimum SCF Done file is (EDu)min, I want (EDu)min - (EDu)1, (EDu)min - (EDu)2, ..., where 1, 2, ... represent the names of all the files in the directory.

        You just repeated what you said in the first post, but did not answer my questions.

        Can you post a small sample of the contents of one of those 200 files?

        Dum Spiro Spero
Re: Making a command for a large number of files
by aaron_baugher (Curate) on Apr 17, 2015 at 17:57 UTC
    #!/usr/bin/env perl
    use 5.010;
    use strict;
    use warnings;

    my $low = 999999999;
    my $lowfile;
    my $lowline;

    for my $f (@ARGV){
        open my $fd, '<', $f or die $!;
        while(<$fd>){
            if(/SCF Done\s*=\s*(-?[\d\.]+)/ and $1 < $low){
                $low     = $1;
                $lowfile = $f;
                $lowline = $.;
            }
        }
        close $fd;
    }
    say "File '$lowfile' contains lowest SCF Done value of $low on line $lowline";

    This took 10 minutes, half of which was spent preparing the data files for testing. Next time you want to "receive" your work done for you, maybe it'd be smart to pay a programmer for an hour of work and spend the rest of the week struggling on a paid vacation.

    Aaron B.
    Available for small or large Perl jobs and *nix system administration; see my home node.

Re: Making a command for a large number of files
by Anonymous Monk on Apr 17, 2015 at 07:55 UTC

    Please help me as I have been struggling for a week. Anxiously waiting your reply.

    Hi acrobat118, got code ?

      Thanks for your reply. But I have not received the code yet. Waiting....

        Code is not created by receiving, it's created by typing it. That's how programmers work.
        لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ