Re: Making commond for large number of files
by FreeBeerReekingMonk (Deacon) on Apr 17, 2015 at 19:13 UTC
Although you CAN use simple greps to get things, I would advise using http://www.bioperl.org/wiki/HOWTO:SeqIO instead. It understands the Standard Chromatogram Format (SCF) and can grab exactly what you will ever need.
use strict;
use warnings;
use Bio::SeqIO;

my $in  = Bio::SeqIO->new(-file => 'p122.gel', -format => 'SCF');
my $seq = $in->next_seq();
# if you want to look at the various comment field data:
while (my ($key, $val) = each %{$seq->names()}) {
    print "$key\t$val\n";
}
If this is just your Computer Science homework, then you should do proper investigation first (i.e. learn), instead of asking us.
For example, a real "SCF Done" line from one of the log files looks like:
SCF Done: E(RHF) = -113.873389817 A.U. after 11 cycles
and a Perl pattern match that grabs the energy from it is:
/SCF Done.*=\s?([\-\d\.]*\d+)/
Perl regular expressions are very precise, so if the line instead looked like:
SCF Done= -113.873389817
the matching pattern would have to be:
/SCF Done=\s?([\-\d\.]*\d+)/
In the eyes of Perl those two lines are totally different. In each pair, the first line is the line from the file and the second is the pattern match that grabs the value. You can see the differences. A one-liner that applies this to all the log files and reports the lowest value:
perl -ne '$V{$1}=$ARGV if /SCF Done.*=\s*([\-\d\.]*\d+)/; END{@_=sort {$a<=>$b} keys %V; print "Lowest value is $_[0] in file $V{$_[0]}\n"}' *.log
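(Note the numeric sort, { $a <=> $b }: SCF energies are negative numbers, so the default lexical sort would order them incorrectly.)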
So the big question still stands: we can help, but only if you make an effort and show us what you have up to this point. Do you have a loop? Which variables are you going to use? Will you make a single or a double pass over the files? How do you think that third column should be retrieved? Show us some code first.
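To make those questions concrete, here is a minimal single-pass sketch; the *.log glob and the regular expression are assumptions you would adapt to your actual files:
#!/usr/bin/perl
use strict;
use warnings;

# Single pass: remember the lowest energy seen so far and its file.
my ($lowest, $lowest_file);
for my $file (glob '*.log') {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    while (my $line = <$fh>) {
        next unless $line =~ /SCF Done.*?=\s*(-?[\d.]+)/;
        if (!defined $lowest or $1 < $lowest) {
            ($lowest, $lowest_file) = ($1, $file);
        }
    }
    close $fh;
}
print "Lowest SCF energy $lowest found in $lowest_file\n" if defined $lowest;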
#!/usr/bin/perl
# Write a single output file with
# Bond_length Delocalization_range EDR
# from each calculation here
use strict;
my $ELowSoFar = 0.0;    # The lowest energy found so far
my $FileLowSoFar = '';  # The file containing the lowest energy so far
foreach my $files (<*log>){ # Loop over all of the files
    my $E=`grep "SCF Done" $files|awk "{print \\\$5}"`; chomp($E); # Find the energy in this file
    # Check if the energy in this file is LOWER than the lowest energy so far
    # If it is, then it is the NEW lowest energy so far
    # and the file containing it is the new FileLowSoFar
    print "File $files has energy $E and the lowest energy so far is $ELowSoFar\n";
}
print "The lowest total energy was $ELowSoFar\n";
print "This was in file $FileLowSoFar\n";
Ok, that looks like a great start. I was checking http://nbo6.chem.wisc.edu/tut_del.htm and I now assume that EDu is the calculated delocalization energy (some seem to call it deletion) in atomic units, and that what you need to compute is the energy delocalization range. How about this:
#!/usr/bin/perl
# Write a single output file with
# Bond_length Delocalization_range EDR
# from each calculation here
use strict;
use warnings;
my $ELowSoFar = undef;  # The lowest energy found so far
my $FileLowSoFar = '';  # The file containing the lowest energy so far
my %FILE2SCF;           # $FILE2SCF{"FILENAME"} = Energy
foreach my $file (<*log>){ # Loop over all of the files
    my $E=`grep "SCF Done" $file|awk "{print \\\$5}"`; chomp($E); # Find the energy in this file
    # Check if the energy in this file is LOWER than the lowest energy so far
    if (!defined $ELowSoFar || $ELowSoFar > $E){
        # If it is, then it is the NEW lowest energy so far
        $ELowSoFar = $E;
        # and the file containing it is the new FileLowSoFar
        $FileLowSoFar = $file;
    }
    # store the energy of the file for later use
    $FILE2SCF{$file} = $E;
    print "File $file has energy $E and the lowest energy so far is $ELowSoFar\n";
}
print "The lowest total energy was $ELowSoFar\n";
print "This was in file $FileLowSoFar\n";
# Now calculate the delocalization range for each file
my $minimalvalue = $FILE2SCF{$FileLowSoFar};
for my $file (sort keys %FILE2SCF){
    my $currentvalue = $FILE2SCF{$file};
    print "$file: Energy Delocalization Range=" . ($currentvalue - $minimalvalue) . "\n";
}
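What the thread never reaches is the (EDu) step itself. Continuing from the script above, here is a heavily hedged sketch: it assumes each log file contains a whitespace-separated table whose header line holds the literal token (EDu), and that every file has the same number of rows. Neither assumption is confirmed anywhere in this thread (no sample file was posted), so adapt it once the real layout is known:
# HYPOTHETICAL: the (EDu) table layout below is an assumption,
# not something shown anywhere in this thread.
sub read_edu_column {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my ($col, @values);
    while (my $line = <$fh>) {
        my @fields = split ' ', $line;
        if (!defined $col) {
            # skip lines until we find which column holds the (EDu) header
            for my $i (0 .. $#fields) {
                $col = $i if $fields[$i] eq '(EDu)';
            }
            next;
        }
        # collect only numeric entries from that column
        push @values, $fields[$col]
            if defined $fields[$col] and $fields[$col] =~ /^-?[\d.]+$/;
    }
    close $fh;
    return @values;
}

my @edu_min = read_edu_column($FileLowSoFar);
open my $out, '>', 'edu_differences.txt' or die $!;
for my $file (sort keys %FILE2SCF){
    next if $file eq $FileLowSoFar;
    my @edu = read_edu_column($file);
    # (EDu)min - (EDu)file, row by row
    my @diff = map { $edu_min[$_] - $edu[$_] } 0 .. $#edu_min;
    print $out "$file @diff\n";
}
close $out;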
Re: Making commond for large number of files
by vinoth.ree (Monsignor) on Apr 17, 2015 at 10:24 UTC
my %SCF_Done = ('FileName1.txt' => { energy_value => 123, line => "line from each file" });
From there you can do whatever you want: sort on energy_value to find the lowest energy, split the stored line to get each column, subtract the third column, and save the result to a file.
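A minimal sketch of how such a hash could actually be filled, assuming the same *.log glob and "SCF Done" line format used elsewhere in this thread:
use strict;
use warnings;

my %SCF_Done;
for my $file (glob '*.log') {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    while (my $line = <$fh>) {
        if ($line =~ /SCF Done.*?=\s*(-?[\d.]+)/) {
            chomp $line;
            # filename => { energy value, full matching line }
            $SCF_Done{$file} = { energy_value => $1, line => $line };
        }
    }
    close $fh;
}

# sort numerically on energy_value to find the lowest-energy file
my ($lowest) = sort {
    $SCF_Done{$a}{energy_value} <=> $SCF_Done{$b}{energy_value}
} keys %SCF_Done;
print "Lowest energy $SCF_Done{$lowest}{energy_value} in $lowest\n";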
All is well. I learn by answering your questions...
I have been successful in looping over all the files with grep "SCF Done" $files|awk "{print \$5}". It has given me the SCF Done value from each file, and I can search by myself among the SCF Done list of 200 files for the one with the lowest value. I have found that file. The problem is that I want to use it in the next step. Each file also has a long column of numbers, an electron delocalization parameter whose column header is (EDu). I want code that will:
1. Find the file among the 200 files which has the lowest SCF Done.
2. Use the (EDu) column of that minimum-SCF-Done file from step 1 to get the difference against the (EDu) of all remaining 199 files, and write the result to a text file. For example, if (EDu) of the minimum SCF Done file is (EDu)min, I want (EDu)min - (EDu)1, (EDu)min - (EDu)2, (EDu)min - (EDu)3, ... where 1, 2, 3, ... represent the other files in the directory.
Here is the code given by my supervisor, which I am required to finalize. Please complete it if possible.
use strict;
my $SCFDoneLowSoFar = 0.0;  # The lowest energy found so far
my $FileLowSoFar = '';      # The file containing the lowest energy so far
foreach my $files (<*log>){ # Loop over all of the files
    my $E=`grep "SCF Done" $files|awk "{print \\\$5}"`; chomp($E); # Find the energy in this file
    # Check if the energy in this file is LOWER than the lowest energy so far
    # If it is, then it is the NEW lowest energy so far
    # and the file containing it is the new FileLowSoFar
    print "File $files has energy $E and the lowest energy so far is $SCFDoneLowSoFar\n";
}
print "The lowest total energy was $SCFDoneLowSoFar\n";
print "This was in file $FileLowSoFar\n";
I have been struggling for more than a week.
Re: Making commond for large number of files
by hippo (Archbishop) on Apr 17, 2015 at 08:44 UTC
Re: Making commond for large number of files
by GotToBTru (Prior) on Apr 17, 2015 at 14:37 UTC
What do the input lines look like? We know one line in the file contains "SCF Done = 999" but then you talk about column 3. What does the rest of that line look like? Is it comma or space or tab delimited? Or is column 3 in a different line than the "SCF Done" line?
Assuming file 2 contained the lowest SCF, do you want something like the following (where 99 stands in for the actual column 3 value minus the lowest SCF)?
file1 99
file3 99
...
file200 99
Thanks for your reply. Each file contains information in numbers and letters. There are two things I need from each file. One is the SCF Done energy, e.g. SCF Done =0.56846. The second is a long column of numbers, an electron delocalization parameter whose column header is (EDu). I want to:
1. Find which file among the 200 files has the lowest SCF Done.
2. Use the (EDu) column of that minimum-SCF-Done file from step 1 to get the difference against the (EDu) of all remaining 199 files, with the result in a text file. For example, if (EDu) of the minimum SCF Done file is (EDu)min, I want (EDu)min - (EDu)1, (EDu)min - (EDu)2, ... where 1, 2, ... represent the other files in the directory.
You just repeated what you said in the first post, but did not answer my questions.
Can you post a small sample of the contents of one of those 200 files?
Re: Making commond for large number of files
by aaron_baugher (Curate) on Apr 17, 2015 at 17:57 UTC
#!/usr/bin/env perl
use 5.010; use strict; use warnings;
my $low = 999999999; my $lowfile; my $lowline;
for my $f (@ARGV){
    open my $fd, '<', $f or die $!;
    while(<$fd>){
        # allow text such as "E(RHF)" before the "=" and an optional
        # minus sign, since SCF energies are negative
        if(/SCF Done.*?=\s*(-?[\d.]+)/ and $1 < $low){
            $low = $1;
            $lowfile = $f;
            $lowline = $.;
        }
    }
    close $fd;
}
say "File '$lowfile' contains lowest SCF Done value of $low on line $lowline";
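To run it over all 200 log files at once, pass them on the command line, e.g. perl lowest_scf.pl *.log (the script name is just a placeholder).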
This took 10 minutes, half of which was spent preparing the data files for testing. Next time you want to "receive" your work done for you, maybe it'd be smarter to pay a programmer for an hour of work and spend the week you'd have spent struggling on a paid vacation instead.
Aaron B.
Available for small or large Perl jobs and *nix system administration; see my home node.
Re: Making commond for large number of files
by Anonymous Monk on Apr 17, 2015 at 07:55 UTC
"Please help me as I have been struggling for a week. Anxiously waiting your reply."
Hi acrobat118, got code?
Code is not created by receiving it; it's created by typing it. That's how programmers work.