arshadmehmood118 has asked for the wisdom of the Perl Monks concerning the following question:

Dear experts, I am new to Perl. I want to extract some information from the output files of my calculations. I have written the following script:

#!/usr/bin/perl
# 2017.02.07
# Compare Qc and Dc of gold atoms
use strict;

open(F1,">SI.txt");
open(F2,">SI2.txt");

print F2 "\n\n\n============================ Computed geometries (Cartesian coordinates in Angstrom), total energies (Hartree), Hirshfeld population QA(e) and atomic overlap distances DA(bohr) of all systems given in Table SI-1 and Table SI-2. \n\n";
print F1 "\n\n\n========================== Computed Q(Au) and D(Au) of all atoms in all neutral Au clusters \n\n";
print F1 " Atom QA DA(alpha) DA(beta) DA(total)\n";

foreach my $f(<*log>){
    my $c=`grep -c "Normal term" $f`;chomp($c);
    my $E=`tac $f|grep -m1 "SCF Done"|awk "{print \\\$5}"`; chomp($E);
    my $Nat = `cat $f|grep -m1 "NAtoms="|awk "{print \\\$2}" `; chomp($Nat);
    my $off=$Nat+4;
    my $coord=`tac $f|grep -m1 -B$off " orientation:"|head -n$Nat|tac|awk -v OFS='\t' "{print\\\$1,\\\$2,\\\$4,\\\$5,\\\$6}"`;
    print F2 "Molecule $f \nEnergy: $E\nGeometry:\nAtom Atomic No. x y z\n $coord Atom QA DA(alpha) DA(beta) DA(total)\n";
    foreach my $at(1..$Nat){
        my $at2=$at+1;
        # Get the Hirshfeld populations and atomic delocalziations
        my $QA = `tac $f|grep -m1 -B$at2 "Hirshfeld charges, spin" | head -n1|awk "{print \\\$3}"`;chomp($QA);
        my $dalpha=`tac $f|grep -m1 -B$at2 "Atomic average delocal" |head -n1 |awk "{print \\\$3}"`; chomp($dalpha);
        my $dbeta=`tac $f|grep -m1 -B$at2 "Atomic average delocal" |head -n1 |awk "{print \\\$4}"`; chomp($dbeta);
        my $dtotal=`tac $f|grep -m1 -B$at2 "Atomic average delocal" |head -n1 |awk "{print \\\$5}"`; chomp($dtotal);
        print F1 sprintf(" %4d %7.4f %7.4f %7.4f %7.4f \n",$at,$QA,$dalpha,$dbeta,$dtotal);
    }
}
close(F1);
close(F2);

Using my limited knowledge of Perl, I have successfully extracted what I want from my output files into two files, SI.txt and SI2.txt. But I want to combine the information about each molecule in one file, and I don't know how to write a script that does this. File SI.txt gives me:

Atom       QA  DA(alpha)  DA(beta)  DA(total)
   1   1.0000     2.2238    2.6173     2.3812
   1   1.3294     1.9996    1.9996     1.9996
   2  -0.1098     2.2233    2.2233     2.2233
   3  -0.1098     2.2233    2.2233     2.2233
   4  -0.1098     2.2233    2.2233     2.2233

and File SI2.txt gives me:

Molecule 01-Carbon-Energy.log
Energy: -37.7131454546
Geometry:
Atom  Atomic No.          x          y          z
   1           6   0.000000   0.000000   0.000000
Atom       QA  DA(alpha)  DA(beta)  DA(total)
Molecule 02-Methanide-Geometry.log
Energy: -39.6946868929
Geometry:
Atom  Atomic No.          x          y          z
   1           6   0.000000   0.000000   0.000000
   2           1   0.000000   1.084453   0.000000
   3           1  -0.939164  -0.542227   0.000000
   4           1   0.939164  -0.542227   0.000000
Atom       QA  DA(alpha)  DA(beta)  DA(total)

But this is what I want:

Molecule 01-Carbon-Energy.log
Energy: -37.7131454546
Geometry:
Atom  Atomic No.          x          y          z
   1           6   0.000000   0.000000   0.000000
Atom       QA  DA(alpha)  DA(beta)  DA(total)
   1   1.0000     2.2238    2.6173     2.3812
Molecule 02-Methanide-Geometry.log
Energy: -39.6946868929
Geometry:
Atom  Atomic No.          x          y          z
   1           6   0.000000   0.000000   0.000000
   2           1   0.000000   1.084453   0.000000
   3           1  -0.939164  -0.542227   0.000000
   4           1   0.939164  -0.542227   0.000000
Atom       QA  DA(alpha)  DA(beta)  DA(total)
   1   1.3294     1.9996    1.9996     1.9996
   2  -0.1098     2.2233    2.2233     2.2233
   3  -0.1098     2.2233    2.2233     2.2233
   4  -0.1098     2.2233    2.2233     2.2233

Please help me, if possible. I would be very thankful.

Replies are listed 'Best First'.
Re: Merging two files
by choroba (Cardinal) on Feb 24, 2017 at 04:37 UTC
    It's usually much faster and easier to do all the work in Perl, if possible. Calling awk, grep, head, tac, etc. is rarely needed. You should probably start from a simpler task and build your script up in steps.
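
    As a minimal sketch of doing one of these steps in Perl alone: the pipeline tac | grep -m1 "SCF Done" | awk '{print $5}' asks for the fifth whitespace-separated field of the last matching line. The helper name last_match_field and the sample log lines below are made up for illustration (the real *.log files were not posted), so treat the field position as an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the given whitespace-separated field (1-based, as in awk) of the
# LAST line in $file that matches $pattern, or undef if nothing matches.
# Similar in spirit to: tac FILE | grep -m1 PATTERN | awk '{print $N}'
sub last_match_field {
    my ($file, $pattern, $field) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $found;
    while (my $line = <$fh>) {
        next unless $line =~ $pattern;
        my @fields = split ' ', $line;    # awk-style splitting on whitespace
        $found = $fields[$field - 1];     # later matches overwrite earlier ones
    }
    close $fh;
    return $found;
}

# Made-up sample text standing in for a real *.log file (an assumption).
my $sample = <<'END';
 SCF Done:  E(UHF) =  -37.7000000000     A.U. after    5 cycles
 some other line
 SCF Done:  E(UHF) =  -37.7131454546     A.U. after    3 cycles
END

# Opening a reference to a scalar reads from memory instead of from disk;
# pass a file name instead to read a real file.
my $energy = last_match_field(\$sample, qr/SCF Done/, 5);
print "Energy: $energy\n";
```

    For a first-match lookup such as the NAtoms= line, returning from inside the loop on the first hit would give the grep -m1 behaviour without tac.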

    Also, if you want us to help you, you need to provide everything we need to reproduce your situation. Without the *log files, your question isn't an SSCCE (Short, Self-Contained, Correct Example).

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
Re: Merging two files
by hippo (Archbishop) on Feb 24, 2017 at 09:24 UTC

    You could always post-process the two files to merge them but honestly that would be piling one more hack on top of an already questionable heap.

    The best advice I can give you here is to consider using some sort of template system for the output. That way you can construct your arrays of data within the program (see perldsc for how to manage a data set as a single item) and then just pass the assembled data to a template to format it how you wish in the output file. The venerable TT2 has much more functionality than you need for this one task but is worth learning IMHO, as it will serve you well for solving the same general problem many times over.
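
    A rough sketch of the "assemble the data first, format it last" idea, using only core Perl: the sub format_molecule is a hypothetical stand-in for a real template (it is not TT2), and the record layout is just one possible perldsc-style choice. The numbers are copied from the sample output in the question.

```perl
use strict;
use warnings;

# Array of hashes: one record per molecule, with nested arrays holding the
# coordinate rows and the per-atom rows.
my @molecules = (
    {
        name     => '01-Carbon-Energy.log',
        energy   => '-37.7131454546',
        geometry => [ [ 1, 6, 0, 0, 0 ] ],
        atoms    => [ [ 1, 1.0000, 2.2238, 2.6173, 2.3812 ] ],
    },
    {
        name     => '02-Methanide-Geometry.log',
        energy   => '-39.6946868929',
        geometry => [
            [ 1, 6,  0.000000,  0.000000, 0 ],
            [ 2, 1,  0.000000,  1.084453, 0 ],
            [ 3, 1, -0.939164, -0.542227, 0 ],
            [ 4, 1,  0.939164, -0.542227, 0 ],
        ],
        atoms => [
            [ 1,  1.3294, 1.9996, 1.9996, 1.9996 ],
            [ 2, -0.1098, 2.2233, 2.2233, 2.2233 ],
            [ 3, -0.1098, 2.2233, 2.2233, 2.2233 ],
            [ 4, -0.1098, 2.2233, 2.2233, 2.2233 ],
        ],
    },
);

# Hypothetical stand-in for a template: turn one record into text, so the
# geometry and the per-atom table of a molecule end up together.
sub format_molecule {
    my ($mol) = @_;
    my $out = "Molecule $mol->{name}\nEnergy: $mol->{energy}\nGeometry:\n";
    $out .= "Atom  Atomic No.          x          y          z\n";
    $out .= sprintf("%4d  %10d  %9.6f  %9.6f  %9.6f\n", @$_)
        for @{ $mol->{geometry} };
    $out .= "Atom       QA  DA(alpha)  DA(beta)  DA(total)\n";
    $out .= sprintf(" %4d %7.4f %7.4f %7.4f %7.4f\n", @$_)
        for @{ $mol->{atoms} };
    return $out;
}

print format_molecule($_) for @molecules;
```

    Because each molecule carries its own rows, there is nothing left to merge afterwards; swapping the sub for a real template later would not change how the data is collected.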

Re: Merging two files
by Marshall (Canon) on Feb 25, 2017 at 08:26 UTC
    I don't really understand the "rules" by which the output is generated from the 2 files. Can you explain more about how to generate the output from SI.txt and SI2.txt?

    If you give an example of the *.log files, it may be that your desired output could be generated directly from them, without the SI.txt and SI2.txt intermediates. I don't know.

    It has been a very, very long time since I used either sed or awk. There is no need for these with Perl. Anyway, your grep | awk code is not helpful to me.

    Update:
    Without really understanding the rules, this code produces your desired output. I guess what I meant above is "tell me what is wrong with this algorithm":

    #!/usr/bin/perl
    use strict;
    use warnings;

    ## simulate actual files ##
    my $SI_txt =<<END;
    Atom       QA  DA(alpha)  DA(beta)  DA(total)
       1   1.0000     2.2238    2.6173     2.3812
       1   1.3294     1.9996    1.9996     1.9996
       2  -0.1098     2.2233    2.2233     2.2233
       3  -0.1098     2.2233    2.2233     2.2233
       4  -0.1098     2.2233    2.2233     2.2233
    END

    my $SI2_txt =<<END;
    Molecule 01-Carbon-Energy.log
    Energy: -37.7131454546
    Geometry:
    Atom  Atomic No.          x          y          z
       1           6   0.000000   0.000000   0.000000
    Atom       QA  DA(alpha)  DA(beta)  DA(total)
    Molecule 02-Methanide-Geometry.log
    Energy: -39.6946868929
    Geometry:
    Atom  Atomic No.          x          y          z
       1           6   0.000000   0.000000   0.000000
       2           1   0.000000   1.084453   0.000000
       3           1  -0.939164  -0.542227   0.000000
       4           1   0.939164  -0.542227   0.000000
    Atom       QA  DA(alpha)  DA(beta)  DA(total)
    END

    ### start of "real code" ###
    open my $SI,  "<", \$SI_txt  or die "unable to open SI.txt";
    open my $SI2, "<", \$SI2_txt or die "unable to open SI2.txt";

    my @SIarray;
    while (<$SI>)
    {
        next unless $_ =~ /\d/;   #throw away header line or blank
        push @SIarray, $_;
    }

    while (<$SI2>)
    {
        print;
        if (/^\s*Atom\s+QA/)      #interleave first line of SI.txt
        {
            print shift @SIarray;
        }
    }
    print @SIarray;
    __END__
    Molecule 01-Carbon-Energy.log
    Energy: -37.7131454546
    Geometry:
    Atom  Atomic No.          x          y          z
       1           6   0.000000   0.000000   0.000000
    Atom       QA  DA(alpha)  DA(beta)  DA(total)
       1   1.0000     2.2238    2.6173     2.3812
    Molecule 02-Methanide-Geometry.log
    Energy: -39.6946868929
    Geometry:
    Atom  Atomic No.          x          y          z
       1           6   0.000000   0.000000   0.000000
       2           1   0.000000   1.084453   0.000000
       3           1  -0.939164  -0.542227   0.000000
       4           1   0.939164  -0.542227   0.000000
    Atom       QA  DA(alpha)  DA(beta)  DA(total)
       1   1.3294     1.9996    1.9996     1.9996
       2  -0.1098     2.2233    2.2233     2.2233
       3  -0.1098     2.2233    2.2233     2.2233
       4  -0.1098     2.2233    2.2233     2.2233
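
    One thing to watch in the loop above: it hands exactly one SI.txt row to each "Atom QA" header and dumps the rest at the end, which only matches the desired output while every molecule but the last has a single atom. A possible variant, assuming the rule is "each molecule gets as many rows as its geometry block has atoms", is sketched below. The sub name merge_si and the shortened sample text are made up for illustration.

```perl
use strict;
use warnings;

# Merge the text of SI2.txt (molecule blocks) with the text of SI.txt
# (per-atom rows). Each molecule receives as many rows as its geometry
# block has coordinate lines.
sub merge_si {
    my ($si_text, $si2_text) = @_;

    # Keep only the data rows of SI.txt: header and blank lines hold no digits.
    my @rows = grep { /\d/ } split /^/, $si_text;

    my ($out, $in_geometry, $natoms) = ('', 0, 0);
    for my $line (split /^/, $si2_text) {
        $out .= $line;
        if ($line =~ /^\s*Atom\s+Atomic No\./) {         # geometry header
            ($in_geometry, $natoms) = (1, 0);
        }
        elsif ($line =~ /^\s*Atom\s+QA/) {               # per-atom header
            $out .= join '', splice(@rows, 0, $natoms);  # take this molecule's rows
            $in_geometry = 0;
        }
        elsif ($in_geometry && $line =~ /^\s*\d+\s/) {   # one coordinate row
            $natoms++;
        }
    }
    return $out;
}

# Made-up, shortened sample with the two-atom molecule FIRST.
my $si = <<'END';
Atom QA DA(alpha) DA(beta) DA(total)
1 1.3294 1.9996 1.9996 1.9996
2 -0.1098 2.2233 2.2233 2.2233
1 1.0000 2.2238 2.6173 2.3812
END

my $si2 = <<'END';
Molecule A.log
Energy: -39.6946868929
Geometry:
Atom Atomic No. x y z
1 6 0.000000 0.000000 0.000000
2 1 0.000000 1.084453 0.000000
Atom QA DA(alpha) DA(beta) DA(total)
Molecule B.log
Energy: -37.7131454546
Geometry:
Atom Atomic No. x y z
1 6 0.000000 0.000000 0.000000
Atom QA DA(alpha) DA(beta) DA(total)
END

print merge_si($si, $si2);
```

    If the real files do not follow that rule, the counting step is the part to change; the rest is the same interleaving as before.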