PerlMonks  

Parsing Gaussian '03 Log Files

by Andrew_Levenson (Hermit)
on Jan 31, 2008 at 01:12 UTC

Andrew_Levenson has asked for the wisdom of the Perl Monks concerning the following question:

Greetings, monks! It's been a while. I haven't been to the site, or even really thought about Perl, lately, as I've been off having adventures in university-land.
Now I'm doing some undergraduate research in theoretical chemistry and find myself working with absolutely monstrous log files from the absolutely ingenious Gaussian '03. Today my research mentor asked me to trudge manually through the log files to compile a list of bond lengths between certain atoms in the molecule I'm working on, but I realized that using Perl could save me days' worth of eye-straining work. Hooray! Unfortunately, I've grown a bit rusty.

What I need to do is create a script that reads in a log file, searches for the area of text between "Optimized Parameters" and "GradGradGradGrad", then reads each line, rips out the value, and checks whether it falls within the range I'm working with. I know a regular expression will work wonders, but I am, as I said, very rusty. The individual lines I need to read look something like this:
! hc2 1.1136 -DE/DX = 0.0 !

A hypothetical four-entry log looks something like this:
----------------------------
! Optimized Parameters !
! (Angstroms and Degrees) !
---------------------- ------------------------------------------------------------------------
! Name Value Derivative information (Atomic Units) !
------------------------------------------------------------------------
! hc2 1.1136 -DE/DX = 0.0 !
! nc3 1.3392 -DE/DX = 0.0 !
! nch3 117.4979 -DE/DX = 0.0 !
! hn4 0.9929 -DE/DX = 0.0 !
------------------------------------------------------------------------
GradGradGradGradGradGradGradGradGradGradGradGradGradGradGradGradGradGradGrad

How can I grab the numbers in the 'Value' column, keeping in mind that lines like these may also appear in other sections of these incredibly dense log files?
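A minimal sketch of those steps, for illustration only: it assumes every parameter row has the `! name value -DE/DX = ... !` shape shown in the sample, and the helper name `bond_lengths_in_range`, the 1.0 to 2.0 window and the made-up look-alike row placed before the table are all hypothetical. It tracks the section with an explicit flag and accepts a row only when a number is followed by `-DE/DX`, so similar-looking lines elsewhere in the log are skipped.

```perl
use strict;
use warnings;

# Hypothetical helper: collect [name, value] pairs from the rows between
# "Optimized Parameters" and the "GradGradGrad..." banner, keeping only
# values inside [$min, $max].
sub bond_lengths_in_range {
    my ($fh, $min, $max) = @_;
    my @hits;
    my $in_section = 0;    # explicit flag, starts off on every call
    while (my $line = <$fh>) {
        if ($line =~ /Optimized Parameters/) { $in_section = 1; next }
        if ($line =~ /GradGradGrad/)         { $in_section = 0; next }
        next unless $in_section;
        # A parameter row looks like "! name value -DE/DX = ... !"
        next unless $line =~ m{^\s*!\s*(\S+)\s+([-+]?\d+(?:\.\d*)?)\s+-DE/DX};
        my ($name, $value) = ($1, $2);
        push @hits, [$name, $value] if $value >= $min && $value <= $max;
    }
    return @hits;
}

# Sample text: the table from the question plus a made-up look-alike
# row outside the section, which should be ignored.
my $log = <<'LOG';
! hc2 1.0950 -DE/DX = 0.0123 !
----------------------------
! Optimized Parameters !
! (Angstroms and Degrees) !
---------------------- -------------------
! Name Value Derivative information (Atomic Units) !
------------------------------------------
! hc2 1.1136 -DE/DX = 0.0 !
! nc3 1.3392 -DE/DX = 0.0 !
! nch3 117.4979 -DE/DX = 0.0 !
! hn4 0.9929 -DE/DX = 0.0 !
------------------------------------------
GradGradGradGradGradGrad
LOG

open my $fh, '<', \$log or die "Cannot open in-memory log: $!";
printf "%s %s\n", @$_ for bond_lengths_in_range($fh, 1.0, 2.0);
# prints "hc2 1.1136" then "nc3 1.3392"
close $fh;
```

Matching on the shape of each row, rather than on fixed column offsets, means the sketch does not depend on how the columns are aligned; the trade-off is that a row formatted differently from the sample would be silently skipped.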


Thanks a million in advance.
C(qw/74 97 104 112/);sub C{while(@_){$c**=$C;print (map{chr($C!=$c?shift:pop)}$_),$C+=@_%2!=1?1:0}}

Re: Parsing Gaussian '03 Log Files
by GrandFather (Saint) on Jan 31, 2008 at 01:43 UTC

    Something like:

    use strict;
    use warnings;

    my $file = <<FILE;
----------------------------
! Optimized Parameters !
! (Angstroms and Degrees) !
---------------------- ------------------------------------------------------------------------
! Name Value Derivative information (Atomic Units) !
------------------------------------------------------------------------
! hc2 1.1136 -DE/DX = 0.0 !
! nc3 1.3392 -DE/DX = 0.0 !
! nch3 117.4979 -DE/DX = 0.0 !
! hn4 0.9929 -DE/DX = 0.0 !
------------------------------------------------------------------------
GradGradGradGradGradGradGradGradGradGradGradGradGradGradGradGradGradGradGrad
FILE

    open my $inFile, '<', \$file;

    my $valueStart;
    my $valueLength;
    my $min = 1.0;
    my $max = 2.0;

    while (<$inFile>) {
        next unless m/Optimized Parameters/ .. m/GradGradGrad/;
        next unless $valueLength or /^(.*?Name)(\s*Value)/;

        if (! $valueLength) {
            $valueStart  = length $1;
            $valueLength = length $2;
            next;
        }

        my $value = substr $_, $valueStart, $valueLength;

        next unless $value =~ /^\s*[+-]?\d+(\.\d*)?\s*$/;
        next if $value < $min or $value > $max;
        print $_;
    }

    close $inFile;

    Prints:

    ! hc2 1.1136 -DE/DX = 0.0 !
    ! nc3 1.3392 -DE/DX = 0.0 !

    Perl is environmentally friendly - it saves trees
      Thank you very much, but unfortunately I need it to work so that I call the script, give it an input filename and an output filename, and have it work its magic. Each file is of undetermined length, and so are the sections that need to be parsed, since each file represents a different chain length of the molecule.

      Sorry that I didn't specify that earlier.

        Well, I couldn't do it all for you - what would you do with all the time you saved?

        You may find help in dealing with the file issues in replies to the thread "File read and strip" ;).
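        A sketch of the command-line wrapper asked for above, assuming the script is run as `perl gauss_bonds.pl input.log output.txt`. The script name, the `filter_parameters` helper, the tab-separated output and the hard-coded 1.0 to 2.0 window (borrowed from the reply above) are all illustrative choices, not anything Gaussian or the thread prescribes. The section is found with the same `..` flip-flop as the reply above, but each row is matched by shape instead of by column offset.

```perl
use strict;
use warnings;

# Hypothetical helper: copy "name<TAB>value" for every row of the
# Optimized Parameters table whose value lies in [$min, $max].
# Returns the number of rows written.
sub filter_parameters {
    my ($in, $out, $min, $max) = @_;
    my $count = 0;
    while (<$in>) {
        # Flip-flop: true from the start marker through the end banner.
        next unless /Optimized Parameters/ .. /GradGradGrad/;
        next unless m{^\s*!\s*(\S+)\s+([-+]?\d+(?:\.\d*)?)\s+-DE/DX};
        next if $2 < $min or $2 > $max;
        print {$out} "$1\t$2\n";
        $count++;
    }
    return $count;
}

# Run only when filenames are given on the command line.
if (@ARGV) {
    die "usage: $0 input.log output.txt\n" unless @ARGV == 2;
    my ($in_name, $out_name) = @ARGV;
    open my $in,  '<', $in_name  or die "Cannot read $in_name: $!\n";
    open my $out, '>', $out_name or die "Cannot write $out_name: $!\n";
    my $n = filter_parameters($in, $out, 1.0, 2.0);    # assumed range
    close $out or die "Cannot close $out_name: $!\n";
    print "Wrote $n rows to $out_name\n";
}
```

        Because the `..` operator remembers its on/off state between calls, a log that ends inside the table would leave it switched on for the next file; if you loop over many logs in one process, an explicit flag variable is the safer choice.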


Re: Parsing Gaussian '03 Log Files
by BrowserUk (Patriarch) on Jan 31, 2008 at 03:42 UTC

    Use as perl -n0 thisScript theDataFile >outputFile

    #! perl
    use strict;

    s[
        Optimized \s Parameters
        (?: .+? \n ){5}
        (.+?)
        -+ \n
        GradGrad
    ]{
        print "$_\n" for map{ (split)[ 2 ] } split "\n", $1;
    }gsex or warn 'No match';;

    Against your sample (with some junk before and after) produces:

    C:\test>perl -n0 junk9.pl junk2.dat
    1.1136
    1.3392
    117.4979
    0.9929
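    The `s///e` above runs its replacement block only for the side effect of printing. A sketch of the same whole-file match written as a `while (m//g)` loop instead, assuming the log is already in one string (as `-n0` arranges, or via `do { local $/; <$fh> }`) and that exactly five lines separate the marker from the first parameter row, as in the sample; `values_from_text` is a hypothetical name. Using `[^\n]*` pins each of the five skipped chunks to a single line, and splitting rows with `split ' '` keeps a stray whitespace-only fragment from producing an empty entry.

```perl
use strict;
use warnings;

# Hypothetical helper: return the Value column of every row in every
# "Optimized Parameters" table found in $text.
sub values_from_text {
    my ($text) = @_;
    my @values;
    while ($text =~ m{
            Optimized \s Parameters   # start marker
            (?: [^\n]* \n ){5}        # rest of that line plus four header lines
            (.*?)                     # the parameter rows, matched lazily
            -+ \n \s* GradGrad        # closing dash line, then the banner
        }gsx) {
        my $rows = $1;
        for my $row (split /\n/, $rows) {
            my @fields = split ' ', $row;              # awk-style whitespace split
            push @values, $fields[2] if @fields > 2;   # "!", name, value, ...
        }
    }
    return @values;
}

my $log = <<'LOG';
some junk before
----------------------------
! Optimized Parameters !
! (Angstroms and Degrees) !
---------------------- -------------------
! Name Value Derivative information (Atomic Units) !
------------------------------------------
! hc2 1.1136 -DE/DX = 0.0 !
! nc3 1.3392 -DE/DX = 0.0 !
! nch3 117.4979 -DE/DX = 0.0 !
! hn4 0.9929 -DE/DX = 0.0 !
------------------------------------------
GradGradGradGrad
some junk after
LOG

print "$_\n" for values_from_text($log);
```

    Because the loop uses `/g`, a log containing several optimized-parameter tables yields the values from all of them, in file order.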

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      That's fantastic, thank you. I'll test and tinker with it when I get to the lab tomorrow, see if I can't devise some solution of my own given your guidance.

Node Type: perlquestion [id://665250]
Approved by GrandFather