Brokensoulkeeper has asked for the wisdom of the Perl Monks concerning the following question:

#!/usr/bin/perl -w
$xf = "~/OUTPUT";          #enter file name location here <-
$num = 1000;               #enter number of iterations here <-
$tnum = 20;                #enter number of pdbs necessary <-
$lab = "BRO";              #enter label for output <-

open (XF, $xf) or die "no $xf exists!!";
while (my $line = <XF>) {               #reads file, takes out erroneous data
    if($line = m/TIMESTEP/) {           #saves each chunk of 2703 line to
        $line = "";                     #one array slot in superchunk
        @file = scalar(@trans);
        @superchunk = $trans[0]; }      #trans has only one
                                        #slot in it i want that
                                        #slot to go to one slot in super chunk
    else if($line = m/ATOM/) {
        @file = $line; }
    $num / $tnum = $div;
}
for($i=1;$i<=$tnum;++$i) {
    @name = ('> ', $lab , '_' , $i , '.pdb');
    @name = scalar(@hopname);
    print "$hopname[0] \n";
    open (OU, $hopname[0]) or die "File would not open!!");
        #The $lab and $i vars need to be output into the filename
        #string so that each filename is different yet identify-
        #-able as part of this process and the .pdb is the file
        #extention
    print OU (@superchunk[$i*$div]);
    close OU
}                                       #prints every 20th 2703 line chunk to
                                        #file, different file for each chunk
close XF

I am completely new to this language and have very little, if any, ability to use it. This is my best attempt at a program that splits up a very large output file from a program called DL_POLY. The idea, as the comments in the code should imply, is to read lines from the big output file and, every time the word TIMESTEP appears, take what has been read so far, put it into a scalar, and then transfer that scalar to a single slot of an array. To my knowledge the scalar should retain the "\n" characters, so that when it is later written to a file the format looks the same as it did in the output file.

Once it has read through the entire file and built @superchunk, which contains a single entry for every block that was split off, it goes through @superchunk and at 20 separate locations picks out an entry and writes it to a file named by a label and a number. The label should remain static and the number should vary throughout the process. The .pdb is text that needs to be tacked on to the file name so that the file will run through the imaging program that will be used.

So after the script is run on the UNIX supercomputer (because that's where the file is), I should end up with 20 files called BRO_#.pdb. Am I right? Will this code work, or am I implementing things out of order or in the wrong fashion? Be blunt. I know I am not doing this very well; that's why I am here. I figured I could learn from some pros :) Thanks again.

Replies are listed 'Best First'.
Re: Redone DB split help needed
by vladb (Vicar) on Jul 16, 2002 at 16:53 UTC
    If this is your first encounter with the Perl scripting language, my suggestion would be for you to take a look at some of the Perl books for beginners. At the very least take a look around the www.perldoc.com site. There you should be able to learn the basics (and later the extra stuff) of the language you are attempting to use. I learnt Perl over one weekend the first time I had to use it for a consulting project.

    Alright, enough of the preaching... on to the question at hand. It would be very helpful if you provided us with some sample input data. I'm still not sure what it looks like to begin with.

    As for the code you've submitted here, it's not quite valid Perl. For example, this
    $num / $tnum = $div;
    will simply not work. What are you trying to do here? If my assumption is correct and you are simply trying to store the result of dividing $num by $tnum in the $div variable, then you'll have to write this line as follows:
    $div = $num / $tnum;
    Also, some of your lines are missing the ';'..
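
    As a minimal sketch of that corrected assignment in context, the stride between saved chunks could be computed as below. pick_indices is a hypothetical helper name, and the 0-based offsets are an assumption about how the chunks end up stored in the array:

```perl
use strict;
use warnings;

# Hypothetical helper: return the 0-based chunk indices to keep when
# $tnum evenly spaced chunks are wanted out of $num.
sub pick_indices {
    my ($num, $tnum) = @_;
    die "tnum must be positive\n" unless $tnum > 0;
    my $div = int($num / $tnum);   # the variable being assigned goes on the left
    die "more files requested than chunks available\n" if $div < 1;
    return map { $_ * $div - 1 } 1 .. $tnum;   # $div-1, 2*$div-1, ...
}

my @idx = pick_indices(1000, 20);
print "stride: ", int(1000 / 20), "\n";      # stride: 50
print "first: $idx[0], last: $idx[-1]\n";    # first: 49, last: 999
```

    With 1000 chunks and 20 files this keeps chunks 49, 99, ... 999. Indexing with $i*$div for $i = 20 would point one past the last element of a 1000-element array, which is why the sketch subtracts 1.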

    _____________________
    # Under Construction
(jeffa) Re: Redone DB split help needed
by jeffa (Bishop) on Jul 16, 2002 at 19:56 UTC

    This code snippet will hopefully help you. It reads lines, appending each one to the current array element, and moves on to the next element whenever the token 'TIMESTEP' is found. Afterwards, the array is iterated and each element is written to its own file. Since you did not supply a data file, I will supply one with the help of our friends from 'Mr. Show'. Here they are, doing Shakespeare:

    output.txt:
    I'm the king! 
    I'm mad! 
    I want the news from my kingdom! 
    TIMESTEP
    I'm the queen! 
    Blah blah blah! 
    Look at me!
    TIMESTEP
    Your majesty, I'm a clown or something.. 
    I've got makeup on my face, 'cause my mommy and daddy didn't 
    give me enough attention!
    TIMESTEP
    I'm a big actor! 
    Look at the great big actor on the stage! 
    I yell the loudest!
    Look at me!
    TIMESTEP
    Oh mighty king! 
    We are your sons! 
    We're loud, stupid actors and we're gonna have a big dumb 
    sword fight, in your honor!
    TIMESTEP
    
    And here is the script:
    use strict;

    my @superchunk; # i think i'm hyper enough as it is ...
    my $i = 0;

    open(IN,'output.txt') or die $!;

    # process one line at a time
    # add each line to the element that $i is
    # increment $i when TIMESTEP is encountered
    while (<IN>) {
       $superchunk[$i] .= $_;
       $i++ if /^TIMESTEP/;
    }
    close IN;

    $i = 1;
    # process one element at a time
    # use sprintf to create the name of the file
    # write that element to the file
    for (@superchunk) {
       my $file = sprintf("%s%02d%s",'BRO',$i++,'.pdb');
       open(OUT,">$file") or die "can't write to $file: $!";
       print OUT $_;
    }

    After you execute this script, you will have 5 files: BRO01.pdb, BRO02.pdb, BRO03.pdb, BRO04.pdb, and BRO05.pdb. Read up on sprintf for more info about that highly useful function.
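
    For a quick feel of what sprintf does with the %02d format, here is a small sketch. pdb_name is a hypothetical helper name, and the underscore in the file name follows the BRO_#.pdb naming asked for in the original question rather than the script above:

```perl
use strict;
use warnings;

# Hypothetical helper: build "<label>_<NN>.pdb" with a zero-padded number.
# %s inserts a string; %02d inserts an integer padded with zeros to at
# least two digits.
sub pdb_name {
    my ($label, $n) = @_;
    return sprintf("%s_%02d.pdb", $label, $n);
}

print pdb_name('BRO', $_), "\n" for 1, 2, 20;
```

    Zero-padding keeps the files in numeric order when they are listed alphabetically, so BRO_02.pdb sorts before BRO_10.pdb.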

    Final thought: instead of storing the lines to be written in an array, consider writing them out inside the while loop. This works better for very large files, but if you have plenty of RAM then storing them in an array first should be OK.
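
    A rough sketch of that write-as-you-go idea follows. split_stream is a hypothetical name, the demo reads a tiny in-memory sample rather than a real output file, and it writes into a temporary directory so that nothing is left behind:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Sketch: copy lines from $in to numbered files in $dir, starting a new
# file after each TIMESTEP line. Only one line is held in memory at a time.
# Returns the number of files written.
sub split_stream {
    my ($in, $dir, $label) = @_;
    my ($count, $out) = (0, undef);
    while (my $line = <$in>) {
        unless ($out) {
            # no file open yet for this chunk, so open the next one
            my $file = sprintf("%s/%s%02d.pdb", $dir, $label, ++$count);
            open($out, '>', $file) or die "can't write to $file: $!";
        }
        print {$out} $line;
        if ($line =~ /^TIMESTEP/) {
            close($out) or die "can't close file: $!";
            undef $out;                 # forces a new file on the next line
        }
    }
    close($out) if $out;                # last chunk may lack a TIMESTEP line
    return $count;
}

# Demo on a small in-memory sample.
my $sample = "a\nb\nTIMESTEP\nc\nTIMESTEP\n";
open(my $in, '<', \$sample) or die $!;
my $dir = tempdir(CLEANUP => 1);
print split_stream($in, $dir, 'BRO'), " files written\n";   # 2 files written
```

    As in the array version, each TIMESTEP line ends up as the last line of its chunk.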

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    
Re: Redone DB split help needed
by Anonymous Monk on Jul 16, 2002 at 22:02 UTC
    HEADER     2057 CUBOOCTAHEDRAL BINARY AuCu CLUSTER                    
     COMPND         2057
     AUTHOR   GENERATED BY DL_POLY2 v2.13
    TIMESTEP      1000      2057         0         0    0.005000
    ATOM     1  Cu   UNK A   1      -0.111  -0.030   0.175  1.00  0.00      Cu
    ATOM     2  Cu   UNK A   1      -0.095  -2.146   1.879  1.00  0.00      Cu
    ATOM     3  Cu   UNK A   1      -2.010  -0.093   2.109  1.00  0.00      Cu
    ATOM     4  Cu   UNK A   1      -0.095  -1.910  -1.917  1.00  0.00      Cu
    ATOM     5  Cu   UNK A   1       1.852   1.831   0.256  1.00  0.00      Cu
    ATOM     6  Cu   UNK A   1      -1.907   2.038   0.122  1.00  0.00      Cu
    ATOM     7  Cu   UNK A   1       0.016  -0.142   3.791  1.00  0.00      Cu
    
    This is a bit of what I am going to be reading through. And this is Brokensoulkeeper; I am just not on my regular computer, so I am not logged in.
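
    In this sample the TIMESTEP line comes before the ATOM lines it belongs to, so a splitter for this format would start a new chunk at each TIMESTEP rather than end one there. The sketch below shows one hedged way to do that. read_frames is a hypothetical name, skipping the HEADER, COMPND and AUTHOR lines is an assumption, and the second frame in the demo is just the first one repeated as a stand-in:

```perl
use strict;
use warnings;

# Sketch: group ATOM lines into frames, one frame per TIMESTEP line.
# Assumption: each frame starts at a TIMESTEP line; anything before the
# first TIMESTEP, and any non-ATOM line, is skipped.
sub read_frames {
    my ($in) = @_;
    my @frames;
    while (my $line = <$in>) {
        if ($line =~ /^\s*TIMESTEP/) {
            push @frames, '';            # start a new, empty frame
        }
        elsif ($line =~ /^\s*ATOM/ and @frames) {
            $frames[-1] .= $line;        # keep the line, newline included
        }
    }
    return @frames;
}

# Demo: two frames built from lines of the sample above (second is a repeat).
my @one_frame = (
    'TIMESTEP      1000      2057         0         0    0.005000',
    'ATOM     1  Cu   UNK A   1      -0.111  -0.030   0.175  1.00  0.00      Cu',
    'ATOM     2  Cu   UNK A   1      -0.095  -2.146   1.879  1.00  0.00      Cu',
);
my $sample = join "\n",
    'HEADER     2057 CUBOOCTAHEDRAL BINARY AuCu CLUSTER',
    @one_frame, @one_frame, '';

open(my $in, '<', \$sample) or die $!;
my @frames = read_frames($in);
print scalar(@frames), " frames\n";      # 2 frames
```

    Each element of @frames keeps its newlines, so printing an element to a file reproduces the ATOM lines exactly as they appeared in the input.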