InfoSeeker has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I am confused about how to execute 'mass' commands when calling perl code in a shell script. I have around 30,000 DSSP files which I want to process within Perl. This entails creating a Bioperl DSSP object for every DSSP file that I will read. Part of the code (DSSP_output.pl) is as follows:

#!/usr/bin/perl -w
use strict;
use warnings;
use Bio::Structure::SecStr::DSSP::Res;

open (DSSPIN, "$ARGV[0]") || die $!;

#Create a new DSSP object
my $dssp_obj = new Bio::Structure::SecStr::DSSP::Res('-file'=>DSSPIN);

# EXAMPLE OF HOW THE OBJECT IS NORMALLY DECLARED
#my $dssp_obj = new Bio::Structure::SecStr::DSSP::Res('-file'=>'3bit.dssp');

#Get PDB ID and Compound representation for each DSSP file
my $pdb_id = $dssp_obj->pdbID();
print "Analysis of PDB:: ". $pdb_id. "\n";
my $cmpd = $dssp_obj->pdbCompound();
print "Representing:: ". $cmpd. "\n";
etc...

My dilemma is as follows: In a shell script, I want to read a list of filenames and execute DSSP_output.pl for each file. In shell this would translate to:

<some loop to read each dssp filename>
    dssp_output.pl filename.dssp
<end loop>

Now how do I read each filename in perl? I can't use STDIN (I think) because I want this running automatically... the way I have written the code, DSSPIN produces the error:

Bareword "DSSPIN" not allowed when "strict subs" in use.

If I try the following:

open (DSSPIN, <>) || die $!;
my $dssp_obj = new Bio::Structure::SecStr::DSSP::Res('-file'=>DSSPIN);

Same error. And finally:

@data_from_files = <>;
#Create a new DSSP object
my $dssp_obj = new Bio::Structure::SecStr::DSSP::Res('-file'=>$data_from_files);

Error: Global symbol @data_from_files, $data_from_files requires explicit package name

Apologies for a long post...I'm just badly confused with filehandling! What should I write to read in the DSSP filenames in the perl script? Much obliged!

A very confused InfoSeeker

Replies are listed 'Best First'.
Re: Confused when reading Input within Perl and Shell Scripts
by GrandFather (Saint) on Nov 16, 2008 at 22:50 UTC

    Where will the file names come from? Will you provide a file containing the names? Do you want to process all the files in a given directory? Do you want to hard wire the list of names in the Perl script? For the moment let's assume you have resolved that issue and have an array @filenames that contains the names of the files to process. You could then:

    for my $filename (@filenames) {
        open my $dsspIn, '<', $filename or die "Unable to open $filename: $!\n";
        #Create a new DSSP object
        my $dssp_obj = Bio::Structure::SecStr::DSSP::Res->new ('-file'=> $dsspIn);
        ...
        close $dsspIn;
    }

    Perl reduces RSI - it saves typing

      Thank you. Initially I want to provide a txt file with the filenames, for example, DsspCodes.txt:

      2bx2.dssp 3cob.dssp 1nwn.dssp

      Based on your comments and my novice use of perl, I have modified the code to:

      use strict;
      use warnings;
      use Bio::Structure::SecStr::DSSP::Res;

      my @filenames;
      for my $filename (@filenames) {
          open my $dsspIn, '<', $filename or die "Unable to open $filename: $!\n";
          #Create a new DSSP object
          my $dssp_obj = Bio::Structure::SecStr::DSSP::Res->new ('-file'=> $dsspIn);
          ....
          close dsspIn;
      }

      But if I call it on the command line:

       DSSP_output.pl DsspCodes.txt

      I don't get any errors but I also don't get any output??

      Thanks for your patience as I learn this..

      InfoSeeker

        Think about how the contents of the external file containing the file names are going to get into the @filenames array. At present your code has nothing to do that. One way would be:

        my $namesFileName = shift; # Get name of the file names file from command line
        open my $inNames, '<', $namesFileName or die "Unable to open $namesFileName: $!\n";
        my @filenames = <$inNames>; # Read the file names (one per line)
        close $inNames;
        chomp @filenames; # Remove the line end character from each entry
        ...

        or you could replace the for loop with a while loop and eliminate the array:

        use strict;
        use warnings;

        my $namesFileName = shift; # Get name of the file names file from command line
        open my $inNames, '<', $namesFileName or die "Unable to open $namesFileName: $!\n";
        while (<$inNames>) {
            chomp; # Remove line end character from file name
            open my $dsspIn, '<', $_ or die "Unable to open $_: $!\n";
            ...
        }
        close $inNames;
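
        For anyone who wants to try the list-reading step on its own, here is a minimal sketch that runs without BioPerl installed. The helper name read_name_list, the blank-line skipping and the handling of Windows line endings are my assumptions, not something from the posts above; the DSSP object would be created where the comment says so.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the file names listed
# one per line in $list_file, with line endings removed and blank lines skipped.
sub read_name_list {
    my ($list_file) = @_;
    open my $in, '<', $list_file
        or die "Unable to open $list_file: $!\n";
    my @names;
    while (my $line = <$in>) {
        chomp $line;
        $line =~ s/\r\z//;          # tolerate Windows line endings
        next if $line =~ /^\s*$/;   # skip blank lines
        push @names, $line;
    }
    close $in;
    return @names;
}

# Usage: perl DSSP_output.pl DsspCodes.txt
if (@ARGV) {
    for my $filename (read_name_list($ARGV[0])) {
        # The DSSP object for $filename would be created here.
        print "Would process $filename\n";
    }
}
```

        Keeping the list reading in its own subroutine makes it easy to check that the names arrive clean (no trailing newline) before they are handed to anything that opens files.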

Re: Confused when reading Input within Perl and Shell Scripts
by JavaFan (Canon) on Nov 16, 2008 at 22:48 UTC
    Now how do I read each filename in perl?
    Well, that depends on how your program is supposed to get the filenames. Is it all files from a certain directory? Use opendir and readdir. Are the filenames given through a pipe? Use <>. Are the filenames the command line arguments? Then they are found in @ARGV. Do you use command line options to communicate filenames? Use Getopt::Long (or some other Getopt:: module).
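
    As a rough illustration of the opendir/readdir option, assuming the files all sit in one directory and end in .dssp (the helper name dssp_files_in is made up for this sketch):

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): list the plain files in $dir
# whose names end in ".dssp", sorted, with the directory prepended.
sub dssp_files_in {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Unable to open directory $dir: $!\n";
    my @files;
    while (defined(my $entry = readdir $dh)) {
        my $path = "$dir/$entry";
        push @files, $path if $entry =~ /\.dssp\z/ && -f $path;
    }
    closedir $dh;
    return sort @files;
}

# Usage: perl list_dssp.pl /path/to/dssp/files
if (@ARGV) {
    print "$_\n" for dssp_files_in($ARGV[0]);
}
```

    With 30,000 files this avoids passing every name on the command line, where the shell's argument length limit could get in the way.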
    The way I have written the code, DSSPIN produces the error:
    My advice is to stop using bareword file handles. That's so last millennium. Use autovivifying filehandles:
    my $filename = "....";
    open my $fh, "<", $filename or die;
    while (<$fh>) {
        ...
    }
    Autovivifying filehandles have been there since March 2000 - when 5.6.0 got released.
Re: Confused when reading Input within Perl and Shell Scripts
by CountZero (Bishop) on Nov 16, 2008 at 23:28 UTC
    I do not know the API for this module but
    # EXAMPLE OF HOW THE OBJECT IS NORMALLY DECLARED
    #my $dssp_obj = new Bio::Structure::SecStr::DSSP::Res('-file'=>'3bit.dssp');
    seems to indicate it expects a filename and not a filehandle.

    So, try this:

    my $dssp_obj = new Bio::Structure::SecStr::DSSP::Res('-file'=>$ARGV[0]);
    As was already explained by other Monks, rather than (re)start your script for each file, let Perl find the files you need and do it all in one session.

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      Thank you all very much...I think I have managed to sort it out finally!

      InfoSeeker (no longer as confused..very grateful and happy!)

Re: Confused when reading Input within Perl and Shell Scripts
by gwadej (Chaplain) on Nov 16, 2008 at 23:23 UTC

    If I'm reading your example code correctly, you are closer than you think. When you say:

    # EXAMPLE OF HOW THE OBJECT IS NORMALLY DECLARED
    #my $dssp_obj = new Bio::Structure::SecStr::DSSP::Res('-file'=>'3bit.dssp');

    it looks like you are saying that normally the name of the file (and not a file handle) needs to be passed to the Bio::Structure::SecStr::DSSP::Res::new method as the -file argument.

    If that's the case, then you could do the following:

    my $dssp_obj = new Bio::Structure::SecStr::DSSP::Res('-file'=>$ARGV[0]);

    if the file you want to open is on the command line of your script, like your example. Now, if you wanted to replicate this for all of the files on the command line, you would do something like:

    my @dssp_objs = ();
    foreach my $file (@ARGV) {
        push @dssp_objs, Bio::Structure::SecStr::DSSP::Res->new('-file'=>$file);
    }

    giving you an array of DSSP objects to work with.

    G. Wade

      Ok, bear with me. I have used the suggestion:

      my $dssp_obj = new Bio::Structure::SecStr::DSSP::Res('-file'=>$ARGV[0]);

      Which works (and I'm ignoring several filehandle warnings). Now, I have written a shell script:

      #!/bin/bash
      while read DSSPLine ; do
          echo $DSSPLine
          DSSP_Output.pl $DSSPLine.dssp
      done

      QUESTION: In each iteration, the output of DSSP_Output.pl is written to a CSV file. Is there a way (in Perl) to write the output of all 30,000 iterations to a single file?

      The very obvious answer is to use a DSSP array like you mentioned, I have tried that (in a way):

      use strict; # 'use strict' requires that you use 'my' for all local variables, or explicitly qualify all globals.
      use warnings;
      use Bio::Structure::SecStr::DSSP::Res;

      my @dssp_objs =();
      foreach my $file (@ARGV) {
          push @dssp_objs, Bio::Structure::SecStr::DSSP::Res->new('-file'=>$file);
      }

      foreach my $dssp_obj(@dssp_objs) {
          #Get PDB ID and Compound representation for each file
          my $pdb_id = $dssp_obj->pdbID();
          print "Analysis of PDB:: ". $pdb_id. "\n";
          my $cmpd = $dssp_obj->pdbCompound();
          print "Representing:: ". $cmpd. "\n";
          etc...
      }

      But when I do this, and run the commandline with a txt file of the DSSP filenames:

       DSSP_Output.pl DSSP_codes.txt

      I get an exception!

      Please let me know what you think...apologies for all the headache...

      A <slowly deconfusing> InfoSeeker

        The approach I gave would have worked if you had all of the list of filenames on the command line. If you want to take the list of filenames from a file, then we need to modify the input loop somewhat:

        my @dssp_objs =();
        #reads the files from the command line one line at a time.
        while(<>) {
            chomp; # remove the newline.
            push @dssp_objs, Bio::Structure::SecStr::DSSP::Res->new('-file'=>$_);
        }

        This code would read a line at a time out of DSSP_codes.txt (using your example).

        I changed to the while(<>) loop just to be consistent with the way most people do this kind of loop. Since it automatically loads the $_ variable (and the loop is pretty short), I removed the $file variable. Otherwise, this is a drop-in replacement for the previous loop that should meet your needs.
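
        On the question of getting everything into one CSV file: one possible approach is to open the output file once, before the loop, and print one row per input file inside it. The sketch below is only an illustration under assumptions. The output name all_results.csv, the column names, and the analyse stand-in (which fakes the values you would really get from pdbID() and pdbCompound()) are all invented so that the example runs without BioPerl.

```perl
use strict;
use warnings;

# Quote one CSV field: wrap it in double quotes and double any embedded quotes.
sub csv_field {
    my ($value) = @_;
    $value =~ s/"/""/g;
    return qq{"$value"};
}

# Hypothetical stand-in for the per-file analysis. In the real script this
# is where the DSSP object would be built and pdbID()/pdbCompound() called.
sub analyse {
    my ($filename) = @_;
    (my $id = $filename) =~ s/\.dssp\z//;
    return ($id, "compound for $id");
}

# Write a header plus one CSV row per name to the already-open handle $out.
sub write_report {
    my ($out, @names) = @_;
    print {$out} join(',', map { csv_field($_) } ('pdb_id', 'compound')), "\n";
    for my $name (@names) {
        print {$out} join(',', map { csv_field($_) } analyse($name)), "\n";
    }
    return;
}

# Usage: perl DSSP_Output.pl DSSP_codes.txt
if (@ARGV) {
    my $csv_name = 'all_results.csv';    # assumed output name
    open my $out, '>', $csv_name or die "Unable to create $csv_name: $!\n";
    chomp(my @names = <>);               # file names, one per line
    write_report($out, grep { $_ ne '' } @names);
    close $out or die "Unable to close $csv_name: $!\n";
}
```

        Because the handle is opened a single time, every row lands in the same file, and the shell loop that restarts the script for each DSSP file is no longer needed.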

        G. Wade
