PerlMonks  

Reading PDB files

by esswired (Initiate)
on Jan 29, 2003 at 13:37 UTC ( [id://230938]=perlquestion )

esswired has asked for the wisdom of the Perl Monks concerning the following question:

I am a very non-technical user, having never even opened a command window before I downloaded perl in an attempt to read a PDB file. I would love to be able to pull the PDB file into Excel (or Access which I think I could learn to use). Am I at the right place? Is there an easier way I should know about? Respectfully . . .

Replies are listed 'Best First'.
Re: Reading PDB files
by scain (Curate) on Jan 29, 2003 at 14:36 UTC
    Where to start?

    First, I would really suggest the book Learning Perl, authored by our very own merlyn.

    Next, get yourself to CPAN, the Comprehensive Perl Archive Network. There are literally thousands of useful Perl modules for doing all sorts of things, like reading special file formats and working with other programs. Since you are probably using Perl on Windows, and therefore probably have ActiveState Perl, you will want to learn how to use PPM to install those modules on your machine.

    Finally, if by PDB file you are referring to a Protein Data Bank file, you will want to check out BioPerl, which has many useful bioinformatics functions, including reading special file formats. You might also want to consider getting Beginning Perl for Bioinformatics, which is a pretty good reference for a scientist who wants to learn bioinformatics.

    Good luck, and feel free to come back with more specific questions when you've started writing code.

    Scott
    Project coordinator of the Generic Model Organism Database Project

Re: Reading PDB files
by Anonymous Monk on Jan 29, 2003 at 14:42 UTC

    Check out:

    PDB Perl Module

    This provides a mechanism for Perl to read and write PDB (Protein Data Bank) files. I assumed this is the file type you mean rather than a Palm Pilot database. You could use this module to create a comma-separated data file to import into Excel. You might also want to check out BioPerl.
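
    As a rough illustration of the comma-separated step only, here is a minimal sketch in plain Perl. It does not use the PDB module mentioned above; the helper names csv_field and csv_line and the sample rows are made up for the example, and real rows would come from whatever parsed your file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Quote one field for CSV: wrap it in double quotes when it contains a
# comma, a double quote, or a line break, and double any embedded quotes.
sub csv_field {
    my ($value) = @_;
    $value = '' unless defined $value;
    if ( $value =~ /[",\r\n]/ ) {
        $value =~ s/"/""/g;
        return qq{"$value"};
    }
    return $value;
}

# Join a list of fields into one CSV line (without the trailing newline).
sub csv_line {
    return join ',', map { csv_field($_) } @_;
}

# Made-up sample rows; real rows would come from the parsed file.
my @rows = (
    [ 'id',   'chain', 'note' ],
    [ '1ABC', 'A',     'contains, a comma' ],
    [ '1ABC', 'B',     'says "hello"' ],
);

print csv_line(@$_), "\n" for @rows;
```

    Redirect the output to a file ending in .csv and Excel should open it directly. For anything beyond simple data, the Text::CSV module on CPAN handles the edge cases more thoroughly than this hand-rolled quoting.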

    Arun

      Palm::PDB just in case (this also)


      MJD says you can't just make shit up and expect the computer to know what you mean, retardo!
      ** The Third rule of perl club is a statement of fact: pod is sexy.

      I have a Palm PDB file & want to read the whole thing & want to change it into a CSV file. Any ideas?
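
      For a Palm database, a first step is simply splitting the file into its records. The sketch below does that by hand with unpack, assuming the usual Palm layout: a 78-byte header followed by one 8-byte index entry per record. The names parse_palm_pdb and slurp_bytes and the in-memory sample are made up for the example. Palm::PDB from CPAN is the more complete route.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read a whole file as raw bytes (binmode matters on Windows).
sub slurp_bytes {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    binmode $fh;
    local $/;    # slurp mode: read everything in one go
    my $bytes = <$fh>;
    close $fh;
    return $bytes;
}

# Split the raw bytes of a Palm database into header fields and records.
sub parse_palm_pdb {
    my ($bytes) = @_;
    die "Too short to be a Palm database\n" if length($bytes) < 78;

    # Header: 32-byte NUL-padded name, 28 bytes of flags and dates (skipped),
    # 4-byte type, 4-byte creator, 8 more bytes (skipped), 2-byte record count.
    my ( $name, $type, $creator, $count ) =
        unpack 'Z32 x28 a4 a4 x8 n', $bytes;
    die "Record index is truncated\n" if length($bytes) < 78 + 8 * $count;

    # Each 8-byte index entry starts with the record's offset in the file.
    my @offsets;
    for my $i ( 0 .. $count - 1 ) {
        push @offsets, unpack( 'N', substr( $bytes, 78 + 8 * $i, 4 ) );
    }

    # A record runs from its offset to the next record's offset (or to EOF).
    my @records;
    for my $i ( 0 .. $#offsets ) {
        my $end = $i < $#offsets ? $offsets[ $i + 1 ] : length($bytes);
        push @records, substr( $bytes, $offsets[$i], $end - $offsets[$i] );
    }

    return {
        name    => $name,
        type    => $type,
        creator => $creator,
        records => \@records,
    };
}

# Hypothetical two-record database built in memory for the demonstration.
my $sample = pack( 'a32 x28 a4 a4 x8 n', 'DemoDB', 'DATA', 'demo', 2 )
    . pack( 'N C a3', 94, 0, "\0\0\1" )
    . pack( 'N C a3', 99, 0, "\0\0\2" )
    . 'hello'
    . 'world!';

my $db = parse_palm_pdb($sample);
print "$db->{name} ($db->{type}/$db->{creator}): ",
    scalar @{ $db->{records} }, " records\n";
printf "  record %d: %d bytes\n", $_, length( $db->{records}[$_] )
    for 0 .. $#{ $db->{records} };
```

      For a real file, pass slurp_bytes('yourfile.pdb') to parse_palm_pdb instead of the sample. What sits inside each record depends on the application that wrote the database, so turning records into CSV columns still requires knowing that application's layout.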
Re: Reading PDB files
by gt8073a (Hermit) on Jan 29, 2003 at 18:27 UTC

    I am a very non-technical user

    BioPerl may be far more complicated than you need to get started (don't get me wrong, it is fabulous, just not simple to follow).

    The easiest approach is to read each line of the file and match the words/characters up to the first colon. How you store the data really depends on what data you want and what you intend to do with it all (again, this can get extremely tricky).
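
    The line-by-line approach just described could look roughly like the sketch below. It assumes "Label : value" lines with entries separated by a line starting with //, as in the script further down. The names split_field and parse_entries and the sample lines are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split one "Label : value" line at the first colon.
# Returns an empty list when the line has no label before a colon.
sub split_field {
    my ($line) = @_;
    return unless $line =~ /^\s*([^:]+?)\s*:\s*(.*?)\s*$/;
    return ( $1, $2 );
}

# Group lines into entries. Each entry is a hash mapping a label to the
# list of values seen for it, since a label such as Chain can repeat.
sub parse_entries {
    my @lines = @_;
    my ( @entries, %current );
    for my $line (@lines) {
        chomp $line;
        if ( $line =~ m{^//} ) {    # end-of-entry marker
            push @entries, {%current} if %current;
            %current = ();
            next;
        }
        my ( $label, $value ) = split_field($line);
        next unless defined $label;
        push @{ $current{$label} }, $value;
    }
    push @entries, {%current} if %current;    # file without a final //
    return @entries;
}

# Made-up sample input; with a real file you would pass <$fh> instead.
my @sample = (
    "ID : 1ABC\n",
    "Header : EXAMPLE\n",
    "Chain : A\n",
    " Sequence : MKVLAAGIVG\n",
    "//\n",
);

for my $entry ( parse_entries(@sample) ) {
    for my $label ( sort keys %$entry ) {
        print "$label = ", join( ' | ', @{ $entry->{$label} } ), "\n";
    }
}
```

    Keeping every value in a list is a deliberate choice here: it avoids silently overwriting the first chain when an entry has several.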

    Here is an extremely simple script that reads all the sequences for each entry in a PDBFind file, blasts them against each other, and writes the information to a file. If you refer to your Camel book, this script should be easy enough to figure out and extend. Remember, though, Excel can read CSV (comma-separated value) files, and you will need to read up on DBI if you want to throw data into Access.

    #!/usr/bin/perl -w
    use strict;       ## watch typos and errors
    $|++;             ## unbuffered output
    my $begin = time; ## just seeing how long we take

    ## change these
    ## (forward slashes work on Windows; a trailing \ inside single quotes
    ## would escape the closing quote)
    my $root      = 'C:/Windows';          ## where the base of our script is
    my $source    = "$root/PDBFind.txt";   ## do NOT use the full PDBFind.TXT file, I mean it!
    my $out       = "$root/out.txt";       ## file to save blast results to
    my $blast_exe = 'C:\Blast\bl2seq.exe'; ## where our blast program is
    my $options   = '-G 11 -E 1 -W 3 -X 50 -F F -p blastp'; ## our blast options
    my $min_size  = 10; ## don't blast shorty sequences, this isn't a protein anyway

    ## prolly don't change these
    my $blast_i = "$root/i.$$"; ## temp files to hold sequences
    my $blast_j = "$root/j.$$";

    ## quick recap so we know what we are doing and where we are
    print "SCRIPT : $0\n",
          "IN : '$source'\n",
          "OUT : '$out'\n",
          qq{BLAST : '$blast_exe $options -i "$blast_i" -j "$blast_j"'\n};

    ## lines in PDBFind I care about
    ## (the spacing inside each label has to match your file exactly)
    my $DELIMITER = '//';
    my $ID        = 'ID : ';
    my $CHAIN     = 'Chain : ';
    my $SEQUENCE  = ' Sequence : ';
    my $COMPOUND  = 'Compound : ';
    my $SOURCE    = 'Source : ';
    my $HEADER    = 'Header : ';

    ## counters
    my $pdb_count   = 0;
    my $chain_count = 0;
    my $blast_count = 0;
    my $wild_count  = 0;
    my $stub_count  = 0;
    my $dna_count   = 0;

    ## variables relevant to data I am looking for
    my ( $id, $chain );
    my ( @chain_seq_pairs, @chain_wild, @chain_stubby, @chain_dna );

    ## open files
    open( SOURCE, "<$source" ) or die "Can not open $source for reading: $!";
    open( OUT, ">$out" ) or die "Can not open $out for writing: $!";

    ## parse source and blast chains
    while ( <SOURCE> ) {
        chomp;
        next if ( /^\s*$/ );

        ## match lines
        if ( /^$DELIMITER/ ) {
            ## a holder
            my @been_blasted;

            ## a little sanity
            next unless $id;

            ## up counter
            $pdb_count++;

            ## part of final counter
            print OUT "${ID}${id}\n";

            ## blast everything
            foreach my $i ( @chain_seq_pairs ) {
                foreach my $j ( @been_blasted ) {
                    print OUT "${id} $i->[0] $j->[0]", '-' x 30, "\n";

                    ## fill our blast i and j temp files
                    open( BLASTI, ">$blast_i" ) or die "Can not open $blast_i for writing: $!";
                    open( BLASTJ, ">$blast_j" ) or die "Can not open $blast_j for writing: $!";
                    print BLASTI $i->[1];
                    print BLASTJ $j->[1];
                    close( BLASTI );
                    close( BLASTJ );

                    ## blast away
                    my @blast_results = split( "\n", `$blast_exe $options -i "$blast_i" -j "$blast_j"` );

                    ## increase counter
                    $blast_count++;

                    ## now cycle through lines,
                    ## put filters, what have yous here
                    foreach my $line ( @blast_results ) {
                        print OUT "${id} $i->[0] $j->[0]| $line\n";
                    }
                }

                ## save current sequence to be blasted against
                push( @been_blasted, $i );
            }
            print OUT "\n";

            ## increase counters
            $chain_count += scalar( @chain_seq_pairs );
            $wild_count  += scalar( @chain_wild );
            $stub_count  += scalar( @chain_stubby );
            $dna_count   += scalar( @chain_dna );

            ## clean up a few things
            undef @chain_seq_pairs;
            undef @chain_dna;
            undef @chain_stubby;
            undef @chain_wild;
            undef $id;
            undef $chain;
            next;
        }

        if ( /^$ID/ ) {
            $id = $';
            next;
        }

        ## These are noops, but this is how you can catch their values
        ## if ( /^$COMPOUND/ ) {}
        ## if ( /^$HEADER/ ) {}
        ## if ( /^$SOURCE/ ) {}

        if ( /^$CHAIN/ ) {
            $chain = $';
            next;
        }

        if ( /^$SEQUENCE/ ) {
            my $tmp = $';
            my $bad = 0;

            if ( $tmp =~ /^x+$/i ) {
                ## we're a straight wild card chain
                ## ignore because that means we can be anything
                ## just a bunch of aa's stuck together but
                ## we don't know / care about specifics
                push( @chain_wild, [ $chain, $tmp ] );
                $bad++;
            }

            if ( $tmp ne uc( $tmp ) ) {
                ## we're dna, crude test, but it works on the data
                ## DNA is lower case
                push( @chain_dna, [ $chain, $tmp ] );
                $bad++;
            }

            if ( length( $tmp ) <= $min_size ) {
                ## we're too short
                push( @chain_stubby, [ $chain, $tmp ] );
                $bad++;
            }

            next if $bad;
            push( @chain_seq_pairs, [ $chain, $tmp ] );
            next;
        }
    }

    ## clean up
    close( OUT );
    close( SOURCE );
    unlink( $blast_i, $blast_j ); ## no error checking?!

    my $end = time - $begin;
    print "PDBs found : $pdb_count\n",
          "Chains found : $chain_count\n",
          "Blasts performed : $blast_count\n",
          "Wild sequences : $wild_count\n",
          "Stubby sequences : $stub_count\n",
          "DNA sequences : $dna_count\n",
          "TIME : $end seconds\n";
    print "\nDone\n";
    exit(0);

    Again, this script should help you get started in parsing a PDB file. But you will have to read up on DBI if you want to stuff the data into Access (there is plenty of material already on the site about DBI, btw).

    If you need some more help, feel free to send me an email.

    Will perl for money
    JJ Knitis
    (404) 624-1525
    gt8073a@industrialmusic.com

      I cannot find a reference to "camel book". Can you help? Thanks.

        The "camel book" is O'Reilly's "Programming Perl" ISBN 0-596-00027-8. No perl programmer should be without one.

        --- print map { my ($m)=1<<hex($_)&11?' ':''; $m.=substr('AHJPacehklnorstu',hex($_),1) } split //,'2fde0abe76c36c914586c';
Re: Reading PDB files
by Jaap (Curate) on Jan 29, 2003 at 13:41 UTC
    Welcome to PerlMonks, esswired.
    Have you succeeded in getting ANY perl script to work?
    Have you tried running through the file using perl?
Re: Reading PDB files
by esswired (Initiate) on Jan 31, 2003 at 02:35 UTC
    OK. I guess I am in over my head. I have a Palm PDB file & want to read the whole thing & want to change it into a CSV file or something readable for Excel. The only script which I have run successfully is the example (a .pl file) one sent with perl. It does not require any input. I would want to pass the name of the file I want to read. Does the code you sent have to be compiled? The Palm::PDB file I copied from CPAN seems to be a .pm file. Is the Camel book something I have? I found lots of references, but no book. Should I give up now?
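
    On passing a file name: a Perl script is not compiled separately. You save it as a .pl file and run it with perl, and anything typed after the script name arrives in the @ARGV array. A .pm file is a module that a script loads with use, not something you run directly. A minimal sketch, where the script name readfile.pl and the helper describe_file are made up for the example:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the size of a file in bytes and its first 16 bytes as hex digits.
sub describe_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    binmode $fh;    # treat the file as raw bytes, which matters on Windows
    my $head = '';
    defined read( $fh, $head, 16 ) or die "Cannot read $path: $!\n";
    close $fh;
    my $size = -s $path;
    return ( $size, unpack( 'H*', $head ) );
}

if (@ARGV) {
    # Every file name given on the command line is handled in turn.
    for my $path (@ARGV) {
        my ( $size, $hex ) = describe_file($path);
        print "$path: $size bytes, starts with $hex\n";
    }
}
else {
    print "Usage: perl $0 FILE [FILE ...]\n";
}
```

    From a command window you would run it as: perl readfile.pl mydata.pdb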

Node Type: perlquestion [id://230938]
Approved by scain