in reply to bioinformatics problem

To get you started.

...feeding the data to the online tool...
Have a look at LWP and WWW::Mechanize

...getting the results...
Have a look at HTML::TokeParser
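
To give a rough idea of the HTML::TokeParser approach, here's a minimal sketch that walks the td cells of a table. The HTML string is made up for illustration; it is not the tool's real output, so the tags you need to look for may differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Made-up sample row (an assumption, not the real page).
my $html = '<table><tr><td>1</td><td> 2</td><td>ALLKQSWEV</td><td>1930.068</td></tr></table>';

my $p = HTML::TokeParser->new( \$html )
    or die "couldn't create parser: $!";

my @cells;
# Jump to each <td> start tag, then grab the text up to its closing tag.
while ( my $tag = $p->get_tag('td') ) {
    push @cells, $p->get_trimmed_text('/td');
}

print join( '|', @cells ), "\n";
```

For regular tables a dedicated table module is usually less work, but TokeParser is handy when the data isn't in a tidy table.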

Update:
Added WWW-Mechanize

Update 2:
This gets the html:

#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# field name: inseq

my $url = q|http://thr.cit.nih.gov/molbio/hla_bind/index.shtml|;
my $seq = 'EALLKQSWEVLKQNIPGHSLCLFALIIEAAPESKYVFSFLKDSNEIPENNPKLKAHAAVIFKTICESATE
LRQKGQAVWDNNTLKRLGSIHLKNKITDPHFEVMKGALLGTIKEAVKENWSDEMCCAWTEAYNQLVATIK
AEMKE';

my $mech = WWW::Mechanize->new()
    or die "couldn't get Mech object: $!";

$mech->get($url)
    or die "couldn't 'get': $!";

$mech->submit_form(
    form_number => 1,
    fields      => {
        inseq => $seq,
    }
) or die "couldn't submit form";

my $html = $mech->content()
    or die "content failed: $!";

{
    my $file = 'bio_output.html';
    open my $fh, '>', $file
        or die "can't open $file: $!";
    print $fh $html;
    close $fh;
}

The error checking may be a <cough> tad excessive :-)

Update 3:
Here's my go at extracting the data:

#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;

my $html;
{
    local $/;
    my $file = 'bio_output.html';
    open my $fh, '<', $file
        or die "can't open $file: $!";
    $html = <$fh>;
    close $fh;
}

my $t = HTML::TableExtract->new();
$t->parse($html);
my $report = $t->tables_report(1);
print $report;

output:
---------- Capture Output ----------
> "C:\Perl\bin\perl.exe" monk18.pl
TABLE(0, 0):
User Parameters and Scoring Information:
method selected to limit number of results:explicit number
number of results requested:20
HLA molecule type selected:A_0201
length selected for subsequences to be scored:9
echoing mode selected for input sequence:Y
echoing format:numbered lines
length of user's input peptide sequence:145
number of subsequence scores calculated:137
number of top-scoring subsequences reported back in scoring output table:20

TABLE(0, 1):
Scoring Results:::
Rank:Start Position:Subsequence Residue Listing:Score (Estimate of Half Time of Disassociation of a Molecule Containing This Subsequence)
1: 2:ALLKQSWEV:1930.068
2: 95:KITDPHFEV: 795.962
3: 108:LLGTIKEAV: 57.937
4: 107:ALLGTIKEA: 42.278
5: 21:CLFALIIEA: 42.278
6: 19:SLCLFALII: 16.254
7: 63:TICESATEL: 12.043
8: 12:KQNIPGHSL: 7.581
9: 14:NIPGHSLCL: 2.937
10: 101:FEVMKGALL: 1.911
11: 133:NQLVATIKA: 1.864
12: 35:YVFSFLKDS: 0.970
13: 45:EIPENNPKL: 0.903
14: 75:GQAVWDNNT: 0.756
15: 135:LVATIKAEM: 0.739
16: 103:VMKGALLGT: 0.737
17: 52:KLKAHAAVI: 0.524
18: 123:EMCCAWTEA: 0.457
19: 3:LLKQSWEVL: 0.434
20: 89:SIHLKNKIT: 0.420
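
If you'd rather have the scoring rows as Perl data than as a printed report, something along these lines might do. It's only a sketch: the inline HTML is a made-up stand-in for bio_output.html (the real page layout may differ), the depth/count values simply mirror TABLE(0, 1) from the report, and the hash keys are names I picked for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;

# Made-up stand-in for bio_output.html (an assumption, not the real page).
my $html = <<'END_HTML';
<table><tr><td>User Parameters and Scoring Information</td></tr></table>
<table>
<tr><td>Scoring Results</td></tr>
<tr><td>Rank</td><td>Start Position</td><td>Subsequence Residue Listing</td><td>Score</td></tr>
<tr><td>1</td><td> 2</td><td>ALLKQSWEV</td><td>1930.068</td></tr>
<tr><td>2</td><td> 95</td><td>KITDPHFEV</td><td> 795.962</td></tr>
</table>
END_HTML

# Only look at the second top-level table, i.e. TABLE(0, 1).
my $te = HTML::TableExtract->new( depth => 0, count => 1 );
$te->parse($html);

my @hits;
for my $table ( $te->tables ) {
    for my $row ( $table->rows ) {
        # Empty cells can come back undefined; normalise and trim them.
        my @cells = map { defined $_ ? $_ : '' } @$row;
        s/^\s+|\s+$//g for @cells;

        # Skip title and header rows: data rows start with a numeric rank.
        next unless @cells >= 4 and $cells[0] =~ /^\d+$/;

        push @hits, {
            rank    => $cells[0],
            start   => $cells[1],
            peptide => $cells[2],
            score   => $cells[3],
        };
    }
}

printf "%2d %3d %s %s\n", @{$_}{qw(rank start peptide score)} for @hits;
```

With the rows in @hits you can sort or filter by score instead of re-parsing the printed report.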

Re^2: bioinformatics problem
by mutatedgene (Novice) on May 31, 2006 at 08:08 UTC
    hi

    Thanks a lot for the help, but I don't seem to be able to install the LWP and WWW::Mechanize modules. Do I need admin permission to install them? Also, I'm going through a proxy server. Please help!

    arvind