rmgzsm9 has asked for the wisdom of the Perl Monks concerning the following question:
I have a list of ID codes (UniProt codes), each representing a protein. There is a website called InterPro that provides protein-related information. Its URL contains that code, so by changing the code in the URL I can get information about any protein. I wrote a program that retrieves the HTML page for each URL and then extracts the information listed under the heading "protein family membership". I did this using HTML::TreeBuilder. The code is below:
use LWP::Simple;
use HTML::TreeBuilder;

my @ports = qw( P23141 P61177 P60725 P30542 P21817 P29274 Q07343 P08172 P20309 Q9GZZ6 );

for (my $i = 0; $i < scalar(@ports); $i++) {
    my $url  = "http://wwwdev.ebi.ac.uk/interpro/ISearch?query=" . $ports[$i] . "+";
    my $resp = get($url);
    my $tree = HTML::TreeBuilder->new_from_content($resp);
    my $first = $tree->look_down(_tag => 'div', class => 'prot_fam');
    $first = $first->look_down(_tag => 'div', class => 'entry-parent');
    $first = $first->look_down(_tag => 'div', class => 'entry-parent');
    $first = $first->look_down(_tag => 'a');
    open(FH, ">>result.txt");
    print FH $ports[$i] . ";";
    print FH $first->content_list;
    print FH "\n";
    close(FH);
}
The problem is that this code only works when the page actually lists a family: in that case the HTML source has 'prot_fam' followed by 'entry-parent' (which the code looks for twice). When no family is defined, the structure of the page source is different: after the line with 'prot_fam' it just says "No family membership assigned", and there is no 'entry-parent' in the following lines. When the script reaches a code whose page says "No family membership assigned", it stops and never gets to the remaining codes in the list.

What I want is for the script to skip the entries with "No family membership assigned" and move on to the next code. I am new to Perl. Please help me solve this problem; I will be grateful.
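One possible way to make the loop skip such entries, offered as a sketch under some assumptions: `look_down` in scalar context returns the first matching element or `undef`, so each step of the chain can be checked before the next method call is made on it, and `next` then moves on to the following code. The helper name `family_from_html` and the two sample HTML snippets are hypothetical, made up only to exercise both cases offline; the real InterPro markup may differ. It also uses `as_text` instead of `content_list` to get the link text as a plain string.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: returns the text of the family link found under the
# 'prot_fam' div, or undef when any element in the chain is missing
# (as on pages that say "No family membership assigned").
sub family_from_html {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);

    # Same chain of look_down calls as in the question. In scalar context
    # look_down returns the first match or undef, so check before each step
    # instead of calling a method on undef.
    my $node = $tree->look_down(_tag => 'div', class => 'prot_fam');
    for my $criteria (
        [ _tag => 'div', class => 'entry-parent' ],
        [ _tag => 'div', class => 'entry-parent' ],
        [ _tag => 'a' ],
    ) {
        last unless defined $node;
        $node = $node->look_down(@$criteria);
    }

    my $family = defined $node ? $node->as_text : undef;
    $tree->delete;    # release the parse tree explicitly
    return $family;
}

# Made-up sample pages standing in for the InterPro responses.
my %pages = (
    WITH_FAMILY => '<div class="prot_fam"><div class="entry-parent">'
                 . '<div class="entry-parent"><a href="#">Example family</a>'
                 . '</div></div></div>',
    NO_FAMILY   => '<div class="prot_fam">No family membership assigned</div>',
);

for my $id (sort keys %pages) {
    my $family = family_from_html($pages{$id});
    unless (defined $family) {
        warn "$id: no family membership assigned, skipping\n";
        next;    # move on to the next code instead of stopping
    }
    print "$id;$family\n";
}
```

In the original loop, the body would become roughly `my $resp = get($url); next unless defined $resp; my $family = family_from_html($resp); next unless defined $family;` followed by the print to result.txt (LWP::Simple's `get` returns `undef` when the request fails, so that check is worth keeping too). Testing for a missing element, rather than matching the literal text "No family membership assigned", keeps working even if the wording on the page changes.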