I have a list of ID codes (UniProt codes), each representing a protein. There is a website called InterPro that provides protein-related information. The URL of a protein's page contains its code, so by changing the code in the URL I can get information about any protein. I wrote a program that retrieves the HTML page from the URL and then extracts the information written under the heading "protein family membership". I did this using HTML::TreeBuilder. The code is below:

use LWP::Simple;
use HTML::TreeBuilder;

my @ports = qw( P23141 P61177 P60725 P30542 P21817 P29274 Q07343 P08172 P20309 Q9GZZ6 );

for (my $i = 0; $i < scalar(@ports); $i++) {
    my $url  = "http://wwwdev.ebi.ac.uk/interpro/ISearch?query=" . $ports[$i] . "+";
    my $resp = get( $url );
    my $tree = HTML::TreeBuilder->new_from_content($resp);

    my $first = $tree->look_down(_tag => 'div', class => 'prot_fam');
    $first = $first->look_down(_tag => 'div', class => 'entry-parent');
    $first = $first->look_down(_tag => 'div', class => 'entry-parent');
    $first = $first->look_down(_tag => 'a');

    open (FH, ">>result.txt");
    print FH $ports[$i] . ";";
    print FH $first->content_list;
    print FH "\n";
    close(FH);
}

The problem is that this code only works when a family name is present, that is, when the 'prot_fam' div is followed by two nested 'entry-parent' divs in the HTML source. When no family is defined, the structure of the page is different: after the line with 'prot_fam' there is the text "No family membership assigned", and there is no 'entry-parent' in the following lines. When the script reaches a code in the list whose page says "No family membership assigned", it stops and does not process the remaining codes. What I want is for the script to skip the entries with "No family membership assigned" and carry on with the rest. I am new to Perl. Please help me solve this problem. I will be grateful.
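One way the skipping could look is sketched below, as an illustration rather than a definitive fix. It relies on look_down returning undef in scalar context when nothing matches, so each step is only attempted if the previous one found an element. The helper name family_name, the %pages hash and the two inline HTML snippets are hypothetical stand-ins for the real InterPro pages, whose markup is assumed to match the description above.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: returns the family name found in the page,
# or undef when the expected elements are missing.
sub family_name {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);

    # look_down returns undef in scalar context when nothing matches,
    # so every further step is guarded by "if $node".
    my $node = $tree->look_down(_tag => 'div', class => 'prot_fam');
    $node = $node->look_down(_tag => 'div', class => 'entry-parent') if $node;
    $node = $node->look_down(_tag => 'a') if $node;

    my $name = $node ? $node->as_text : undef;
    $tree->delete;    # release the parse tree
    return $name;
}

# Made-up sample pages standing in for the downloaded HTML (assumption).
my %pages = (
    P23141 => '<div class="prot_fam"><div class="entry-parent">'
            . '<div class="entry-parent"><a href="#">Example family</a>'
            . '</div></div></div>',
    P61177 => '<div class="prot_fam">No family membership assigned</div>',
);

for my $id (sort keys %pages) {
    my $name = family_name($pages{$id});
    next unless defined $name;    # skip entries without a family
    print "$id;$name\n";
}
```

With these sample pages only the P23141 line is printed and P61177 is skipped. In the real loop the same "next unless defined" check could also be applied to the value returned by get, which is undef when the download fails.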


In reply to Reading particular information from Html page and skipping the page that doesn't contain that information by rmgzsm9
