Hi Monks, could you please help me? I desperately need help here :) Following on from my earlier thread, "Help regarding regular expression", I have tried to use HTML::Parser to parse my gene ontology data. Here is the code:
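use strict;
use warnings;
use LWP::Simple;
use HTML::Parser;

my $query = shift(@ARGV);
#print "$query\n";

# Save the search results page to a file.
open(OUT, ">test12.txt") or die "Cannot open test12.txt for writing: $!";
my $content = get("http://amigo.geneontology.org/cgi-bin/amigo/search.cgi?query=$query;search_constraint=gp;action=query;view=query/");
die "Couldn't get the website!" unless defined $content;
print OUT $content;
close(OUT);

print "At the while loop\n";

# Read the saved page back (this path only matches the file written above
# if the script is run from /home/vivek/Desktop).
open(IN, "/home/vivek/Desktop/test12.txt") or die "Cannot open test12.txt for reading: $!";
while (my $line = <IN>) {
    chomp $line;
    $line =~ s/^\s+|\s+$//g;    # trim both ends; /g is needed to handle both alternatives
    #$string = "GO:";
    # Look for a link to a GO term and capture its numeric ID in $3.
    if ($line =~ /(.+)(term=GO:)(\d+)(">.+)/) {
        print $1;
        my $page = get("http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:$3");
        next unless defined $page;
        # Collect every text chunk of the term page and print it.
        HTML::Parser->new(text_h => [\my @accum, "text"])->parse($page)->eof;
        print map $_->[0], @accum;
        #print $page;
    }
}
close(IN);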
The code above gives me a web page like this one:
<div class="contents term">
  <h1 class="name">ATP binding</h1>
  <ul id="navPage" class="inline C">
    <li><a href="#info" title="View term information">Term information <img src="http://amigo.geneontology.org/amigo/images/down.png" alt="in-page link"></a></li>
    <li><a href="#lineage" title="View the placement of the term in the tree">Term lineage <img src="http://amigo.geneontology.org/amigo/images/down.png" alt="in-page link"></a></li>
    <li><a href="#xrefs" title="View cross-references to external databases">External references <img src="http://amigo.geneontology.org/amigo/images/down.png" alt="in-page link"></a></li>
    <li><a href="term-assoc.cgi?term=GO:0005524&amp;session_id=3233amigo1249562763" title="View gene products associated with this term">3492 gene product associations <img src="http://amigo.geneontology.org/amigo/images/left.png" alt="link to another page"></a></li>
  </ul>
  <div class="block" id="info">
    <h2>Term Information</h2>
    <dl class="term-info">
      <dt>Accession</dt>
      <dd class="acc">GO:0005524</dd>
      <dt>Ontology</dt>
      <dd class="type">molecular function</dd>
      <dt>Synonyms</dt>
The HTML above is just an example; the original HTML file contains many different classes and class names. What I want the program to do is this: if I pass the parameter "molecular function", it should give me the class name (the text inside <h1 class="name">), i.e., ATP binding. Can someone help me out, please? Thank you.
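
To make the goal concrete, here is a minimal sketch of one way this might be done with HTML::Parser event handlers. It assumes that each term's <h1 class="name"> appears before its <dd class="type">, as in the snippet above, and that those class attributes hold exactly "name" and "type". The function name names_for_ontology and the inline test HTML are hypothetical, made up for illustration only.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: return the text of every <h1 class="name"> whose
# following <dd class="type"> matches $ontology (case-insensitive).
sub names_for_ontology {
    my ($html, $ontology) = @_;
    my (@names, $current_name, $capture);
    my $buffer = '';

    my $parser = HTML::Parser->new(
        api_version => 3,
        # Start capturing text when one of the two elements of interest opens.
        start_h => [ sub {
            my ($tag, $attr) = @_;
            my $class = $attr->{class} // '';
            if (($tag eq 'h1' && $class eq 'name')
                || ($tag eq 'dd' && $class eq 'type')) {
                $capture = $tag;
                $buffer  = '';
            }
        }, 'tagname, attr' ],
        # Accumulate decoded text while inside a captured element.
        text_h => [ sub { $buffer .= $_[0] if $capture }, 'dtext' ],
        # On the closing tag, either remember the name or test the ontology.
        end_h => [ sub {
            my ($tag) = @_;
            return unless $capture && $tag eq $capture;
            (my $text = $buffer) =~ s/^\s+|\s+$//g;
            if ($tag eq 'h1') {
                $current_name = $text;
            }
            elsif (defined $current_name && lc($text) eq lc($ontology)) {
                push @names, $current_name;
            }
            $capture = undef;
        }, 'tagname' ],
    );
    $parser->parse($html);
    $parser->eof;
    return @names;
}

# Made-up sample input, trimmed from the page shown above.
my $html = <<'HTML';
<div class="contents term">
<h1 class="name">ATP binding</h1>
<dl class="term-info">
<dt>Ontology</dt>
<dd class="type">molecular function</dd>
</dl>
</div>
HTML

print "$_\n" for names_for_ontology($html, 'molecular function');
```

With the sample input this should print "ATP binding". In the real script, $html would be the page fetched with get(), and the ontology string would come from the command line.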

In reply to Using XML::Twig| HTML::Parser by chavanak
