chavanak has asked for the wisdom of the Perl Monks concerning the following question:
    use LWP::Simple;
    use HTML::Parser;

    my $query = shift(@ARGV);
    #print "$query\n";
    die "waste fellow dnt know hw to write a program"
        unless (open(OUT, ">test12.txt"));
    $content = get("http://amigo.geneontology.org/cgi-bin/amigo/search.cgi?query=$query;search_constraint=gp;action=query;view=query/");
    die "Couldnt get the website!" unless defined $content;
    print OUT "$content";
    close (OUT);
    print "At the whileloop";
    open (IN, "/home/vivek/Desktop/test12.txt");
    while(my $line = <IN>) {
        chomp $line;
        $line =~ s/^[\s\t]+|[\s\t]+$//;
        #$string = "GO:";
        if ($line =~ s/(.+)(term\=GO\:)(\d+)(\"\>.+)/$1$2$3$4/g) {
            print $1;
            $page = get("http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:$3");
            HTML::Parser->new(text_h => [\my @accum, "text"])->parse($page);
            print map $_->[0], @accum;
            #print $page;
        }
    }
    close(IN);

The code above gives me a web page like this one:
    <div class="contents term">
      <h1 class="name">ATP binding</h1>
      <ul id="navPage" class="inline C">
        <li><a href="#info" title="View term information">Term information <img src="http://amigo.geneontology.org/amigo/images/down.png" alt="in-page link"></a></li>
        <li><a href="#lineage" title="View the placement of the term in the tree">Term lineage <img src="http://amigo.geneontology.org/amigo/images/down.png" alt="in-page link"></a></li>
        <li><a href="#xrefs" title="View cross-references to external databases">External references <img src="http://amigo.geneontology.org/amigo/images/down.png" alt="in-page link"></a></li>
        <li><a href="term-assoc.cgi?term=GO:0005524&session_id=3233amigo1249562763" title="View gene products associated with this term">3492 gene product associations <img src="http://amigo.geneontology.org/amigo/images/left.png" alt="link to another page"></a></li>
      </ul>
      <div class="block" id="info">
        <h2>Term Information</h2>
        <dl class="term-info">
          <dt>Accession</dt>
          <dd class="acc">GO:0005524</dd>
          <dt>Ontology</dt>
          <dd class="type">molecular function</dd>
          <dt>Synonyms</dt>

The HTML above is just an example; the original HTML file contains many different classes and class names. What I want the program to do is this: if I pass the parameter "molecular function", it should give me the class name, i.e., ATP binding. Can someone help me out, please? Thank you.
Replies are listed 'Best First'.
Re: Using XML::Twig| HTML::Parser
by gmontema (Initiate) on Aug 06, 2009 at 17:07 UTC